* [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
@ 2019-03-01 18:04 Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 1/7] nfp: split devlink port init from registration Jakub Kicinski
                   ` (9 more replies)
  0 siblings, 10 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Hi!

This series is a long overdue follow up to Jiri's work on providing
a common .ndo_phys_port_name implementation based on devlink ports.

First, devlink port flavours for PF and VF ports are added and
registered by the NFP. Port numbers and split info are reserved
for physical and DSA ports. For PCIe ports we add pf/vf identifiers.
Note that devices may have more than one PF, including multi-host
scenarios where not all PFs are connected to the same host.

The port_index is slightly tricky to figure out; we use a bit of
math to create unique IDs for ports.

Next, subports for PCIe ports are introduced. This covers the case
where a device has more than one netdev on a physical function
(e.g. a multi-port SmartNIC).

With the above features in place we can remove the ndo_phys_port_name
implementation from the NFP and use the standard devlink one for
port netdevs.

Last but not least, a concept of peer netdevs is added. In multi-host
scenarios it's currently not possible to figure out which PF is
associated with the local host. A peer device is "the other side
of the wire" for PCIe ports. In the case of NFP we only link the PF
netdevs, but it should be possible to also link VF peers after the
VF driver probes, if users request it.

This is the conceptual image of devlink instances:

                    HOST A             ||          HOST B
                                       ||
        PF A       | V | V | V | V     ||       PF B        | V | V | V
                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
                                       ||
             PCI Express link          ||        PCI Express link
        \      \      \  |   |   |          |       |      /   /   /
         \      \      \ |   |   |          |       |     /   /   /
      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
d n H ||  |               <<==========                              |   || d n H
e s O ||  |                            ==========>>                 |   || e s O
v t S ||  |                    SR-IOV e-switch                      |   || v t S
l a T ||  |               <<==========                              |   || l a T
i n   ||  |                            ==========>>                 |   || i n
n c A ||  |               ________ _________ ________               |   || n c B
k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
      ||  \---------------------------------------------------------/   ||
      \/                      |        |         |                      \/
                              |        |         |
                                 ||         ||
                          MAC 0  ||  MAC 1  || MAC 2
                                 ||         ||

 '+' marks the devlink ports and port (-representor-) netdevs
 '*' marks host netdevs (actual VF/PF netdev)

v2 (Jiri):
 - update devlink user space output in a commit message;
 - split the pci attribute setting helper into pf and vf versions;
 - add peer IBDEV_NAME;
 - add nest for peer attributes.

Jakub Kicinski (7):
  nfp: split devlink port init from registration
  devlink: add PF and VF port flavours
  nfp: register devlink ports of all reprs
  devlink: allow subports on devlink PCI ports
  nfp: switch to devlink_port_get_phys_port_name()
  devlink: introduce port's peer netdevs
  nfp: expose PF peer netdevs

 drivers/net/ethernet/netronome/nfp/abm/main.c |   1 +
 .../net/ethernet/netronome/nfp/flower/main.c  |   1 +
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  54 +++++-
 .../net/ethernet/netronome/nfp/nfp_net_main.c |  17 +-
 .../net/ethernet/netronome/nfp/nfp_net_repr.c |  16 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c |  33 +---
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   4 +
 include/net/devlink.h                         |  48 ++++-
 include/uapi/linux/devlink.h                  |  12 ++
 net/core/devlink.c                            | 178 ++++++++++++++++--
 10 files changed, 310 insertions(+), 54 deletions(-)

-- 
2.19.2



* [PATCH net-next v2 1/7] nfp: split devlink port init from registration
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 2/7] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

In the future we want to switch to using the
devlink_port_get_phys_port_name() helper for .ndo_phys_port_name.
This requires that we initialize the devlink ports before the
netdevs are registered. Split the registration from init.
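
The ordering this split enables can be sketched outside the kernel.
The block below is only an illustration: every name in it is a
hypothetical stand-in (not a driver or devlink function), and each
step just records a letter so the order of calls and of the unwind
can be inspected. It assumes the sequence used later in the series:
port init, then netdev registration, then port registration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel symbols. */
enum sketch_step {
	SKETCH_PORT_INIT,
	SKETCH_NETDEV_REGISTER,
	SKETCH_PORT_REGISTER,
	SKETCH_NONE,		/* no step fails */
};

static enum sketch_step sketch_fail_at = SKETCH_NONE;
static char sketch_log[32];	/* one letter per call, in call order */
static size_t sketch_log_len;

static void sketch_note(char tag)
{
	if (sketch_log_len < sizeof(sketch_log) - 1) {
		sketch_log[sketch_log_len++] = tag;
		sketch_log[sketch_log_len] = '\0';
	}
}

/* Record the step and fail if it is the one selected to fail. */
static int sketch_do(enum sketch_step step, char tag)
{
	sketch_note(tag);
	return step == sketch_fail_at ? -1 : 0;
}

/* Bring-up in the order init -> netdev -> register, unwinding in reverse. */
int sketch_bring_up(enum sketch_step fail_at)
{
	int err;

	sketch_fail_at = fail_at;
	sketch_log_len = 0;
	sketch_log[0] = '\0';

	err = sketch_do(SKETCH_PORT_INIT, 'I');		/* attrs set first */
	if (err)
		return err;

	err = sketch_do(SKETCH_NETDEV_REGISTER, 'N');	/* netdev visible */
	if (err)
		goto err_port_clean;

	err = sketch_do(SKETCH_PORT_REGISTER, 'R');
	if (err)
		goto err_unreg_netdev;

	return 0;

err_unreg_netdev:
	sketch_note('n');	/* undo netdev registration */
err_port_clean:
	sketch_note('i');	/* undo port init */
	return err;
}

const char *sketch_trace(void)
{
	return sketch_log;
}
```

With no failure the trace reads "INR"; a failure at port registration
gives "INRni", i.e. the netdev is unregistered before the port is
cleaned.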

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c    | 13 ++++++++++---
 .../net/ethernet/netronome/nfp/nfp_net_main.c   | 17 ++++++++++++++---
 drivers/net/ethernet/netronome/nfp/nfp_port.h   |  2 ++
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index e9eca99cf493..9af3cb1f2f17 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -350,10 +350,9 @@ const struct devlink_ops nfp_devlink_ops = {
 	.flash_update		= nfp_devlink_flash_update,
 };
 
-int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
 {
 	struct nfp_eth_table_port eth_port;
-	struct devlink *devlink;
 	int ret;
 
 	rtnl_lock();
@@ -366,8 +365,16 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
 	devlink_port_attrs_set(&port->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
 			       eth_port.label_port, eth_port.is_split,
 			       eth_port.label_subport);
+	return 0;
+}
+
+void nfp_devlink_port_clean(struct nfp_port *port)
+{
+}
 
-	devlink = priv_to_devlink(app->pf);
+int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
+{
+	struct devlink *devlink = priv_to_devlink(app->pf);
 
 	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 08f5fdbd8e41..39e87bb73ebf 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -150,9 +150,15 @@ nfp_net_pf_init_vnic(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id)
 
 	nn->id = id;
 
+	if (nn->port) {
+		err = nfp_devlink_port_init(pf->app, nn->port);
+		if (err)
+			return err;
+	}
+
 	err = nfp_net_init(nn);
 	if (err)
-		return err;
+		goto err_port_clean;
 
 	nfp_net_debugfs_vnic_add(nn, pf->ddir);
 
@@ -167,17 +173,20 @@ nfp_net_pf_init_vnic(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id)
 	if (nfp_net_is_data_vnic(nn)) {
 		err = nfp_app_vnic_init(pf->app, nn);
 		if (err)
-			goto err_devlink_port_clean;
+			goto err_port_unreg;
 	}
 
 	return 0;
 
-err_devlink_port_clean:
+err_port_unreg:
 	if (nn->port)
 		nfp_devlink_port_unregister(nn->port);
 err_dfs_clean:
 	nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
 	nfp_net_clean(nn);
+err_port_clean:
+	if (nn->port)
+		nfp_devlink_port_clean(nn->port);
 	return err;
 }
 
@@ -224,6 +233,8 @@ static void nfp_net_pf_clean_vnic(struct nfp_pf *pf, struct nfp_net *nn)
 		nfp_devlink_port_unregister(nn->port);
 	nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
 	nfp_net_clean(nn);
+	if (nn->port)
+		nfp_devlink_port_clean(nn->port);
 }
 
 static int nfp_net_pf_alloc_irqs(struct nfp_pf *pf)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 90ae053f5c07..09c55ca2371a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -129,6 +129,8 @@ int nfp_net_refresh_eth_port(struct nfp_port *port);
 void nfp_net_refresh_port_table(struct nfp_port *port);
 int nfp_net_refresh_port_table_sync(struct nfp_pf *pf);
 
+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port);
+void nfp_devlink_port_clean(struct nfp_port *port);
 int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port);
 void nfp_devlink_port_unregister(struct nfp_port *port);
 
-- 
2.19.2



* [PATCH net-next v2 2/7] devlink: add PF and VF port flavours
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 1/7] nfp: split devlink port init from registration Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs Jakub Kicinski
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Current port flavours cover simple switches and DSA.  Add PF
and VF flavours to cover "switchdev" SR-IOV NICs.

Example devlink user space output:

$ devlink port
pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
pci/0000:82:00.0/10000: type eth netdev eth0 flavour pci_pf pf 0
pci/0000:82:00.0/10001: type eth netdev eth1 flavour pci_vf pf 0 vf 0
pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1

$ devlink -jp port
{
    "port": {
        "pci/0000:82:00.0/0": {
            "type": "eth",
            "netdev": "p4p1",
            "flavour": "physical"
        },
        "pci/0000:82:00.0/10000": {
            "type": "eth",
            "netdev": "eth0",
            "flavour": "pci_pf",
            "pf": 0
        },
        "pci/0000:82:00.0/10001": {
            "type": "eth",
            "netdev": "eth1",
            "flavour": "pci_vf",
            "pf": 0,
            "vf": 0
        },
        "pci/0000:82:00.0/10002": {
            "type": "eth",
            "netdev": "eth2",
            "flavour": "pci_vf",
            "pf": 0,
            "vf": 1
        }
    }
}
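
As a reading aid, here is a minimal user-space sketch of which
identifiers get reported for each flavour, following the switch in
devlink_nl_port_attrs_put() from the diff below. The sketch_* and
SK_* names are hypothetical, and the bit flags merely stand in for
the netlink attributes; nothing here is the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the port flavours; not the uapi enum. */
enum sketch_flavour {
	SK_PHYSICAL,
	SK_CPU,
	SK_DSA,
	SK_PCI_PF,
	SK_PCI_VF,
};

/* Bit flags standing in for the netlink attributes that get emitted. */
enum {
	SK_ATTR_PORT_NUMBER	= 1 << 0,
	SK_ATTR_SPLIT_GROUP	= 1 << 1,
	SK_ATTR_SPLIT_SUBPORT	= 1 << 2,
	SK_ATTR_PCI_PF_NUMBER	= 1 << 3,
	SK_ATTR_PCI_VF_NUMBER	= 1 << 4,
};

struct sketch_attrs {
	enum sketch_flavour flavour;
	union {	/* identifiers differ per flavour */
		struct {
			bool split;
			uint32_t split_subport_number;
			uint32_t port_number;
		} phys;		/* PHYSICAL, CPU, DSA */
		struct {
			uint32_t pf_number;
			uint32_t vf_number;
		} pci;		/* PCI_PF, PCI_VF */
	};
};

/* Returns a mask of reported attributes, or -1 for an unknown flavour. */
int sketch_attrs_reported(const struct sketch_attrs *attrs)
{
	int mask = 0;

	switch (attrs->flavour) {
	case SK_PHYSICAL:
	case SK_CPU:
	case SK_DSA:
		mask |= SK_ATTR_PORT_NUMBER;
		if (attrs->phys.split)
			mask |= SK_ATTR_SPLIT_GROUP | SK_ATTR_SPLIT_SUBPORT;
		return mask;
	case SK_PCI_VF:
		mask |= SK_ATTR_PCI_VF_NUMBER;
		/* fall through */
	case SK_PCI_PF:
		mask |= SK_ATTR_PCI_PF_NUMBER;
		return mask;
	default:
		return -1;
	}
}
```

A VF port therefore reports both its VF number and its parent PF
number, while a PF port reports the PF number only, which matches the
"pf 0 vf 1" versus "pf 0" fields in the output above.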

v2:
 - fix old output in commit message s/pcie_/pci_/ (Jiri);
 - split the pci helper into separate ones for PF and VF (Jiri);
 - flip the condition in WARN_ON for devlink_port_attrs_set() to
   whitelist from blacklist.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        | 32 +++++++++++--
 include/uapi/linux/devlink.h |  5 ++
 net/core/devlink.c           | 88 ++++++++++++++++++++++++++++++++----
 3 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7f5a0bdca228..00ceff76762c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -42,9 +42,19 @@ struct devlink {
 struct devlink_port_attrs {
 	bool set;
 	enum devlink_port_flavour flavour;
-	u32 port_number; /* same value as "split group" */
-	bool split;
-	u32 split_subport_number;
+	union { /* port identifiers differ per-flavour */
+		/* PHYSICAL, CPU, DSA */
+		struct {
+			bool split;
+			u32 split_subport_number;
+			u32 port_number; /* same value as "split group" */
+		};
+		 /* PCI_PF, PCI_VF */
+		struct {
+			u32 pf_number;
+			u32 vf_number;
+		} pci;
+	};
 };
 
 struct devlink_port {
@@ -568,6 +578,10 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    enum devlink_port_flavour flavour,
 			    u32 port_number, bool split,
 			    u32 split_subport_number);
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
+				   u32 pf_number);
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+				   u32 pf_number, u32 vf_number);
 int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				    char *name, size_t len);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
@@ -782,6 +796,18 @@ static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
 {
 }
 
+static inline void
+devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
+			      u32 pf_number)
+{
+}
+
+static inline void
+devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+			      u32 pf_number, u32 vf_number)
+{
+}
+
 static inline int
 devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				char *name, size_t len)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5bb4ea67d84f..9ce76d4f640d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -167,6 +167,8 @@ enum devlink_port_flavour {
 	DEVLINK_PORT_FLAVOUR_DSA, /* Distributed switch architecture
 				   * interconnect port.
 				   */
+	DEVLINK_PORT_FLAVOUR_PCI_PF, /* PCI Physical function port */
+	DEVLINK_PORT_FLAVOUR_PCI_VF, /* PCI Virtual function port */
 };
 
 enum devlink_param_cmode {
@@ -332,6 +334,9 @@ enum devlink_attr {
 	DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,	/* string */
 	DEVLINK_ATTR_FLASH_UPDATE_COMPONENT,	/* string */
 
+	DEVLINK_ATTR_PORT_PCI_PF_NUMBER,	/* u32 */
+	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6515fbec0dcd..49216b688c5b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -516,16 +516,35 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		return 0;
 	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
 		return -EMSGSIZE;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->port_number))
-		return -EMSGSIZE;
-	if (!attrs->split)
+
+	switch (attrs->flavour) {
+	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
+	case DEVLINK_PORT_FLAVOUR_CPU:
+	case DEVLINK_PORT_FLAVOUR_DSA:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER,
+				attrs->port_number))
+			return -EMSGSIZE;
+
+		if (attrs->split &&
+		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP,
+				 attrs->port_number) ||
+		     nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
+				 attrs->split_subport_number)))
+			return -EMSGSIZE;
 		return 0;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, attrs->port_number))
-		return -EMSGSIZE;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
-			attrs->split_subport_number))
-		return -EMSGSIZE;
-	return 0;
+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
+				attrs->pci.vf_number))
+			return -EMSGSIZE;
+		/* fall through */
+	case DEVLINK_PORT_FLAVOUR_PCI_PF:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
+				attrs->pci.pf_number))
+			return -EMSGSIZE;
+		return 0;
+	default:
+		return -EINVAL;
+	}
 }
 
 static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
@@ -5411,6 +5430,10 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 
+	WARN_ON(flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL &&
+		flavour != DEVLINK_PORT_FLAVOUR_CPU &&
+		flavour != DEVLINK_PORT_FLAVOUR_DSA);
+
 	attrs->set = true;
 	attrs->flavour = flavour;
 	attrs->port_number = port_number;
@@ -5420,6 +5443,46 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
 
+/**
+ *	devlink_port_attrs_pci_pf_set - Set port attributes for a PCI PF port
+ *
+ *	@devlink_port: devlink port
+ *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
+ *	            on the platform
+ */
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
+				   u32 pf_number)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+
+	attrs->set = true;
+	attrs->flavour = DEVLINK_PORT_FLAVOUR_PCI_PF;
+	attrs->pci.pf_number = pf_number;
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
+
+/**
+ *	devlink_port_attrs_pci_vf_set - Set port attributes for a PCI VF port
+ *
+ *	@devlink_port: devlink port
+ *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
+ *	            on the platform
+ *	@vf_number: PCI VF number within given PF (ignored for PF itself)
+ */
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+				   u32 pf_number, u32 vf_number)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+
+	attrs->set = true;
+	attrs->flavour = DEVLINK_PORT_FLAVOUR_PCI_VF;
+	attrs->pci.pf_number = pf_number;
+	attrs->pci.vf_number = vf_number;
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
+
 int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				    char *name, size_t len)
 {
@@ -5444,6 +5507,13 @@ int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 		 */
 		WARN_ON(1);
 		return -EINVAL;
+	case DEVLINK_PORT_FLAVOUR_PCI_PF:
+		n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
+		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		n = snprintf(name, len, "pf%uvf%u",
+			     attrs->pci.pf_number, attrs->pci.vf_number);
+		break;
 	}
 
 	if (n >= len)
-- 
2.19.2



* [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 1/7] nfp: split devlink port init from registration Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 2/7] devlink: add PF and VF port flavours Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-02  8:43   ` Jiri Pirko
  2019-03-01 18:04 ` [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports Jakub Kicinski
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Register all representors as devlink ports.

The port_index is slightly tricky to figure out; we use a bit of
arbitrary math to create unique IDs for PCI ports.
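
For reference, the index arithmetic from nfp_devlink_port_register()
in the diff below can be restated as two small functions. This is a
stand-alone sketch; the sketch_* function names are made up for
illustration, only the formulas are taken from the patch.

```c
#include <stdint.h>

/* PF representor: block of 10000 per PF, 1000 per PF subport. */
uint32_t sketch_pf_port_index(uint32_t pf_id, uint32_t pf_split_id)
{
	return (pf_id + 1) * 10000 + pf_split_id * 1000;
}

/* VF representor: same per-PF block, offset by vf_id + 1. */
uint32_t sketch_vf_port_index(uint32_t pf_id, uint32_t vf_id)
{
	return (pf_id + 1) * 10000 + vf_id + 1;
}
```

PF 0 maps to 10000, its second subport to 11000, and VF 0 and VF 1 of
PF 0 to 10001 and 10002, which agrees with the devlink output quoted
elsewhere in the series. Uniqueness appears to rest on the ranges
staying apart: with these formulas, VF 999 of PF 0 would land on
11000, the same index as subport 1 of PF 0, and physical ports (which
use eth_id directly) are presumably expected to stay below 10000.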

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 40 ++++++++++++++++++-
 .../net/ethernet/netronome/nfp/nfp_net_repr.c | 16 +++++++-
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 9af3cb1f2f17..bf7fd9614152 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -350,7 +350,8 @@ const struct devlink_ops nfp_devlink_ops = {
 	.flash_update		= nfp_devlink_flash_update,
 };
 
-int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
+static int
+nfp_devlink_port_init_phys(struct devlink *devlink, struct nfp_port *port)
 {
 	struct nfp_eth_table_port eth_port;
 	int ret;
@@ -368,6 +369,27 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
 	return 0;
 }
 
+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
+{
+	struct devlink *devlink = priv_to_devlink(app->pf);
+
+	switch (port->type) {
+	case NFP_PORT_PHYS_PORT:
+		return nfp_devlink_port_init_phys(devlink, port);
+	case NFP_PORT_PF_PORT:
+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
+		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id);
+		return 0;
+	case NFP_PORT_VF_PORT:
+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
+		devlink_port_attrs_pci_vf_set(&port->dl_port, port->pf_id,
+					      port->vf_id);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 void nfp_devlink_port_clean(struct nfp_port *port)
 {
 }
@@ -376,7 +398,21 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
 {
 	struct devlink *devlink = priv_to_devlink(app->pf);
 
-	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
+	switch (port->type) {
+	case NFP_PORT_PHYS_PORT:
+		return devlink_port_register(devlink, &port->dl_port,
+					     port->eth_id);
+	case NFP_PORT_PF_PORT:
+		return devlink_port_register(devlink, &port->dl_port,
+					     (port->pf_id + 1) * 10000 +
+					     port->pf_split_id * 1000);
+	case NFP_PORT_VF_PORT:
+		return devlink_port_register(devlink, &port->dl_port,
+					     (port->pf_id + 1) * 10000 +
+					     port->vf_id + 1);
+	default:
+		return -EINVAL;
+	}
 }
 
 void nfp_devlink_port_unregister(struct nfp_port *port)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index d2c803bb4e56..869d22760a6e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -292,7 +292,9 @@ nfp_repr_transfer_features(struct net_device *netdev, struct net_device *lower)
 
 static void nfp_repr_clean(struct nfp_repr *repr)
 {
+	nfp_devlink_port_unregister(repr->port);
 	unregister_netdev(repr->netdev);
+	nfp_devlink_port_clean(repr->port);
 	nfp_app_repr_clean(repr->app, repr->netdev);
 	dst_release((struct dst_entry *)repr->dst);
 	nfp_port_free(repr->port);
@@ -395,12 +397,24 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
 	if (err)
 		goto err_clean;
 
-	err = register_netdev(netdev);
+	err = nfp_devlink_port_init(app, repr->port);
 	if (err)
 		goto err_repr_clean;
 
+	err = register_netdev(netdev);
+	if (err)
+		goto err_port_clean;
+
+	err = nfp_devlink_port_register(app, repr->port);
+	if (err)
+		goto err_unreg_netdev;
+
 	return 0;
 
+err_unreg_netdev:
+	unregister_netdev(repr->netdev);
+err_port_clean:
+	nfp_devlink_port_clean(repr->port);
 err_repr_clean:
 	nfp_app_repr_clean(app, netdev);
 err_clean:
-- 
2.19.2



* [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (2 preceding siblings ...)
  2019-03-01 18:04 ` [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-02  9:41   ` Jiri Pirko
  2019-03-04 11:08   ` Jiri Pirko
  2019-03-01 18:04 ` [PATCH net-next v2 5/7] nfp: switch to devlink_port_get_phys_port_name() Jakub Kicinski
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

A PCI endpoint corresponds to a PCI device, but such a device
can have one or more logical device ports associated with it.
We need a way to distinguish those. Add a PCI subport in the
dumps and print the info in phys_port_name appropriately.

This is not equivalent to port splitting: there is no split
group. It's just a way of representing multiple netdevs on
a single PCI function.

Note that the quality of being multiport pertains only to
the PCI function itself. A PF having multiple netdevs does
not mean that its VFs will also have multiple, or that VFs
are associated with any particular port of a multiport PF.
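
The resulting phys_port_name formats can be tried out in user space
with the sketch below. It reuses the four format strings from the
devlink_port_get_phys_port_name() hunk in this patch; the function
name, the enum and the -1 return on truncation are assumptions made
for the example (the kernel helper has its own error handling).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two PCI flavours. */
enum sketch_pci_flavour {
	SKETCH_PCI_PF,
	SKETCH_PCI_VF,
};

/* Formats the name into @name; returns 0, or -1 if it did not fit. */
int sketch_pci_port_name(char *name, size_t len,
			 enum sketch_pci_flavour flavour,
			 unsigned int pf, unsigned int vf,
			 bool multiport, unsigned int subport)
{
	int n;

	if (flavour == SKETCH_PCI_PF) {
		if (!multiport)
			n = snprintf(name, len, "pf%u", pf);
		else
			n = snprintf(name, len, "pf%us%u", pf, subport);
	} else {
		if (!multiport)
			n = snprintf(name, len, "pf%uvf%u", pf, vf);
		else
			n = snprintf(name, len, "pf%uvf%us%u",
				     pf, vf, subport);
	}

	/* snprintf() returns the length it wanted to write. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

A multiport PF 0 with subport 1 yields "pf0s1", the suffix seen in
the netdev name enp5s0npf0s1 in the example that follows; a
single-port VF 3 of PF 0 yields "pf0vf3".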

Example (bus 05 device has subports, bus 82 has only one port per
function):

$ devlink port
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1
pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
pci/0000:82:00.0/10000: type eth netdev eth0 flavour pci_pf pf 0

$ devlink -jp port
{
    "port": {
        "pci/0000:05:00.0/0": {
            "type": "eth",
            "netdev": "enp5s0np0",
            "flavour": "physical"
        },
        "pci/0000:05:00.0/10000": {
            "type": "eth",
            "netdev": "enp5s0npf0s0",
            "flavour": "pci_pf",
            "pf": 0,
            "subport": 0
        },
        "pci/0000:05:00.0/4": {
            "type": "eth",
            "netdev": "enp5s0np1",
            "flavour": "physical"
        },
        "pci/0000:05:00.0/11000": {
            "type": "eth",
            "netdev": "enp5s0npf0s1",
            "flavour": "pci_pf",
            "pf": 0,
            "subport": 1
        },
        "pci/0000:82:00.0/0": {
            "type": "eth",
            "netdev": "p4p1",
            "flavour": "physical"
        },
        "pci/0000:82:00.0/10000": {
            "type": "eth",
            "netdev": "eth0",
            "flavour": "pci_pf",
            "pf": 0
        }
    }
}

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  6 ++--
 include/net/devlink.h                         | 13 ++++---
 include/uapi/linux/devlink.h                  |  1 +
 net/core/devlink.c                            | 36 ++++++++++++++++---
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index bf7fd9614152..6ad2805f1efc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -378,12 +378,14 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
 		return nfp_devlink_port_init_phys(devlink, port);
 	case NFP_PORT_PF_PORT:
 		devlink_port_type_eth_set(&port->dl_port, port->netdev);
-		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id);
+		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id,
+					      port->pf_split,
+					      port->pf_split_id);
 		return 0;
 	case NFP_PORT_VF_PORT:
 		devlink_port_type_eth_set(&port->dl_port, port->netdev);
 		devlink_port_attrs_pci_vf_set(&port->dl_port, port->pf_id,
-					      port->vf_id);
+					      port->vf_id, false, 0);
 		return 0;
 	default:
 		return -EINVAL;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 00ceff76762c..6a29ce80cb38 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -53,6 +53,8 @@ struct devlink_port_attrs {
 		struct {
 			u32 pf_number;
 			u32 vf_number;
+			bool multiport;
+			u32 subport_number;
 		} pci;
 	};
 };
@@ -579,9 +581,11 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    u32 port_number, bool split,
 			    u32 split_subport_number);
 void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
-				   u32 pf_number);
+				   u32 pf_number, bool multiport,
+				   u32 subport_number);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-				   u32 pf_number, u32 vf_number);
+				   u32 pf_number, u32 vf_number, bool multiport,
+				   u32 subport_number);
 int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				    char *name, size_t len);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
@@ -798,13 +802,14 @@ static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
 
 static inline void
 devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
-			      u32 pf_number)
+			      u32 pf_number, bool multiport, u32 subport_number)
 {
 }
 
 static inline void
 devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-			      u32 pf_number, u32 vf_number)
+			      u32 pf_number, u32 vf_number, bool multiport,
+			      u32 subport_number)
 {
 }
 
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9ce76d4f640d..417ae8233cce 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -336,6 +336,7 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_PORT_PCI_PF_NUMBER,	/* u32 */
 	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
+	DEVLINK_ATTR_PORT_PCI_SUBPORT,		/* u32 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 49216b688c5b..a7dd958be513 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -541,6 +541,11 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
 				attrs->pci.pf_number))
 			return -EMSGSIZE;
+
+		if (attrs->pci.multiport &&
+		    nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SUBPORT,
+				attrs->pci.subport_number))
+			return -EMSGSIZE;
 		return 0;
 	default:
 		return -EINVAL;
@@ -5449,15 +5454,20 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
  *	@devlink_port: devlink port
  *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
  *	            on the platform
+ *	@multiport: PCI function has more than one logical port
+ *	@subport_number: number of this logical port within the PCI function
  */
 void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
-				   u32 pf_number)
+				   u32 pf_number, bool multiport,
+				   u32 subport_number)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 
 	attrs->set = true;
 	attrs->flavour = DEVLINK_PORT_FLAVOUR_PCI_PF;
 	attrs->pci.pf_number = pf_number;
+	attrs->pci.multiport = multiport;
+	attrs->pci.subport_number = subport_number;
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
@@ -5469,9 +5479,12 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
  *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
  *	            on the platform
  *	@vf_number: PCI VF number within given PF (ignored for PF itself)
+ *	@multiport: PCI function has more than one logical port
+ *	@subport_number: number of this logical port within the PCI function
  */
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-				   u32 pf_number, u32 vf_number)
+				   u32 pf_number, u32 vf_number, bool multiport,
+				   u32 subport_number)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 
@@ -5479,6 +5492,8 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 	attrs->flavour = DEVLINK_PORT_FLAVOUR_PCI_VF;
 	attrs->pci.pf_number = pf_number;
 	attrs->pci.vf_number = vf_number;
+	attrs->pci.multiport = multiport;
+	attrs->pci.subport_number = subport_number;
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
@@ -5508,11 +5523,22 @@ int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 		WARN_ON(1);
 		return -EINVAL;
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
-		n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
+		if (!attrs->pci.multiport)
+			n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
+		else
+			n = snprintf(name, len, "pf%us%u", attrs->pci.pf_number,
+				     attrs->pci.subport_number);
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
-		n = snprintf(name, len, "pf%uvf%u",
-			     attrs->pci.pf_number, attrs->pci.vf_number);
+		if (!attrs->pci.multiport)
+			n = snprintf(name, len, "pf%uvf%u",
+				     attrs->pci.pf_number,
+				     attrs->pci.vf_number);
+		else
+			n = snprintf(name, len, "pf%uvf%us%u",
+				     attrs->pci.pf_number,
+				     attrs->pci.vf_number,
+				     attrs->pci.subport_number);
 		break;
 	}
 
-- 
2.19.2
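
For readers who want to try the resulting naming scheme outside the kernel,
here is a minimal user-space sketch of the four formats handled in the last
hunk. struct pci_port_attrs and pci_port_name() are names made up for this
example (they are not devlink API), and the -EINVAL on truncation mirrors the
"n >= len" check in the NFP code that patch 5 removes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the PCI part of struct devlink_port_attrs. */
struct pci_port_attrs {
	bool is_vf;			/* false: pci_pf flavour, true: pci_vf */
	unsigned int pf_number;
	unsigned int vf_number;		/* only meaningful when is_vf is set */
	bool multiport;
	unsigned int subport_number;	/* only meaningful when multiport is set */
};

/* Format the port name; returns 0, or -EINVAL if it does not fit in @len. */
int pci_port_name(const struct pci_port_attrs *a, char *name, size_t len)
{
	int n;

	assert(a && name);

	if (!a->is_vf && !a->multiport)
		n = snprintf(name, len, "pf%u", a->pf_number);
	else if (!a->is_vf)
		n = snprintf(name, len, "pf%us%u", a->pf_number,
			     a->subport_number);
	else if (!a->multiport)
		n = snprintf(name, len, "pf%uvf%u", a->pf_number,
			     a->vf_number);
	else
		n = snprintf(name, len, "pf%uvf%us%u", a->pf_number,
			     a->vf_number, a->subport_number);

	if (n < 0 || (size_t)n >= len)
		return -EINVAL;
	return 0;
}
```

With pf_number 0, multiport set and subport_number 1 this produces "pf0s1",
which is the suffix visible in the netdev name enp5s0npf0s1 in the patch 4
example output.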



* [PATCH net-next v2 5/7] nfp: switch to devlink_port_get_phys_port_name()
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (3 preceding siblings ...)
  2019-03-01 18:04 ` [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 6/7] devlink: introduce port's peer netdevs Jakub Kicinski
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Now that devlink understands all port flavours - switch
to the devlink_port_get_phys_port_name() helper.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_port.c | 33 +------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index 93c5bfc0510b..3e2ff8d35e8d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -117,44 +117,13 @@ struct nfp_eth_table_port *nfp_port_get_eth_port(struct nfp_port *port)
 int
 nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
 {
-	struct nfp_eth_table_port *eth_port;
 	struct nfp_port *port;
-	int n;
 
 	port = nfp_port_from_netdev(netdev);
 	if (!port)
 		return -EOPNOTSUPP;
 
-	switch (port->type) {
-	case NFP_PORT_PHYS_PORT:
-		eth_port = __nfp_port_get_eth_port(port);
-		if (!eth_port)
-			return -EOPNOTSUPP;
-
-		if (!eth_port->is_split)
-			n = snprintf(name, len, "p%d", eth_port->label_port);
-		else
-			n = snprintf(name, len, "p%ds%d", eth_port->label_port,
-				     eth_port->label_subport);
-		break;
-	case NFP_PORT_PF_PORT:
-		if (!port->pf_split)
-			n = snprintf(name, len, "pf%d", port->pf_id);
-		else
-			n = snprintf(name, len, "pf%ds%d", port->pf_id,
-				     port->pf_split_id);
-		break;
-	case NFP_PORT_VF_PORT:
-		n = snprintf(name, len, "pf%dvf%d", port->pf_id, port->vf_id);
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
-
-	if (n >= len)
-		return -EINVAL;
-
-	return 0;
+	return devlink_port_get_phys_port_name(&port->dl_port, name, len);
 }
 
 /**
-- 
2.19.2



* [PATCH net-next v2 6/7] devlink: introduce port's peer netdevs
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (4 preceding siblings ...)
  2019-03-01 18:04 ` [PATCH net-next v2 5/7] nfp: switch to devlink_port_get_phys_port_name() Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-01 18:04 ` [PATCH net-next v2 7/7] nfp: expose PF " Jakub Kicinski
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Devlink ports represent ports of a switch device (or of an SR-IOV
NIC which has an embedded switch). In the SR-IOV case, when
PCIe PFs are exposed, the PFs which are directly connected
to the local machine may also spawn a PF netdev (much like
VFs have a port/"repr" and an actual VF netdev).

Allow devlink to expose such linking. There is currently no
way to find out which netdev corresponds to which PF.

Example:

$ devlink port
pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1

v2: - move the peer info into a nested attr.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        | 11 ++++++
 include/uapi/linux/devlink.h |  6 ++++
 net/core/devlink.c           | 68 +++++++++++++++++++++++++++++++++---
 3 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6a29ce80cb38..f3ced79a30a8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -68,6 +68,7 @@ struct devlink_port {
 	enum devlink_port_type type;
 	enum devlink_port_type desired_type;
 	void *type_dev;
+	void *type_peer;
 	struct devlink_port_attrs attrs;
 };
 
@@ -573,6 +574,9 @@ int devlink_port_register(struct devlink *devlink,
 void devlink_port_unregister(struct devlink_port *devlink_port);
 void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 			       struct net_device *netdev);
+void devlink_port_type_eth_set_peer(struct devlink_port *devlink_port,
+				    struct net_device *netdev,
+				    struct net_device *peer);
 void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 			      struct ib_device *ibdev);
 void devlink_port_type_clear(struct devlink_port *devlink_port);
@@ -784,6 +788,13 @@ static inline void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 {
 }
 
+static inline void
+devlink_port_type_eth_set_peer(struct devlink_port *devlink_port,
+			       struct net_device *netdev,
+			       struct net_device *peer)
+{
+}
+
 static inline void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 					    struct ib_device *ibdev)
 {
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 417ae8233cce..34ed03bee9fc 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -338,6 +338,12 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
 	DEVLINK_ATTR_PORT_PCI_SUBPORT,		/* u32 */
 
+	DEVLINK_ATTR_PORT_PEER,			/* nested */
+	DEVLINK_ATTR_PORT_PEER_TYPE,		/* u16 */
+	DEVLINK_ATTR_PORT_PEER_NETDEV_IFINDEX,	/* u32 */
+	DEVLINK_ATTR_PORT_PEER_NETDEV_NAME,	/* string */
+	DEVLINK_ATTR_PORT_PEER_IBDEV_NAME,	/* string */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index a7dd958be513..75c313b5b616 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -552,6 +552,47 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 	}
 }
 
+static int devlink_nl_port_peer_put(struct sk_buff *msg,
+				    struct devlink_port *devlink_port)
+{
+	struct nlattr *peer_attr;
+
+	if (!devlink_port->type_peer)
+		return 0;
+
+	peer_attr = nla_nest_start(msg, DEVLINK_ATTR_PORT_PEER);
+	if (!peer_attr)
+		return -EMSGSIZE;
+
+	/* Peer's type has got to be the same as the port's type */
+	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PEER_TYPE, devlink_port->type))
+		goto cancel_peer_attr;
+
+	if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
+		struct net_device *netdev = devlink_port->type_peer;
+
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PEER_NETDEV_IFINDEX,
+				netdev->ifindex) ||
+		    nla_put_string(msg, DEVLINK_ATTR_PORT_PEER_NETDEV_NAME,
+				   netdev->name))
+			goto cancel_peer_attr;
+	}
+	if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
+		struct ib_device *ibdev = devlink_port->type_peer;
+
+		if (ibdev &&
+		    nla_put_string(msg, DEVLINK_ATTR_PORT_PEER_IBDEV_NAME,
+				   ibdev->name))
+			goto cancel_peer_attr;
+	}
+	nla_nest_end(msg, peer_attr);
+	return 0;
+
+cancel_peer_attr:
+	nla_nest_cancel(msg, peer_attr);
+	return -EMSGSIZE;
+}
+
 static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 				struct devlink_port *devlink_port,
 				enum devlink_command cmd, u32 portid,
@@ -593,6 +634,8 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 	}
 	if (devlink_nl_port_attrs_put(msg, devlink_port))
 		goto nla_put_failure;
+	if (devlink_nl_port_peer_put(msg, devlink_port))
+		goto nla_put_failure;
 
 	genlmsg_end(msg, hdr);
 	return 0;
@@ -5370,10 +5413,11 @@ EXPORT_SYMBOL_GPL(devlink_port_unregister);
 
 static void __devlink_port_type_set(struct devlink_port *devlink_port,
 				    enum devlink_port_type type,
-				    void *type_dev)
+				    void *type_dev, void *type_peer)
 {
 	devlink_port->type = type;
 	devlink_port->type_dev = type_dev;
+	devlink_port->type_peer = type_peer;
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 }
 
@@ -5387,10 +5431,26 @@ void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 			       struct net_device *netdev)
 {
 	return __devlink_port_type_set(devlink_port,
-				       DEVLINK_PORT_TYPE_ETH, netdev);
+				       DEVLINK_PORT_TYPE_ETH, netdev, NULL);
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_eth_set);
 
+/**
+ *	devlink_port_type_eth_set_peer - Set port type to Ethernet with peer
+ *
+ *	@devlink_port: devlink port
+ *	@netdev: related netdevice
+ *	@peer: for PCIe ports the non-port netdev (actual VF or PF)
+ */
+void devlink_port_type_eth_set_peer(struct devlink_port *devlink_port,
+				    struct net_device *netdev,
+				    struct net_device *peer)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_ETH, netdev, peer);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_eth_set_peer);
+
 /**
  *	devlink_port_type_ib_set - Set port type to InfiniBand
  *
@@ -5401,7 +5461,7 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 			      struct ib_device *ibdev)
 {
 	return __devlink_port_type_set(devlink_port,
-				       DEVLINK_PORT_TYPE_IB, ibdev);
+				       DEVLINK_PORT_TYPE_IB, ibdev, NULL);
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
 
@@ -5413,7 +5473,7 @@ EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
 void devlink_port_type_clear(struct devlink_port *devlink_port)
 {
 	return __devlink_port_type_set(devlink_port,
-				       DEVLINK_PORT_TYPE_NOTSET, NULL);
+				       DEVLINK_PORT_TYPE_NOTSET, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_clear);
 
-- 
2.19.2
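
To illustrate the "all of the peer info or none of it" shape of
devlink_nl_port_peer_put(), here is a simplified user-space model of nested
attributes. It is only a sketch: struct msg, attr_put(), nest_start(),
nest_end(), nest_cancel() and the ATTR_* IDs are hypothetical stand-ins, not
the kernel netlink helpers, and failures return -1 rather than -EMSGSIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute IDs; the real values come from enum devlink_attr. */
enum { ATTR_PEER = 1, ATTR_PEER_TYPE, ATTR_PEER_IFINDEX, ATTR_PEER_NAME };

#define ATTR_HDRLEN	4u			/* u16 length + u16 type */
#define ATTR_ALIGN(n)	(((n) + 3u) & ~(size_t)3u)	/* pad to 4 bytes */

struct msg {
	uint8_t buf[64];
	size_t len;	/* bytes used so far */
};

/* Append one attribute; returns 0, or -1 when the message is full. */
int attr_put(struct msg *m, uint16_t type, const void *data, uint16_t dlen)
{
	uint16_t alen = (uint16_t)(ATTR_HDRLEN + dlen);
	size_t padded = ATTR_ALIGN((size_t)alen);

	if (m->len + padded > sizeof(m->buf))
		return -1;
	memset(m->buf + m->len, 0, padded);
	memcpy(m->buf + m->len, &alen, sizeof(alen));
	memcpy(m->buf + m->len + 2, &type, sizeof(type));
	if (dlen)
		memcpy(m->buf + m->len + ATTR_HDRLEN, data, dlen);
	m->len += padded;
	return 0;
}

/* Open a nest: an empty attribute whose length is patched in later. */
long nest_start(struct msg *m, uint16_t type)
{
	size_t start = m->len;

	return attr_put(m, type, NULL, 0) ? -1 : (long)start;
}

/* Close a nest: its length covers everything appended since nest_start(). */
void nest_end(struct msg *m, long start)
{
	uint16_t alen;

	assert(start >= 0 && (size_t)start <= m->len);
	alen = (uint16_t)(m->len - (size_t)start);
	memcpy(m->buf + start, &alen, sizeof(alen));
}

/* Abandon a nest: drop the header and any partially written payload. */
void nest_cancel(struct msg *m, long start)
{
	m->len = (size_t)start;
}

/* Emit type, ifindex and name inside one nest, or nothing at all. */
int peer_put(struct msg *m, uint16_t port_type, uint32_t ifindex,
	     const char *name)
{
	long nest = nest_start(m, ATTR_PEER);

	if (nest < 0)
		return -1;
	if (attr_put(m, ATTR_PEER_TYPE, &port_type, sizeof(port_type)) ||
	    attr_put(m, ATTR_PEER_IFINDEX, &ifindex, sizeof(ifindex)) ||
	    attr_put(m, ATTR_PEER_NAME, name, (uint16_t)(strlen(name) + 1))) {
		nest_cancel(m, nest);
		return -1;
	}
	nest_end(m, nest);
	return 0;
}
```

The point of the cancel path is that a reader of the message never sees a
half-filled peer nest: either the nest is closed with its final length, or the
message is rolled back to where it was before the nest was opened.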



* [PATCH net-next v2 7/7] nfp: expose PF peer netdevs
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (5 preceding siblings ...)
  2019-03-01 18:04 ` [PATCH net-next v2 6/7] devlink: introduce port's peer netdevs Jakub Kicinski
@ 2019-03-01 18:04 ` Jakub Kicinski
  2019-03-02 10:13 ` [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jiri Pirko
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:04 UTC (permalink / raw)
  To: jiri, davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Expose PF netdevs as devlink port's peers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/abm/main.c    | 1 +
 drivers/net/ethernet/netronome/nfp/flower/main.c | 1 +
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c | 3 ++-
 drivers/net/ethernet/netronome/nfp/nfp_port.h    | 2 ++
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c
index 4d4ff5844c47..8d7ff1200fd4 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -113,6 +113,7 @@ nfp_abm_spawn_repr(struct nfp_app *app, struct nfp_abm_link *alink,
 		port->pf_id = alink->abm->pf_id;
 		port->pf_split = app->pf->max_data_vnics > 1;
 		port->pf_split_id = alink->id;
+		port->peer = alink->vnic->dp.netdev;
 		port->vnic = alink->vnic->dp.ctrl_bar;
 	}
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 408089133599..13aa21923bf7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -300,6 +300,7 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
 		if (repr_type == NFP_REPR_TYPE_PF) {
 			port->pf_id = i;
 			port->vnic = priv->nn->dp.ctrl_bar;
+			port->peer = priv->nn->dp.netdev;
 		} else {
 			port->pf_id = 0;
 			port->vf_id = i;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 6ad2805f1efc..ced21a9951aa 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -377,7 +377,8 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
 	case NFP_PORT_PHYS_PORT:
 		return nfp_devlink_port_init_phys(devlink, port);
 	case NFP_PORT_PF_PORT:
-		devlink_port_type_eth_set(&port->dl_port, port->netdev);
+		devlink_port_type_eth_set_peer(&port->dl_port, port->netdev,
+					       port->peer);
 		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id,
 					      port->pf_split,
 					      port->pf_split_id);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 09c55ca2371a..c75a25cc5cea 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -51,6 +51,7 @@ enum nfp_port_flags {
  * @eth_forced:	for %NFP_PORT_PHYS_PORT port is forced UP or DOWN, don't change
  * @eth_port:	for %NFP_PORT_PHYS_PORT translated ETH Table port entry
  * @eth_stats:	for %NFP_PORT_PHYS_PORT MAC stats if available
+ * @peer:	for %NFP_PORT_PF_PORT netdev of the actual vNIC, if reachable
  * @pf_id:	for %NFP_PORT_PF_PORT, %NFP_PORT_VF_PORT ID of the PCI PF (0-3)
  * @vf_id:	for %NFP_PORT_VF_PORT ID of the PCI VF within @pf_id
  * @pf_split:	for %NFP_PORT_PF_PORT %true if PCI PF has more than one vNIC
@@ -79,6 +80,7 @@ struct nfp_port {
 		};
 		/* NFP_PORT_PF_PORT, NFP_PORT_VF_PORT */
 		struct {
+			struct net_device *peer;
 			unsigned int pf_id;
 			unsigned int vf_id;
 			bool pf_split;
-- 
2.19.2



* Re: [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs
  2019-03-01 18:04 ` [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs Jakub Kicinski
@ 2019-03-02  8:43   ` Jiri Pirko
  2019-03-02 19:07     ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-02  8:43 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Fri, Mar 01, 2019 at 07:04:49PM CET, jakub.kicinski@netronome.com wrote:
>Register all representors as devlink ports.
>
>The port_index is slightly tricky to figure out, we use a bit of
>arbitrary math to create unique IDs for PCI ports.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> .../net/ethernet/netronome/nfp/nfp_devlink.c  | 40 ++++++++++++++++++-
> .../net/ethernet/netronome/nfp/nfp_net_repr.c | 16 +++++++-
> 2 files changed, 53 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>index 9af3cb1f2f17..bf7fd9614152 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>@@ -350,7 +350,8 @@ const struct devlink_ops nfp_devlink_ops = {
> 	.flash_update		= nfp_devlink_flash_update,
> };
> 
>-int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
>+static int
>+nfp_devlink_port_init_phys(struct devlink *devlink, struct nfp_port *port)
> {
> 	struct nfp_eth_table_port eth_port;
> 	int ret;
>@@ -368,6 +369,27 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
> 	return 0;
> }
> 
>+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
>+{
>+	struct devlink *devlink = priv_to_devlink(app->pf);
>+
>+	switch (port->type) {
>+	case NFP_PORT_PHYS_PORT:
>+		return nfp_devlink_port_init_phys(devlink, port);
>+	case NFP_PORT_PF_PORT:
>+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
>+		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id);
>+		return 0;
>+	case NFP_PORT_VF_PORT:
>+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
>+		devlink_port_attrs_pci_vf_set(&port->dl_port, port->pf_id,
>+					      port->vf_id);

What is the reason to expose the vf/pf id for a switch port? Isn't it rather
an attribute of a peer?


>+		return 0;
>+	default:
>+		return -EINVAL;
>+	}
>+}
>+
> void nfp_devlink_port_clean(struct nfp_port *port)
> {
> }
>@@ -376,7 +398,21 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
> {
> 	struct devlink *devlink = priv_to_devlink(app->pf);
> 
>-	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
>+	switch (port->type) {
>+	case NFP_PORT_PHYS_PORT:
>+		return devlink_port_register(devlink, &port->dl_port,
>+					     port->eth_id);
>+	case NFP_PORT_PF_PORT:
>+		return devlink_port_register(devlink, &port->dl_port,
>+					     (port->pf_id + 1) * 10000 +
>+					     port->pf_split_id * 1000);

Wait. What is this 10000/1000 magic about?


>+	case NFP_PORT_VF_PORT:
>+		return devlink_port_register(devlink, &port->dl_port,
>+					     (port->pf_id + 1) * 10000 +
>+					     port->vf_id + 1);
>+	default:
>+		return -EINVAL;
>+	}
> }
> 
> void nfp_devlink_port_unregister(struct nfp_port *port)
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
>index d2c803bb4e56..869d22760a6e 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
>@@ -292,7 +292,9 @@ nfp_repr_transfer_features(struct net_device *netdev, struct net_device *lower)
> 
> static void nfp_repr_clean(struct nfp_repr *repr)
> {
>+	nfp_devlink_port_unregister(repr->port);
> 	unregister_netdev(repr->netdev);
>+	nfp_devlink_port_clean(repr->port);
> 	nfp_app_repr_clean(repr->app, repr->netdev);
> 	dst_release((struct dst_entry *)repr->dst);
> 	nfp_port_free(repr->port);
>@@ -395,12 +397,24 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
> 	if (err)
> 		goto err_clean;
> 
>-	err = register_netdev(netdev);
>+	err = nfp_devlink_port_init(app, repr->port);
> 	if (err)
> 		goto err_repr_clean;
> 
>+	err = register_netdev(netdev);
>+	if (err)
>+		goto err_port_clean;
>+
>+	err = nfp_devlink_port_register(app, repr->port);

Don't you want to take my patch ("nfp: register devlink port before
netdev") to change the order of register_netdev and devlink_port_register,
include it in this patchset before this patch and change the order in
this patch too? I think it would be clearer to do it from the beginning.


>+	if (err)
>+		goto err_unreg_netdev;
>+
> 	return 0;
> 
>+err_unreg_netdev:
>+	unregister_netdev(repr->netdev);
>+err_port_clean:
>+	nfp_devlink_port_clean(repr->port);
> err_repr_clean:
> 	nfp_app_repr_clean(app, netdev);
> err_clean:
>-- 
>2.19.2
>


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-01 18:04 ` [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports Jakub Kicinski
@ 2019-03-02  9:41   ` Jiri Pirko
  2019-03-02 19:48     ` Jakub Kicinski
  2019-03-04 11:08   ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-02  9:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
>PCI endpoint corresponds to a PCI device, but such device
>can have one or more logical device ports associated with it.
>We need a way to distinguish those. Add a PCI subport in the
>dumps and print the info in phys_port_name appropriately.
>
>This is not equivalent to port splitting, there is no split
>group. It's just a way of representing multiple netdevs on
>a single PCI function.
>
>Note that the quality of being multiport pertains only to
>the PCI function itself. A PF having multiple netdevs does
>not mean that its VFs will also have multiple, or that VFs
>are associated with any particular port of a multiport PF.
>
>Example (bus 05 device has subports, bus 82 has only one port per
>function):
>
>$ devlink port
>pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
>pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
>pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
>pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1

So these subport devlink ports are eswitch ports for subports, right?

Please see the following drawing:

                                 +---+      +---+      +---+
                            pfsub| 5 |    vf| 6 |      | 7 |pfsub
                                 +-+-+      +-+-+      +-+-+
physical link <---------+          |          |          |
                        |          |          |          |
                        |          |          |          |
                        |          |          |          |
                      +-+-+      +-+-+      +-+-+      +-+-+
                      | 1 |      | 2 |      | 3 |      | 4 |
                   +--+---+------+---+------+---+------+---+--+
                   |  physical    pfsub      vf         pfsub |
                   |  port        port       port       port  |
                   |                                          |
                   |                  eswitch                 |
                   |                                          |
                   |                                          |
                   +------------------------------------------+

1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
4) pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f

This is basically what you have and I think we are in sync with that.
But what about 5,6,7? Should they have devlink port instances too?

5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1

These are the "peers".
I think that there could be flavours "pci_pf" and "pci_vf". Then the
"representors" (switch ports) could have flavours "pci_pf_port" and
"pci_vf_port" or something like that. The user can see right away
that it is not a "PF" or "VF" but rather something "on the other end".
Note there is no "switch_id" for these devlink ports, which tells the user
that these devlink ports are not part of any switch.
What do you think?


>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pci_pf pf 0
>

[...]


* Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (6 preceding siblings ...)
  2019-03-01 18:04 ` [PATCH net-next v2 7/7] nfp: expose PF " Jakub Kicinski
@ 2019-03-02 10:13 ` Jiri Pirko
  2019-03-02 19:49   ` [oss-drivers] " Jakub Kicinski
  2019-03-04  5:12   ` Parav Pandit
  2019-03-04 18:22 ` David Miller
  2019-03-20 20:25 ` Jakub Kicinski
  9 siblings, 2 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-02 10:13 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Fri, Mar 01, 2019 at 07:04:46PM CET, jakub.kicinski@netronome.com wrote:
>Hi!
>
>This series is a long overdue follow up to Jiri's work on providing
>a common .ndo_phys_port_name implementation based on devlink ports.
>
>First devlink port flavours for PF and VF ports are added, and
>registered by the NFP. Port numbers and split info are reserved
>for physical and DSA ports. For PCIe ports we add pf/vf identifiers.
>Note that devices may have more than one PF, including multi host
>scenarios where not all pfs are connected to the same host.
>
>The port_index is slightly tricky to figure out, we use a bit of
>math to create unique IDs for ports.
>
>Next subports for PCIe ports are introduced. This is in case device
>has more than one netdev on a physical function (e.g. multi port
>SmartNIC).
>
>With the above features in place we can remove the ndo_phys_port_name
>implementation from the NFP and use the standard devlink one for
>port netdevs.
>
>Last but not least a concept of peer netdevs is added. In multi-host
>scenarios it's currently not possible to figure out which PF is
>associated with the local host. Peer device is "the other side
>of the wire" for PCIe ports. In case of NFP we only link the PF
>netdevs, but it should be possible to also link VF peers after
>VF driver probes, if users request it.
>
>This is the conceptual image of devlink instances:
>
>                    HOST A             ||          HOST B
>                                       ||
>        PF A       | V | V | V | V     ||       PF B        | V | V | V
>                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
>*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
>                                       ||
>             PCI Express link          ||        PCI Express link
>        \      \      \  |   |   |          |       |      /   /   /
>         \      \      \ |   |   |          |       |     /   /   /
>      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
>      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
>  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
>d n H ||  |               <<==========                              |   || d n H
>e s O ||  |                            ==========>>                 |   || e s O
>v t S ||  |                    SR-IOV e-switch                      |   || v t S
>l a T ||  |               <<==========                              |   || l a T
>i n   ||  |                            ==========>>                 |   || i n
>n c A ||  |               ________ _________ ________               |   || n c B
>k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
>      ||  \---------------------------------------------------------/   ||
>      \/                      |        |         |                      \/
>                              |        |         |
>                                 ||         ||
>                          MAC 0  ||  MAC 1  || MAC 2
>                                 ||         ||
>
> '+' marks the devlink ports and port (-representor-) netdevs
> '*' marks host netdevs (actual VF/PF netdev)

As I wrote in the reply to patch 4, I think we need to figure out whether
we want to model all entities that belong under a specific devlink
instance/PCI address (which I prefer), or to have only eswitch
ports there.

One way or another, I think that it is not a good idea to merge this
patchset this late, I would prefer to wait for the next net-next opening...
In the meantime we can sync and make this whole thing crystal clear for
everyone.

Thanks!


* Re: [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs
  2019-03-02  8:43   ` Jiri Pirko
@ 2019-03-02 19:07     ` Jakub Kicinski
  2019-03-04  7:36       ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-02 19:07 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Sat, 2 Mar 2019 09:43:47 +0100, Jiri Pirko wrote:
> Fri, Mar 01, 2019 at 07:04:49PM CET, jakub.kicinski@netronome.com wrote:
> >Register all representors as devlink ports.
> >
> >The port_index is slightly tricky to figure out, we use a bit of
> >arbitrary math to create unique IDs for PCI ports.
> >
> >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >---
> > .../net/ethernet/netronome/nfp/nfp_devlink.c  | 40 ++++++++++++++++++-
> > .../net/ethernet/netronome/nfp/nfp_net_repr.c | 16 +++++++-
> > 2 files changed, 53 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
> >index 9af3cb1f2f17..bf7fd9614152 100644
> >--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
> >+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
> >@@ -350,7 +350,8 @@ const struct devlink_ops nfp_devlink_ops = {
> > 	.flash_update		= nfp_devlink_flash_update,
> > };
> > 
> >-int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
> >+static int
> >+nfp_devlink_port_init_phys(struct devlink *devlink, struct nfp_port *port)
> > {
> > 	struct nfp_eth_table_port eth_port;
> > 	int ret;
> >@@ -368,6 +369,27 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
> > 	return 0;
> > }
> > 
> >+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
> >+{
> >+	struct devlink *devlink = priv_to_devlink(app->pf);
> >+
> >+	switch (port->type) {
> >+	case NFP_PORT_PHYS_PORT:
> >+		return nfp_devlink_port_init_phys(devlink, port);
> >+	case NFP_PORT_PF_PORT:
> >+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
> >+		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id);
> >+		return 0;
> >+	case NFP_PORT_VF_PORT:
> >+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
> >+		devlink_port_attrs_pci_vf_set(&port->dl_port, port->pf_id,
> >+					      port->vf_id);  
> 
> What is the reason to expose the vf/pf id for a switch port? Isn't it rather
> an attribute of a peer?

Naw, it's an attribute of the port.  I leave the ASIC via PF n or VF m
of PF n.  Whatever is on the other side is isolated from the topology
of the ASIC.

Is the physical port ID an attribute of the other end of the cable?

> >+		return 0;
> >+	default:
> >+		return -EINVAL;
> >+	}
> >+}
> >+
> > void nfp_devlink_port_clean(struct nfp_port *port)
> > {
> > }
> >@@ -376,7 +398,21 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
> > {
> > 	struct devlink *devlink = priv_to_devlink(app->pf);
> > 
> >-	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
> >+	switch (port->type) {
> >+	case NFP_PORT_PHYS_PORT:
> >+		return devlink_port_register(devlink, &port->dl_port,
> >+					     port->eth_id);
> >+	case NFP_PORT_PF_PORT:
> >+		return devlink_port_register(devlink, &port->dl_port,
> >+					     (port->pf_id + 1) * 10000 +
> >+					     port->pf_split_id * 1000);  
> 
> Wait. What is this 10000/1000 magic about?

port_index has to be unique, I need some unique number here, as I
stated both in the commit message and the cover letter, this is
arbitrary. 

I can put the datapath port identifier in there but it's (a)
meaningless, (b) a bitfield, so it will look like 8972367083.  And it
may change depending on the FW load, so it's not stable either.
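
For illustration, the index math from the hunk above as a standalone
user-space sketch. The function and macro names are made up for the example,
and the asserts are an addition of this sketch (not part of the patch): they
spell out the ranges within which the PF and VF formulas cannot produce the
same index.

```c
#include <assert.h>

#define PF_STRIDE	10000u	/* index space set aside per PF */
#define SUBPORT_STRIDE	1000u	/* index space set aside per PF subport */

/* Index of the devlink port for PF @pf_id, subport @pf_split_id. */
unsigned int pf_port_index(unsigned int pf_id, unsigned int pf_split_id)
{
	/* Sketch-only guard: a subport must stay inside its PF's block. */
	assert(pf_split_id < PF_STRIDE / SUBPORT_STRIDE);

	return (pf_id + 1) * PF_STRIDE + pf_split_id * SUBPORT_STRIDE;
}

/* Index of the devlink port for VF @vf_id of PF @pf_id. */
unsigned int vf_port_index(unsigned int pf_id, unsigned int vf_id)
{
	/* Sketch-only guard: VF slots must stay below the next subport. */
	assert(vf_id + 1 < SUBPORT_STRIDE);

	return (pf_id + 1) * PF_STRIDE + vf_id + 1;
}
```

PF 0 maps to 10000 (11000 for its subport 1) and VFs 0 and 1 of PF 0 map to
10001 and 10002, which are the indexes shown in the devlink port listings in
the patch 4 and patch 6 commit messages.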

> >+	case NFP_PORT_VF_PORT:
> >+		return devlink_port_register(devlink, &port->dl_port,
> >+					     (port->pf_id + 1) * 10000 +
> >+					     port->vf_id + 1);
> >+	default:
> >+		return -EINVAL;
> >+	}
> > }
> > 
> > void nfp_devlink_port_unregister(struct nfp_port *port)
> >diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
> >index d2c803bb4e56..869d22760a6e 100644
> >--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
> >+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
> >@@ -292,7 +292,9 @@ nfp_repr_transfer_features(struct net_device *netdev, struct net_device *lower)
> > 
> > static void nfp_repr_clean(struct nfp_repr *repr)
> > {
> >+	nfp_devlink_port_unregister(repr->port);
> > 	unregister_netdev(repr->netdev);
> >+	nfp_devlink_port_clean(repr->port);
> > 	nfp_app_repr_clean(repr->app, repr->netdev);
> > 	dst_release((struct dst_entry *)repr->dst);
> > 	nfp_port_free(repr->port);
> >@@ -395,12 +397,24 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
> > 	if (err)
> > 		goto err_clean;
> > 
> >-	err = register_netdev(netdev);
> >+	err = nfp_devlink_port_init(app, repr->port);
> > 	if (err)
> > 		goto err_repr_clean;
> > 
> >+	err = register_netdev(netdev);
> >+	if (err)
> >+		goto err_port_clean;
> >+
> >+	err = nfp_devlink_port_register(app, repr->port);  
> 
> Don't you want to take my patch ("nfp: register devlink port before
> netdev") to change order of register_netdev and devlink_port_register,
> include it to this patchset before this patch and change the order in
> this patch too? I think it would be clearer to do it from the beginning.

This way both netdev and devlink_port can get registered fully
initialized.  Otherwise we'd get two notifications.  Are we trying to
establish some ordering rules to get around the rtnl locking? :)

> >+	if (err)
> >+		goto err_unreg_netdev;
> >+
> > 	return 0;
> > 
> >+err_unreg_netdev:
> >+	unregister_netdev(repr->netdev);
> >+err_port_clean:
> >+	nfp_devlink_port_clean(repr->port);
> > err_repr_clean:
> > 	nfp_app_repr_clean(app, netdev);
> > err_clean:


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-02  9:41   ` Jiri Pirko
@ 2019-03-02 19:48     ` Jakub Kicinski
  2019-03-04  7:56       ` Jiri Pirko
  2019-03-04 11:19       ` Jiri Pirko
  0 siblings, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-02 19:48 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:
> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
> >PCI endpoint corresponds to a PCI device, but such device
> >can have one or more logical device ports associated with it.
> >We need a way to distinguish those. Add a PCI subport in the
> >dumps and print the info in phys_port_name appropriately.
> >
> >This is not equivalent to port splitting, there is no split
> >group. It's just a way of representing multiple netdevs on
> >a single PCI function.
> >
> >Note that the quality of being multiport pertains only to
> >the PCI function itself. A PF having multiple netdevs does
> >not mean that its VFs will also have multiple, or that VFs
> >are associated with any particular port of a multiport VF.
> >
> >Example (bus 05 device has subports, bus 82 has only one port per
> >function):
> >
> >$ devlink port
> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1  
> 
> So these subport devlink ports are eswitch ports for subports, right?
> 
> Please see the following drawing:
> 
>                                  +---+      +---+      +---+
>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
>                                  +-+-+      +-+-+      +-+-+
> physical link <---------+          |          |          |
>                         |          |          |          |
>                         |          |          |          |
>                         |          |          |          |
>                       +-+-+      +-+-+      +-+-+      +-+-+
>                       | 1 |      | 2 |      | 3 |      | 4 |
>                    +--+---+------+---+------+---+------+---+--+
>                    |  physical    pfsub      vf         pfsub |
>                    |  port        port       port       port  |
>                    |                                          |
>                    |                  eswitch                 |
>                    |                                          |
>                    |                                          |
>                    +------------------------------------------+
> 
> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
> 4) pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
> 
> This is basically what you have and I think we are in sync with that.
> But what about 5,6,7? Should they have devlink port instances too?
> 
> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
> 
> These are the "peers".
> I think that there could be flavours "pci_pf" and "pci_vf". Then the
> "representors" (switch ports) could have flavours "pci_pf_port" and
> "pci_vf_port" or something like that. User can see right away
> that is not "PF" of "VF" but rather something "on the other end".
> Note there is no "switch_id" for these devlink ports that tells the user
> these devlink ports are not part of any switch.
> What do you think?

Hmmm.. Hm. Hm.

To me it's neat if the devlink instance matches an ASIC.  I think it's
kind of clear for people to understand what it stands for then.  So if
we wanted to do the above we'd have to make the switch_id the first
class identifier for devlink instances, rather than the bus?  But then
VF instances don't have a switch ID so that doesn't work...

I need to think about it.

It's also kind of strange that we have to add the noun *port* to the
flavour of... a port...  So I would prefer not to have those showing up
as ports.  Can we invent a new command (say "partition"?) that'd take
the bus info where the partition is to be spawned?

My next goal is to find a way of grouping multiple bus devices under one
"ASIC" (which is a devlink instance to me) so it can be understood
easily how things are laid out when there is more than one PF connected
to one host.


* Re: [oss-drivers] Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
  2019-03-02 10:13 ` [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jiri Pirko
@ 2019-03-02 19:49   ` Jakub Kicinski
  2019-03-04  5:12   ` Parav Pandit
  1 sibling, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-02 19:49 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Sat, 2 Mar 2019 11:13:37 +0100, Jiri Pirko wrote:
> >This is the conceptual image of devlink instances:
> >
> >                    HOST A             ||          HOST B
> >                                       ||
> >        PF A       | V | V | V | V     ||       PF B        | V | V | V
> >                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
> >*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
> >                                       ||
> >             PCI Express link          ||        PCI Express link
> >        \      \      \  |   |   |          |       |      /   /   /
> >         \      \      \ |   |   |          |       |     /   /   /
> >      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
> >      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
> >  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
> >d n H ||  |               <<==========                              |   || d n H
> >e s O ||  |                            ==========>>                 |   || e s O
> >v t S ||  |                    SR-IOV e-switch                      |   || v t S
> >l a T ||  |               <<==========                              |   || l a T
> >i n   ||  |                            ==========>>                 |   || i n
> >n c A ||  |               ________ _________ ________               |   || n c B
> >k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
> >      ||  \---------------------------------------------------------/   ||
> >      \/                      |        |         |                      \/
> >                              |        |         |
> >                                 ||         ||
> >                          MAC 0  ||  MAC 1  || MAC 2
> >                                 ||         ||
> >
> > '+' marks the devlink ports and port (-representor-) netdevs
> > '*' marks host netdevs (actual VF/PF netdev)  
> 
> As I wrote in the reply to patch 4, I think we need to figure out if we
> want to model all entities that belong under specific devlink
> instance/pci address - which I prefer, or we want to have only eswitch
> ports there.
> 
> One way or another, I think that it is not good idea to merge this
> patchset this late, I would prefer to wait for next net-next opening...
> In the meantime we can sync and make this whole thing crystal clear, for
> everyone.

Makes sense, let's keep talking.


* RE: [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
  2019-03-02 10:13 ` [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jiri Pirko
  2019-03-02 19:49   ` [oss-drivers] " Jakub Kicinski
@ 2019-03-04  5:12   ` Parav Pandit
  1 sibling, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-04  5:12 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: davem, netdev, oss-drivers



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jiri Pirko
> Sent: Saturday, March 2, 2019 4:14 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; oss-
> drivers@netronome.com
> Subject: Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors
> as ports
> 
> Fri, Mar 01, 2019 at 07:04:46PM CET, jakub.kicinski@netronome.com wrote:
> >Hi!
> >
> >This series is a long overdue follow up to Jiri's work on providing a
> >common .ndo_phys_port_name implementation based on devlink ports.
> >
> >First devlink port flavours for PF and VF ports are added, and
> >registered by the NFP. Port numbers and split info are reserved for
> >physical and DSA ports. For PCIe ports we add pf/vf identifiers.
> >Note that devices may have more than one PF, including multi host
> >scenarios where not all pfs are connected to the same host.
> >
> >The port_index is slightly tricky to figure out, we use a bit of math
> >to create unique IDs for ports.
> >
> >Next subports for PCIe ports are introduced. This is in case device has
> >more than one netdev on a physical function (e.g. multi port SmartNIC).
> >
> >With the above features in place we can remove the ndo_phys_port_name
> >implementation from the NFP and use the standard devlink one for port
> >netdevs.
> >
> >Last but not least a concept of peer netdevs is added. In multi-host
> >scenarios its currently not possible to figure out which PF is
> >associated with the local host. Peer device is "the other side of the
> >wire" for PCIe ports. In case of NFP we only link the PF netdevs, but
> >it should be possible to also link VF peers after VF driver probes, if
> >users request it.
> >
> >This is the conceptual image of devlink instances:
> >
> >                    HOST A             ||          HOST B
> >                                       ||
> >        PF A       | V | V | V | V     ||       PF B        | V | V | V
> >                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
> >*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
> >                                       ||
> >             PCI Express link          ||        PCI Express link
> >        \      \      \  |   |   |          |       |      /   /   /
> >         \      \      \ |   |   |          |       |     /   /   /
> >      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
> >      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
> >  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
> >d n H ||  |               <<==========                              |   || d n H
> >e s O ||  |                            ==========>>                 |   || e s O
> >v t S ||  |                    SR-IOV e-switch                      |   || v t S
> >l a T ||  |               <<==========                              |   || l a T
> >i n   ||  |                            ==========>>                 |   || i n
> >n c A ||  |               ________ _________ ________               |   || n c B
> >k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
> >      ||  \---------------------------------------------------------/   ||
> >      \/                      |        |         |                      \/
> >                              |        |         |
> >                                 ||         ||
> >                          MAC 0  ||  MAC 1  || MAC 2
> >                                 ||         ||
> >
> > '+' marks the devlink ports and port (-representor-) netdevs
> > '*' marks host netdevs (actual VF/PF netdev)
> 
> As I wrote in the reply to patch 4, I think we need to figure out if we want to
> model all entities that belong under specific devlink instance/pci address -
> which I prefer, or we want to have only eswitch ports there.
> 
> One way or another, I think that it is not good idea to merge this patchset
> this late, I would prefer to wait for next net-next opening...
> In the meantime we can sync and make this whole thing crystal clear, for
> everyone.
> 
Yes, please.
As I replied in the other thread, we should not bring in the peer port concept. It is convoluted.
We just need hostport and switchport representations (likely as port flavours) to configure each separately.
Whether a given devlink device is a PF or a VF is a devlink device attribute anyway.


* Re: [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs
  2019-03-02 19:07     ` Jakub Kicinski
@ 2019-03-04  7:36       ` Jiri Pirko
  2019-03-04 23:32         ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-04  7:36 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Sat, Mar 02, 2019 at 08:07:24PM CET, jakub.kicinski@netronome.com wrote:
>On Sat, 2 Mar 2019 09:43:47 +0100, Jiri Pirko wrote:
>> Fri, Mar 01, 2019 at 07:04:49PM CET, jakub.kicinski@netronome.com wrote:
>> >Register all representors as devlink ports.
>> >
>> >The port_index is slightly tricky to figure out, we use a bit of
>> >arbitrary math to create unique IDs for PCI ports.
>> >
>> >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >---
>> > .../net/ethernet/netronome/nfp/nfp_devlink.c  | 40 ++++++++++++++++++-
>> > .../net/ethernet/netronome/nfp/nfp_net_repr.c | 16 +++++++-
>> > 2 files changed, 53 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>> >index 9af3cb1f2f17..bf7fd9614152 100644
>> >--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>> >+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
>> >@@ -350,7 +350,8 @@ const struct devlink_ops nfp_devlink_ops = {
>> > 	.flash_update		= nfp_devlink_flash_update,
>> > };
>> > 
>> >-int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
>> >+static int
>> >+nfp_devlink_port_init_phys(struct devlink *devlink, struct nfp_port *port)
>> > {
>> > 	struct nfp_eth_table_port eth_port;
>> > 	int ret;
>> >@@ -368,6 +369,27 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
>> > 	return 0;
>> > }
>> > 
>> >+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
>> >+{
>> >+	struct devlink *devlink = priv_to_devlink(app->pf);
>> >+
>> >+	switch (port->type) {
>> >+	case NFP_PORT_PHYS_PORT:
>> >+		return nfp_devlink_port_init_phys(devlink, port);
>> >+	case NFP_PORT_PF_PORT:
>> >+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
>> >+		devlink_port_attrs_pci_pf_set(&port->dl_port, port->pf_id);
>> >+		return 0;
>> >+	case NFP_PORT_VF_PORT:
>> >+		devlink_port_type_eth_set(&port->dl_port, port->netdev);
>> >+		devlink_port_attrs_pci_vf_set(&port->dl_port, port->pf_id,
>> >+					      port->vf_id);  
>> 
>> What is the reason to expose vf/pf id for switch port? Isn't it rather
>> an attribute of a peer?
>
>Naw, it's an attribute of the port.  I leave the ASIC via PF n or VF m
>of PF n.  Whatever is on the other side is isolated from the topology
>of the ASIC.

Ok.


>
>Is the physical port ID an attribute of the other end of the cable?
>
>> >+		return 0;
>> >+	default:
>> >+		return -EINVAL;
>> >+	}
>> >+}
>> >+
>> > void nfp_devlink_port_clean(struct nfp_port *port)
>> > {
>> > }
>> >@@ -376,7 +398,21 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
>> > {
>> > 	struct devlink *devlink = priv_to_devlink(app->pf);
>> > 
>> >-	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
>> >+	switch (port->type) {
>> >+	case NFP_PORT_PHYS_PORT:
>> >+		return devlink_port_register(devlink, &port->dl_port,
>> >+					     port->eth_id);
>> >+	case NFP_PORT_PF_PORT:
>> >+		return devlink_port_register(devlink, &port->dl_port,
>> >+					     (port->pf_id + 1) * 10000 +
>> >+					     port->pf_split_id * 1000);  
>> 
>> Wait. What this 10000/1000 magic about?
>
>port_index has to be unique, so I need some unique number here.  As I
>stated in both the commit message and the cover letter, this is
>arbitrary.

You can at least use some defines for that.
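A minimal sketch of what such defines could look like, mirroring the
arithmetic in the hunk above. All ex_*/EX_* names are hypothetical and
not part of the patch; only the constants 10000 and 1000 and the +1
offsets are taken from it.

```c
#include <stdbool.h>

/* Hypothetical names; the patch itself uses bare 10000 and 1000. */
#define EX_PORT_IDX_PF_STRIDE		10000u	/* index space per PF */
#define EX_PORT_IDX_SPLIT_STRIDE	 1000u	/* index space per PF subport */

/* PF port index: (pf_id + 1) * 10000 + pf_split_id * 1000 */
static unsigned int ex_pf_port_index(unsigned int pf_id,
				     unsigned int pf_split_id)
{
	return (pf_id + 1) * EX_PORT_IDX_PF_STRIDE +
	       pf_split_id * EX_PORT_IDX_SPLIT_STRIDE;
}

/* VF port index: (pf_id + 1) * 10000 + vf_id + 1 */
static unsigned int ex_vf_port_index(unsigned int pf_id, unsigned int vf_id)
{
	return (pf_id + 1) * EX_PORT_IDX_PF_STRIDE + vf_id + 1;
}

/*
 * A VF index stays below the index of subport 1 of the same PF only
 * while vf_id + 1 < 1000; beyond that the two formulas overlap.
 */
static bool ex_vf_id_fits(unsigned int vf_id)
{
	return vf_id + 1 < EX_PORT_IDX_SPLIT_STRIDE;
}
```

With these, PF 0 subport 0 maps to 10000, PF 0 subport 1 to 11000 and
VF 0 of PF 0 to 10001. The helper at the end only makes the implicit
range assumption of the scheme visible; whether real devices can exceed
it is not something this sketch tries to answer.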


>
>I can put the datapath port identifier in there but it's (a)
>meaningless, (b) a bitfield, so it will look like 8972367083.  And it
>may change depending on the FW load, so it's not stable either.
>
>> >+	case NFP_PORT_VF_PORT:
>> >+		return devlink_port_register(devlink, &port->dl_port,
>> >+					     (port->pf_id + 1) * 10000 +
>> >+					     port->vf_id + 1);
>> >+	default:
>> >+		return -EINVAL;
>> >+	}
>> > }
>> > 
>> > void nfp_devlink_port_unregister(struct nfp_port *port)
>> >diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
>> >index d2c803bb4e56..869d22760a6e 100644
>> >--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
>> >+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
>> >@@ -292,7 +292,9 @@ nfp_repr_transfer_features(struct net_device *netdev, struct net_device *lower)
>> > 
>> > static void nfp_repr_clean(struct nfp_repr *repr)
>> > {
>> >+	nfp_devlink_port_unregister(repr->port);
>> > 	unregister_netdev(repr->netdev);
>> >+	nfp_devlink_port_clean(repr->port);
>> > 	nfp_app_repr_clean(repr->app, repr->netdev);
>> > 	dst_release((struct dst_entry *)repr->dst);
>> > 	nfp_port_free(repr->port);
>> >@@ -395,12 +397,24 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
>> > 	if (err)
>> > 		goto err_clean;
>> > 
>> >-	err = register_netdev(netdev);
>> >+	err = nfp_devlink_port_init(app, repr->port);
>> > 	if (err)
>> > 		goto err_repr_clean;
>> > 
>> >+	err = register_netdev(netdev);
>> >+	if (err)
>> >+		goto err_port_clean;
>> >+
>> >+	err = nfp_devlink_port_register(app, repr->port);  
>> 
>> Don't you want to take my patch ("nfp: register devlink port before
>> netdev") to change order of register_netdev and devlink_port_register,
>> include it to this patchset before this patch and change the order in
>> this patch too? I think it would be clearer to do it from the beginning.
>
>This way both netdev and devlink_port can get registered fully
>initialized.  Otherwise we'd get two notifications.  Are we trying to
>establish some ordering rules to get around the rtnl locking? :)

The order of devlink_port_register and register_netdev is given by
layering. For example, on a port type change the devlink_port stays
registered; only the netdev is unregistered and an ib_dev is registered
instead, or vice versa. It really has no relation to rtnl locking.
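
A self-contained sketch of that layering, assuming the order suggested
here (devlink port first, netdev on top), which is not the order in the
patch as posted. The ex_* functions are hypothetical stubs that only
record the call sequence; a real driver would call
devlink_port_register(), register_netdev() and their counterparts.

```c
#include <stddef.h>

/* Records the order of calls so the sequence can be inspected. */
static const char *ex_log[8];
static size_t ex_log_len;

static void ex_note(const char *what)
{
	if (ex_log_len < sizeof(ex_log) / sizeof(ex_log[0]))
		ex_log[ex_log_len++] = what;
}

/* Stubs standing in for the real registration calls. */
static int ex_port_register(void)
{
	ex_note("port_register");
	return 0;
}

static void ex_port_unregister(void)
{
	ex_note("port_unregister");
}

static int ex_netdev_register(int fail)
{
	if (fail)
		return -1;
	ex_note("netdev_register");
	return 0;
}

static void ex_netdev_unregister(void)
{
	ex_note("netdev_unregister");
}

/* Bring-up: lower layer (devlink port) first, then the netdev. */
static int ex_repr_init(int fail_netdev)
{
	int err;

	err = ex_port_register();
	if (err)
		return err;

	err = ex_netdev_register(fail_netdev);
	if (err)
		goto err_port_unregister;

	return 0;

err_port_unregister:
	ex_port_unregister();
	return err;
}

/* Teardown mirrors bring-up in reverse. */
static void ex_repr_clean(void)
{
	ex_netdev_unregister();
	ex_port_unregister();
}
```

In this arrangement the netdev can come and go while the port below it
stays registered, which is the property the port type change example
relies on.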


>
>> >+	if (err)
>> >+		goto err_unreg_netdev;
>> >+
>> > 	return 0;
>> > 
>> >+err_unreg_netdev:
>> >+	unregister_netdev(repr->netdev);
>> >+err_port_clean:
>> >+	nfp_devlink_port_clean(repr->port);
>> > err_repr_clean:
>> > 	nfp_app_repr_clean(app, netdev);
>> > err_clean:


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-02 19:48     ` Jakub Kicinski
@ 2019-03-04  7:56       ` Jiri Pirko
  2019-03-05  0:33         ` Jakub Kicinski
  2019-03-04 11:19       ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-04  7:56 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@netronome.com wrote:
>On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:
>> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
>> >PCI endpoint corresponds to a PCI device, but such device
>> >can have one or more logical device ports associated with it.
>> >We need a way to distinguish those. Add a PCI subport in the
>> >dumps and print the info in phys_port_name appropriately.
>> >
>> >This is not equivalent to port splitting, there is no split
>> >group. It's just a way of representing multiple netdevs on
>> >a single PCI function.
>> >
>> >Note that the quality of being multiport pertains only to
>> >the PCI function itself. A PF having multiple netdevs does
>> >not mean that its VFs will also have multiple, or that VFs
>> >are associated with any particular port of a multiport VF.
>> >
>> >Example (bus 05 device has subports, bus 82 has only one port per
>> >function):
>> >
>> >$ devlink port
>> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
>> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
>> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
>> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1  
>> 
>> So these subport devlink ports are eswitch ports for subports, right?
>> 
>> Please see the following drawing:
>> 
>>                                  +---+      +---+      +---+
>>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
>>                                  +-+-+      +-+-+      +-+-+
>> physical link <---------+          |          |          |
>>                         |          |          |          |
>>                         |          |          |          |
>>                         |          |          |          |
>>                       +-+-+      +-+-+      +-+-+      +-+-+
>>                       | 1 |      | 2 |      | 3 |      | 4 |
>>                    +--+---+------+---+------+---+------+---+--+
>>                    |  physical    pfsub      vf         pfsub |
>>                    |  port        port       port       port  |
>>                    |                                          |
>>                    |                  eswitch                 |
>>                    |                                          |
>>                    |                                          |
>>                    +------------------------------------------+
>> 
>> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
>> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
>> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
>> 4) pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
>> 
>> This is basically what you have and I think we are in sync with that.
>> But what about 5,6,7? Should they have devlink port instances too?
>> 
>> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
>> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
>> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
>> 
>> These are the "peers".
>> I think that there could be flavours "pci_pf" and "pci_vf". Then the
>> "representors" (switch ports) could have flavours "pci_pf_port" and
>> "pci_vf_port" or something like that. User can see right away
>> that is not "PF" of "VF" but rather something "on the other end".
>> Note there is no "switch_id" for these devlink ports that tells the user
>> these devlink ports are not part of any switch.
>> What do you think?
>
>Hmmm.. Hm. Hm.
>
>To me it's neat if the devlink instance matches an ASIC.  I think it's
>kind of clear for people to understand what it stands for then.  So if
>we wanted to do the above we'd have to make the switch_id the first
>class identifier for devlink instances, rather than the bus?  But then

What do you mean by "first class identifier"? Like "a handle"?


>VF instances don't have a switch ID so that doesn't work...

Wait a sec. VF ports do have one. VFs themselves don't. But that is the
same for the PF. The PF would not have a switch ID either.


>
>I need to think about it.
>
>It's also kind of strange that we have to add the noun *port* to the
>flavour of... a port...  So I would prefer not to have those showing up

Yeah.

>as ports.  Can we invent a new command (say "partition"?) that'd take
>the bus info where the partition is to be spawned?

Got it. But the question is how different this object would be from the
existing "port" we have today.


>
>My next goal is to find a way of grouping multiple bus devices under one
>"ASIC" (which is a devlink instance to me) so it can be understood
>easily how things are laid out when there is more than one PF connected
>to one host.

These are the "aliases" you mentioned before, right? Makes sense.



* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-01 18:04 ` [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports Jakub Kicinski
  2019-03-02  9:41   ` Jiri Pirko
@ 2019-03-04 11:08   ` Jiri Pirko
  2019-03-05  0:51     ` Jakub Kicinski
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-04 11:08 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
>PCI endpoint corresponds to a PCI device, but such device
>can have one or more logical device ports associated with it.
>We need a way to distinguish those. Add a PCI subport in the
>dumps and print the info in phys_port_name appropriately.
>
>This is not equivalent to port splitting, there is no split
>group. It's just a way of representing multiple netdevs on
>a single PCI function.
>
>Note that the quality of being multiport pertains only to
>the PCI function itself. A PF having multiple netdevs does
>not mean that its VFs will also have multiple, or that VFs
>are associated with any particular port of a multiport VF.
>
>Example (bus 05 device has subports, bus 82 has only one port per
>function):

How do you plan to add/remove these subports?


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-02 19:48     ` Jakub Kicinski
  2019-03-04  7:56       ` Jiri Pirko
@ 2019-03-04 11:19       ` Jiri Pirko
  2019-03-05  0:40         ` Jakub Kicinski
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-04 11:19 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@netronome.com wrote:
>On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:
>> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
>> >PCI endpoint corresponds to a PCI device, but such device
>> >can have one or more logical device ports associated with it.
>> >We need a way to distinguish those. Add a PCI subport in the
>> >dumps and print the info in phys_port_name appropriately.
>> >
>> >This is not equivalent to port splitting, there is no split
>> >group. It's just a way of representing multiple netdevs on
>> >a single PCI function.
>> >
>> >Note that the quality of being multiport pertains only to
>> >the PCI function itself. A PF having multiple netdevs does
>> >not mean that its VFs will also have multiple, or that VFs
>> >are associated with any particular port of a multiport VF.
>> >
>> >Example (bus 05 device has subports, bus 82 has only one port per
>> >function):
>> >
>> >$ devlink port
>> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
>> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
>> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
>> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1  
>> 
>> So these subport devlink ports are eswitch ports for subports, right?
>> 
>> Please see the following drawing:
>> 
>>                                  +---+      +---+      +---+
>>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
>>                                  +-+-+      +-+-+      +-+-+
>> physical link <---------+          |          |          |
>>                         |          |          |          |
>>                         |          |          |          |
>>                         |          |          |          |
>>                       +-+-+      +-+-+      +-+-+      +-+-+
>>                       | 1 |      | 2 |      | 3 |      | 4 |
>>                    +--+---+------+---+------+---+------+---+--+
>>                    |  physical    pfsub      vf         pfsub |
>>                    |  port        port       port       port  |
>>                    |                                          |
>>                    |                  eswitch                 |
>>                    |                                          |
>>                    |                                          |
>>                    +------------------------------------------+
>> 
>> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
>> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
>> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
>> 4) pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
>> 
>> This is basically what you have and I think we are in sync with that.
>> But what about 5,6,7? Should they have devlink port instances too?
>> 
>> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
>> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
>> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
>> 
>> These are the "peers".
>> I think that there could be flavours "pci_pf" and "pci_vf". Then the
>> "representors" (switch ports) could have flavours "pci_pf_port" and
>> "pci_vf_port" or something like that. User can see right away
>> that is not "PF" of "VF" but rather something "on the other end".
>> Note there is no "switch_id" for these devlink ports that tells the user
>> these devlink ports are not part of any switch.
>> What do you think?
>
>Hmmm.. Hm. Hm.
>
>To me it's neat if the devlink instance matches an ASIC.  I think it's
>kind of clear for people to understand what it stands for then.  So if
>we wanted to do the above we'd have to make the switch_id the first
>class identifier for devlink instances, rather than the bus?  But then
>VF instances don't have a switch ID so that doesn't work...
>
>I need to think about it.
>
>It's also kind of strange that we have to add the noun *port* to the
>flavour of... a port...  So I would prefer not to have those showing up
>as ports.  Can we invent a new command (say "partition"?) that'd take
>the bus info where the partition is to be spawned?

Devlink is not supposed to be there only for switches. From the
beginning the design was to handle cases where the netdev/ib_dev is not
the correct handle. Not only in case you have multiple instances (ports)
for one ASIC, but also in case you have only one. Example use case is
port-type-change (eth->ib,ib->eth).

I chose word "port" as the parent devlink instance is "dev" and if you
partition the ASIC you basically got "ports", each of a different flavour.

And as you said, devlink instance matches one ASIC. Therefore the
devlink instance should contain all bits that are part of that ASIC,
not only switch/eswitch ports. That would be very limiting.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (7 preceding siblings ...)
  2019-03-02 10:13 ` [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jiri Pirko
@ 2019-03-04 18:22 ` David Miller
  2019-03-20 20:25 ` Jakub Kicinski
  9 siblings, 0 replies; 100+ messages in thread
From: David Miller @ 2019-03-04 18:22 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: jiri, netdev, oss-drivers

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri,  1 Mar 2019 10:04:46 -0800

> This series is a long overdue follow up to Jiri's work on providing
> a common .ndo_phys_port_name implementation based on devlink ports.

I think this needs enough discussion still that it will have to wait
until the next merge window.


* Re: [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs
  2019-03-04  7:36       ` Jiri Pirko
@ 2019-03-04 23:32         ` Jakub Kicinski
  0 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-04 23:32 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Mon, 4 Mar 2019 08:36:31 +0100, Jiri Pirko wrote:
> >> >+	case NFP_PORT_PF_PORT:
> >> >+		return devlink_port_register(devlink, &port->dl_port,
> >> >+					     (port->pf_id + 1) * 10000 +
> >> >+					     port->pf_split_id * 1000);    
> >> 
> >> Wait. What this 10000/1000 magic about?  
> >
> >port_index has to be unique, I need some unique number here, as I
> >stated both in the commit message and the cover letter, this is
> >arbitrary.   
> 
> You can at least use some defines for that.

Ok.
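
For illustration, the arithmetic in the quoted hunk could be written with
named constants roughly as below. This is only a sketch: the macro names
NFP_DL_PF_PORT_BASE and NFP_DL_PF_SPLIT_STRIDE and the helper
nfp_dl_pf_port_index() are hypothetical and not taken from the driver;
only the values 10000 and 1000 come from the patch.

```c
#include <stdint.h>

/* Hypothetical names; only the constants 10000 and 1000 come from the patch. */
#define NFP_DL_PF_PORT_BASE	10000u	/* index space reserved per PF */
#define NFP_DL_PF_SPLIT_STRIDE	1000u	/* distance between subports of one PF */

/* Mirrors (pf_id + 1) * 10000 + pf_split_id * 1000 from the quoted hunk. */
static inline uint32_t nfp_dl_pf_port_index(uint32_t pf_id, uint32_t pf_split_id)
{
	return (pf_id + 1) * NFP_DL_PF_PORT_BASE +
	       pf_split_id * NFP_DL_PF_SPLIT_STRIDE;
}
```

With these values PF 0 subport 0 maps to 10000 and PF 0 subport 1 to
11000, which matches the indexes in the devlink port example from patch
4/7. The scheme stays unique only while a PF has fewer than ten subports,
because ten strides add up to the base of the next PF.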

> >I can put the datapath port identifier in there but it's (a)
> >meaningless, (b) a bitfield, so it will look like 8972367083.  And it
> >may change depending on the FW load, so it's not stable either.

> >> > void nfp_devlink_port_unregister(struct nfp_port *port)
> >> >diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
> >> >index d2c803bb4e56..869d22760a6e 100644
> >> >--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
> >> >+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
> >> >@@ -395,12 +397,24 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
> >> > 	if (err)
> >> > 		goto err_clean;
> >> > 
> >> >-	err = register_netdev(netdev);
> >> >+	err = nfp_devlink_port_init(app, repr->port);
> >> > 	if (err)
> >> > 		goto err_repr_clean;
> >> > 
> >> >+	err = register_netdev(netdev);
> >> >+	if (err)
> >> >+		goto err_port_clean;
> >> >+
> >> >+	err = nfp_devlink_port_register(app, repr->port);    
> >> 
> >> Don't you want to take my patch ("nfp: register devlink port before
> >> netdev") to change order of register_netdev and devlink_port_register,
> >> include it to this patchset before this patch and change the order in
> >> this patch too? I think it would be clearer to do it from the beginning.  
> >
> >This way both netdev and devlink_port can get registered fully
> >initialized.  Otherwise we'd get two notifications.  Are we trying to
> >establish some ordering rules to get around the rtnl locking? :)  
> 
> The order of devlink_port_register and register_netdev is given by
> layering. For example, for port change, the devlink_port is still there
> and registered, only the netdev is unregistered and ib_dev registered
> instead of vice versa. It has really no relation to rtnl locking.

Ok, I shouldn't worry about the notifications too much, I agree the
order you suggest makes sense in principle.
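
As a rough userspace sketch of that ordering (devlink port first, netdev
second, unwinding in reverse on failure): every fake_* name below is a
made-up stand-in so the example compiles outside the kernel, and none of
it is the actual nfp code.

```c
#include <stdbool.h>

/* Stand-ins for the real kernel objects and calls (all hypothetical). */
struct fake_port {
	bool dl_registered;
	bool netdev_registered;
};

static int fake_devlink_port_register(struct fake_port *p)
{
	p->dl_registered = true;
	return 0;
}

static void fake_devlink_port_unregister(struct fake_port *p)
{
	p->dl_registered = false;
}

static int fake_register_netdev(struct fake_port *p, bool fail)
{
	if (fail)
		return -1;
	p->netdev_registered = true;
	return 0;
}

/* devlink port is the lower layer: register it first, unwind it last. */
static int fake_repr_init(struct fake_port *p, bool netdev_fails)
{
	int err;

	err = fake_devlink_port_register(p);
	if (err)
		return err;

	err = fake_register_netdev(p, netdev_fails);
	if (err)
		goto err_port_unregister;

	return 0;

err_port_unregister:
	fake_devlink_port_unregister(p);
	return err;
}
```

The same layering is what lets a port type change swap the netdev for an
ib_dev while the devlink port stays registered underneath.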


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-04  7:56       ` Jiri Pirko
@ 2019-03-05  0:33         ` Jakub Kicinski
  2019-03-05 11:06           ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-05  0:33 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Mon, 4 Mar 2019 08:56:09 +0100, Jiri Pirko wrote:
> Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@netronome.com wrote:
> >On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:  
> >> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:  
> >> >PCI endpoint corresponds to a PCI device, but such device
> >> >can have one more more logical device ports associated with it.
> >> >We need a way to distinguish those. Add a PCI subport in the
> >> >dumps and print the info in phys_port_name appropriately.
> >> >
> >> >This is not equivalent to port splitting, there is no split
> >> >group. It's just a way of representing multiple netdevs on
> >> >a single PCI function.
> >> >
> >> >Note that the quality of being multiport pertains only to
> >> >the PCI function itself. A PF having multiple netdevs does
> >> >not mean that its VFs will also have multiple, or that VFs
> >> >are associated with any particular port of a multiport VF.
> >> >
> >> >Example (bus 05 device has subports, bus 82 has only one port per
> >> >function):
> >> >
> >> >$ devlink port
> >> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
> >> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
> >> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
> >> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1    
> >> 
> >> So these subport devlink ports are eswitch ports for subports, right?
> >> 
> >> Please see the following drawing:
> >> 
> >>                                  +---+      +---+      +---+
> >>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
> >>                                  +-+-+      +-+-+      +-+-+
> >> physical link <---------+          |          |          |
> >>                         |          |          |          |
> >>                         |          |          |          |
> >>                         |          |          |          |
> >>                       +-+-+      +-+-+      +-+-+      +-+-+
> >>                       | 1 |      | 2 |      | 3 |      | 4 |
> >>                    +--+---+------+---+------+---+------+---+--+
> >>                    |  physical    pfsub      vf         pfsub |
> >>                    |  port        port       port       port  |
> >>                    |                                          |
> >>                    |                  eswitch                 |
> >>                    |                                          |
> >>                    |                                          |
> >>                    +------------------------------------------+
> >> 
> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
> >> 
> >> This is basically what you have and I think we are in sync with that.
> >> But what about 5,6,7? Should they have devlink port instances too?
> >> 
> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
> >> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
> >> 
> >> These are the "peers".
> >> I think that there could be flavours "pci_pf" and "pci_vf". Then the
> >> "representors" (switch ports) could have flavours "pci_pf_port" and
> >> "pci_vf_port" or something like that. User can see right away
> >> that is not "PF" of "VF" but rather something "on the other end".
> >> Note there is no "switch_id" for these devlink ports that tells the user
> >> these devlink ports are not part of any switch.
> >> What do you think?  
> >
> >Hmmm.. Hm. Hm.
> >
> >To me its neat if the devlink instance matches an ASIC.  I think it's
> >kind of clear for people to understand what it stands for then.  So if
> >we wanted to do the above we'd have to make the switch_id the first
> >class identifier for devlink instances, rather than the bus?  But then  
> 
> What do you mean by "first class identifier"? Like "a handle"?

Yes, a handle.

> >VF instances don't have a switch ID so that doesn't work...  
> 
> Wait a sec. VF-ports do have. VFs them selves don't. 

Looking at your example this one:

6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0

that uses VF's DBDF in the devlink instance handle, so I presume this
is a VF's devlink instance that will get passed to the VM together with
the VF device?

> But that is the same for PF. PF would also not have switch id.

Yes :(  You'd have to mark what constitutes a devlink instance on your
drawing.  The semantics for devlink instances seem to be the focal point
of the discussion.

Right now it seems a little bit that folks on the NIC side see a devlink
instance as a PCI function and on switch side it's the whole ASIC.

> >I need to think about it.
> >
> >It's also kind of strange that we have to add the noun *port* to the
> >flavour of... a port...  So I would prefer not to have those showing up  
> 
> Yeah.
> 
> >as ports.  Can we invent a new command (say "partition"?) that'd take
> >the bus info where the partition is to be spawned?  
> 
> Got it. But the question is how different this object would be from the
> existing "port" we have today.

They'd be where "the other side of a PCI link" is represented,
restricting ports to only ASIC's forwarding plane ports.

> >My next goal is to find a way of grouping multiple bus devices under one
> >"ASIC" (which is a devlink instance to me) so it can be understood
> >easily how things are laid out when there is more than one PF connected
> >to one host.  
> 
> These are the "aliases" you mentioned before right? Makes sense.

Yes.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-04 11:19       ` Jiri Pirko
@ 2019-03-05  0:40         ` Jakub Kicinski
  2019-03-05 11:07           ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-05  0:40 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Mon, 4 Mar 2019 12:19:02 +0100, Jiri Pirko wrote:
> Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@netronome.com wrote:
> >On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:  
> >> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:  
> >> >PCI endpoint corresponds to a PCI device, but such device
> >> >can have one more more logical device ports associated with it.
> >> >We need a way to distinguish those. Add a PCI subport in the
> >> >dumps and print the info in phys_port_name appropriately.
> >> >
> >> >This is not equivalent to port splitting, there is no split
> >> >group. It's just a way of representing multiple netdevs on
> >> >a single PCI function.
> >> >
> >> >Note that the quality of being multiport pertains only to
> >> >the PCI function itself. A PF having multiple netdevs does
> >> >not mean that its VFs will also have multiple, or that VFs
> >> >are associated with any particular port of a multiport VF.
> >> >
> >> >Example (bus 05 device has subports, bus 82 has only one port per
> >> >function):
> >> >
> >> >$ devlink port
> >> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
> >> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
> >> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
> >> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1    
> >> 
> >> So these subport devlink ports are eswitch ports for subports, right?
> >> 
> >> Please see the following drawing:
> >> 
> >>                                  +---+      +---+      +---+
> >>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
> >>                                  +-+-+      +-+-+      +-+-+
> >> physical link <---------+          |          |          |
> >>                         |          |          |          |
> >>                         |          |          |          |
> >>                         |          |          |          |
> >>                       +-+-+      +-+-+      +-+-+      +-+-+
> >>                       | 1 |      | 2 |      | 3 |      | 4 |
> >>                    +--+---+------+---+------+---+------+---+--+
> >>                    |  physical    pfsub      vf         pfsub |
> >>                    |  port        port       port       port  |
> >>                    |                                          |
> >>                    |                  eswitch                 |
> >>                    |                                          |
> >>                    |                                          |
> >>                    +------------------------------------------+
> >> 
> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
> >> 
> >> This is basically what you have and I think we are in sync with that.
> >> But what about 5,6,7? Should they have devlink port instances too?
> >> 
> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
> >> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
> >> 
> >> These are the "peers".
> >> I think that there could be flavours "pci_pf" and "pci_vf". Then the
> >> "representors" (switch ports) could have flavours "pci_pf_port" and
> >> "pci_vf_port" or something like that. User can see right away
> >> that is not "PF" of "VF" but rather something "on the other end".
> >> Note there is no "switch_id" for these devlink ports that tells the user
> >> these devlink ports are not part of any switch.
> >> What do you think?  
> >
> >Hmmm.. Hm. Hm.
> >
> >To me its neat if the devlink instance matches an ASIC.  I think it's
> >kind of clear for people to understand what it stands for then.  So if
> >we wanted to do the above we'd have to make the switch_id the first
> >class identifier for devlink instances, rather than the bus?  But then
> >VF instances don't have a switch ID so that doesn't work...
> >
> >I need to think about it.
> >
> >It's also kind of strange that we have to add the noun *port* to the
> >flavour of... a port...  So I would prefer not to have those showing up
> >as ports.  Can we invent a new command (say "partition"?) that'd take
> >the bus info where the partition is to be spawned?  
> 
> Devlink does not supposed to be only there for switches. From the
> beginning the design was to handle cases where the netdev/ib_dev is not
> the correct handle. Not only in case you have multiple instances (ports)
> for one ASIC, but also in case you have only one. Example use case is
> port-type-change (eth->ib,ib->eth).
> 
> I chose word "port" as the parent devlink instance is "dev" and if you
> partition the ASIC you basically got "ports", each of a different flavour.
> 
> And as you said, devlink instance matches one ASIC. Therefore the
> devlink instance should contain all bits there are part of that ASIC,
> not only switch/eswitch ports. That would be very limitting.

I could read this as us being in full agreement, but I'm not sure..
I think we agree that all objects of an ASIC should be under one
devlink instance, the question remains whether both ends of the pipe
for PCI devices (subdevs or not) should appear under ports or whether the
"far end" (from the ASIC's perspective)/"host end" gets its own category.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-04 11:08   ` Jiri Pirko
@ 2019-03-05  0:51     ` Jakub Kicinski
  2019-03-05 11:09       ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-05  0:51 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Mon, 4 Mar 2019 12:08:57 +0100, Jiri Pirko wrote:
> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
> >PCI endpoint corresponds to a PCI device, but such device
> >can have one more more logical device ports associated with it.
> >We need a way to distinguish those. Add a PCI subport in the
> >dumps and print the info in phys_port_name appropriately.
> >
> >This is not equivalent to port splitting, there is no split
> >group. It's just a way of representing multiple netdevs on
> >a single PCI function.
> >
> >Note that the quality of being multiport pertains only to
> >the PCI function itself. A PF having multiple netdevs does
> >not mean that its VFs will also have multiple, or that VFs
> >are associated with any particular port of a multiport VF.
> >
> >Example (bus 05 device has subports, bus 82 has only one port per
> >function):  
> 
> How do you plan to add/remove these subports?

I can't say I got that figured out fully, but I was wondering if we can
have some form of:

$ devlink partition pci/0000:82:00.0 new
pci/0000:82:00.0/1001002

Which would create appropriate sub-port and port (-repr-) netdev.

Plus optionally the ability to work with something like the already
existing mdev infrastructure for passing to a VM.  But I haven't even
looked at that, yet.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-05  0:33         ` Jakub Kicinski
@ 2019-03-05 11:06           ` Jiri Pirko
  2019-03-05 17:15             ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-05 11:06 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Tue, Mar 05, 2019 at 01:33:02AM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 4 Mar 2019 08:56:09 +0100, Jiri Pirko wrote:
>> Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@netronome.com wrote:
>> >On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:  
>> >> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >PCI endpoint corresponds to a PCI device, but such device
>> >> >can have one more more logical device ports associated with it.
>> >> >We need a way to distinguish those. Add a PCI subport in the
>> >> >dumps and print the info in phys_port_name appropriately.
>> >> >
>> >> >This is not equivalent to port splitting, there is no split
>> >> >group. It's just a way of representing multiple netdevs on
>> >> >a single PCI function.
>> >> >
>> >> >Note that the quality of being multiport pertains only to
>> >> >the PCI function itself. A PF having multiple netdevs does
>> >> >not mean that its VFs will also have multiple, or that VFs
>> >> >are associated with any particular port of a multiport VF.
>> >> >
>> >> >Example (bus 05 device has subports, bus 82 has only one port per
>> >> >function):
>> >> >
>> >> >$ devlink port
>> >> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
>> >> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
>> >> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
>> >> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1    
>> >> 
>> >> So these subport devlink ports are eswitch ports for subports, right?
>> >> 
>> >> Please see the following drawing:
>> >> 
>> >>                                  +---+      +---+      +---+
>> >>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
>> >>                                  +-+-+      +-+-+      +-+-+
>> >> physical link <---------+          |          |          |
>> >>                         |          |          |          |
>> >>                         |          |          |          |
>> >>                         |          |          |          |
>> >>                       +-+-+      +-+-+      +-+-+      +-+-+
>> >>                       | 1 |      | 2 |      | 3 |      | 4 |
>> >>                    +--+---+------+---+------+---+------+---+--+
>> >>                    |  physical    pfsub      vf         pfsub |
>> >>                    |  port        port       port       port  |
>> >>                    |                                          |
>> >>                    |                  eswitch                 |
>> >>                    |                                          |
>> >>                    |                                          |
>> >>                    +------------------------------------------+
>> >> 
>> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
>> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
>> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
>> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
>> >> 
>> >> This is basically what you have and I think we are in sync with that.
>> >> But what about 5,6,7? Should they have devlink port instances too?
>> >> 
>> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
>> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
>> >> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
>> >> 
>> >> These are the "peers".
>> >> I think that there could be flavours "pci_pf" and "pci_vf". Then the
>> >> "representors" (switch ports) could have flavours "pci_pf_port" and
>> >> "pci_vf_port" or something like that. User can see right away
>> >> that is not "PF" of "VF" but rather something "on the other end".
>> >> Note there is no "switch_id" for these devlink ports that tells the user
>> >> these devlink ports are not part of any switch.
>> >> What do you think?  
>> >
>> >Hmmm.. Hm. Hm.
>> >
>> >To me its neat if the devlink instance matches an ASIC.  I think it's
>> >kind of clear for people to understand what it stands for then.  So if
>> >we wanted to do the above we'd have to make the switch_id the first
>> >class identifier for devlink instances, rather than the bus?  But then  
>> 
>> What do you mean by "first class identifier"? Like "a handle"?
>
>Yes, a handle.

Odd.


>
>> >VF instances don't have a switch ID so that doesn't work...  
>> 
>> Wait a sec. VF-ports do have. VFs them selves don't. 
>
>Looking at your example this one:
>
>6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
>
>that uses VF's DBDF in the devlink instance handle, so I presume this
>is a VF's devlink instance that will get passed to the VM together with
>the VF device?

Yes. Correct. This one does not have a switch_id.


>
>> But that is the same for PF. PF would also not have switch id.
>
>Yes :(  You'd have to mark what constitutes a devlink instance on your
>drawing.  The semantics for devlink instances seem to be the focal point
>of the discussion.
>
>Right now it seems a little bit that folks on the NIC side see a devlink
>instance as a PCI function and on switch side it's the whole ASIC.

I think it should be the whole ASIC for both. I don't see why not. It's
one entity, one set of parameters, one flash function, one info report etc.


>
>> >I need to think about it.
>> >
>> >It's also kind of strange that we have to add the noun *port* to the
>> >flavour of... a port...  So I would prefer not to have those showing up  
>> 
>> Yeah.
>> 
>> >as ports.  Can we invent a new command (say "partition"?) that'd take
>> >the bus info where the partition is to be spawned?  
>> 
>> Got it. But the question is how different this object would be from the
>> existing "port" we have today.
>
>They'd be where "the other side of a PCI link" is represented,
>restricting ports to only ASIC's forwarding plane ports.

Basically a "host port", right? It can still be the same port object,
only with different flavour and attributes. So we would have:

1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
                       flavour physical switch_id 00154d130d2f
2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
                           flavour pci_pf pf 0 subport 0
                           switch_id 00154d130d2f
                           peer pci/0000:05:00.0/1
3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
                           flavour pci_vf pf 0 vf 0
                           switch_id 00154d130d2f
                           peer pci/0000:05:10.1/0
4) pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1
                           flavour pci_pf pf 0 subport 1
                           switch_id 00154d130d2f
                           peer pci/0000:05:00.0/2
5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
                       flavour host          <----------------
                       peer pci/0000:05:00.0/10000
6) pci/0000:05:10.1/0: type eth netdev enp5s10f0
                       flavour host          <----------------
                       peer pci/0000:05:00.0/10001
7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
                       flavour host          <----------------
                       peer pci/0000:05:00.0/11000

I think it looks quite clear, it gives a complete topology view.
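
One property such a listing would need is that peer links are symmetric:
if port A names B as its peer, B must name A back. A minimal userspace
sketch of that check follows. struct port_model, find_port() and
peers_are_symmetric() are hypothetical illustrations, not devlink API,
and the check assumes every port handle is unique (for instance index
11000 for PF 0 subport 1, following the (pf_id + 1) * 10000 +
pf_split_id * 1000 scheme from patch 3/7).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a devlink port and its peer link. */
struct port_model {
	const char *handle;	/* e.g. "pci/0000:05:00.0/10000" */
	const char *flavour;	/* e.g. "physical", "pci_pf", "host" */
	const char *peer;	/* handle of the other end, NULL if none */
};

/* Linear lookup by handle; returns NULL when the handle is unknown. */
static const struct port_model *
find_port(const struct port_model *ports, size_t n, const char *handle)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(ports[i].handle, handle) == 0)
			return &ports[i];
	return NULL;
}

/* Returns 1 when every peer link points back at its origin, 0 otherwise. */
static int peers_are_symmetric(const struct port_model *ports, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct port_model *other;

		if (!ports[i].peer)
			continue;	/* e.g. the physical port has no peer */
		other = find_port(ports, n, ports[i].peer);
		if (!other || !other->peer ||
		    strcmp(other->peer, ports[i].handle) != 0)
			return 0;
	}
	return 1;
}
```

Feeding it the seven ports above, with the physical port's peer left
NULL, is a quick way to catch a copy-pasted handle in such a table.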

>
>> >My next goal is to find a way of grouping multiple bus devices under one
>> >"ASIC" (which is a devlink instance to me) so it can be understood
>> >easily how things are laid out when there is more than one PF connected
>> >to one host.  
>> 
>> These are the "aliases" you mentioned before right? Makes sense.
>
>Yes.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-05  0:40         ` Jakub Kicinski
@ 2019-03-05 11:07           ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-05 11:07 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Tue, Mar 05, 2019 at 01:40:07AM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 4 Mar 2019 12:19:02 +0100, Jiri Pirko wrote:
>> Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@netronome.com wrote:
>> >On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:  
>> >> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >PCI endpoint corresponds to a PCI device, but such device
>> >> >can have one more more logical device ports associated with it.
>> >> >We need a way to distinguish those. Add a PCI subport in the
>> >> >dumps and print the info in phys_port_name appropriately.
>> >> >
>> >> >This is not equivalent to port splitting, there is no split
>> >> >group. It's just a way of representing multiple netdevs on
>> >> >a single PCI function.
>> >> >
>> >> >Note that the quality of being multiport pertains only to
>> >> >the PCI function itself. A PF having multiple netdevs does
>> >> >not mean that its VFs will also have multiple, or that VFs
>> >> >are associated with any particular port of a multiport VF.
>> >> >
>> >> >Example (bus 05 device has subports, bus 82 has only one port per
>> >> >function):
>> >> >
>> >> >$ devlink port
>> >> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
>> >> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
>> >> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
>> >> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1    
>> >> 
>> >> So these subport devlink ports are eswitch ports for subports, right?
>> >> 
>> >> Please see the following drawing:
>> >> 
>> >>                                  +---+      +---+      +---+
>> >>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
>> >>                                  +-+-+      +-+-+      +-+-+
>> >> physical link <---------+          |          |          |
>> >>                         |          |          |          |
>> >>                         |          |          |          |
>> >>                         |          |          |          |
>> >>                       +-+-+      +-+-+      +-+-+      +-+-+
>> >>                       | 1 |      | 2 |      | 3 |      | 4 |
>> >>                    +--+---+------+---+------+---+------+---+--+
>> >>                    |  physical    pfsub      vf         pfsub |
>> >>                    |  port        port       port       port  |
>> >>                    |                                          |
>> >>                    |                  eswitch                 |
>> >>                    |                                          |
>> >>                    |                                          |
>> >>                    +------------------------------------------+
>> >> 
>> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
>> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
>> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
>> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
>> >> 
>> >> This is basically what you have and I think we are in sync with that.
>> >> But what about 5,6,7? Should they have devlink port instances too?
>> >> 
>> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
>> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
>> >> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
>> >> 
>> >> These are the "peers".
>> >> I think that there could be flavours "pci_pf" and "pci_vf". Then the
>> >> "representors" (switch ports) could have flavours "pci_pf_port" and
>> >> "pci_vf_port" or something like that. User can see right away
>> >> that is not "PF" of "VF" but rather something "on the other end".
>> >> Note there is no "switch_id" for these devlink ports that tells the user
>> >> these devlink ports are not part of any switch.
>> >> What do you think?  
>> >
>> >Hmmm.. Hm. Hm.
>> >
>> >To me it's neat if the devlink instance matches an ASIC.  I think it's
>> >kind of clear for people to understand what it stands for then.  So if
>> >we wanted to do the above we'd have to make the switch_id the first
>> >class identifier for devlink instances, rather than the bus?  But then
>> >VF instances don't have a switch ID so that doesn't work...
>> >
>> >I need to think about it.
>> >
>> >It's also kind of strange that we have to add the noun *port* to the
>> >flavour of... a port...  So I would prefer not to have those showing up
>> >as ports.  Can we invent a new command (say "partition"?) that'd take
>> >the bus info where the partition is to be spawned?  
>> 
>> Devlink is not supposed to be there only for switches. From the
>> beginning the design was to handle cases where the netdev/ib_dev is not
>> the correct handle. Not only in case you have multiple instances (ports)
>> for one ASIC, but also in case you have only one. Example use case is
>> port-type-change (eth->ib,ib->eth).
>> 
>> I chose word "port" as the parent devlink instance is "dev" and if you
>> partition the ASIC you basically got "ports", each of a different flavour.
>> 
>> And as you said, devlink instance matches one ASIC. Therefore the
>> devlink instance should contain all bits that are part of that ASIC,
>> not only switch/eswitch ports. That would be very limiting.
>
>I could read this as us being in full agreement, but I'm not sure..
>I think we agree that all objects of an ASIC should be under one
>devlink instance, the question remains whether both ends of the pipe
>for PCI devices (subdevs or not) should appear under ports or does the
>"far end" (from ASICs perspective)/"host end" get its own category.

Yep. Please see the suggestion about "flavour host" I made in another reply
in this thread.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-05  0:51     ` Jakub Kicinski
@ 2019-03-05 11:09       ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-05 11:09 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Tue, Mar 05, 2019 at 01:51:59AM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 4 Mar 2019 12:08:57 +0100, Jiri Pirko wrote:
>> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@netronome.com wrote:
>> >PCI endpoint corresponds to a PCI device, but such device
>> >can have one or more logical device ports associated with it.
>> >We need a way to distinguish those. Add a PCI subport in the
>> >dumps and print the info in phys_port_name appropriately.
>> >
>> >This is not equivalent to port splitting, there is no split
>> >group. It's just a way of representing multiple netdevs on
>> >a single PCI function.
>> >
>> >Note that the quality of being multiport pertains only to
>> >the PCI function itself. A PF having multiple netdevs does
>> >not mean that its VFs will also have multiple, or that VFs
>> >are associated with any particular port of a multiport PF.
>> >
>> >Example (bus 05 device has subports, bus 82 has only one port per
>> >function):  
>> 
>> How do you plan to add/remove these subports?
>
>I can't say I got that figured out fully, but I was wondering if we can
>have some form of:
>
>$ devlink partition pci/0000:82:00.0 new
>pci/0000:82:00.0/1001002

Parav has something similar in his proposal. Let's figure out the
port/non-port dilemma first, then we can design this API.


>
>Which would create appropriate sub-port and port (-repr-) netdev.
>
>Plus optionally the ability to work with something like the already
>existing mdev infrastructure for passing to a VM.  But I haven't even
>looked at that, yet.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-05 11:06           ` Jiri Pirko
@ 2019-03-05 17:15             ` Jakub Kicinski
  2019-03-05 19:59               ` Parav Pandit
  2019-03-06 12:20               ` Jiri Pirko
  0 siblings, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-05 17:15 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Tue, 5 Mar 2019 12:06:01 +0100, Jiri Pirko wrote:
> >> >as ports.  Can we invent a new command (say "partition"?) that'd take
> >> >the bus info where the partition is to be spawned?    
> >> 
> >> Got it. But the question is how different this object would be from the
> >> existing "port" we have today.  
> >
> >They'd be where "the other side of a PCI link" is represented,
> >restricting ports to only ASIC's forwarding plane ports.  
> 
> Basically a "host port", right? It can still be the same port object,
> only with different flavour and attributes. So we would have:
> 
> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
>                        flavour physical switch_id 00154d130d2f
> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
>                            flavour pci_pf pf 0 subport 0
>                            switch_id 00154d130d2f
>                            peer pci/0000:05:00.0/1
> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
>                            flavour pci_vf pf 0 vf 0
>                            switch_id 00154d130d2f
>                            peer pci/0000:05:10.1/0
> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1
>                            flavour pci_pf pf 0 subport 1
>                            switch_id 00154d130d2f
>                            peer pci/0000:05:00.0/2
> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
>                        flavour host          <----------------
>                        peer pci/0000:05:00.0/10000
> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 
>                        flavour host          <----------------
>                        peer pci/0000:05:00.0/10001
> 7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
>                        flavour host          <----------------
>                        peer pci/0000:05:00.0/10001
> 
> I think it looks quite clear, it gives complete topology view.

Okay, I have some questions :)

What do we use for port_index?

What are the operations one can perform on "host ports"?

If we have PCI parameters, do they get set on the ASIC side of the port
or the host side of the port?

How do those behave when device is passed to VM?

You have a VF devlink instance there - what ports does it show?

How do those look when the PF is connected to another host?  Do they
get spawned at all?

Will this not be confusing to DSA folks who have a CPU port?


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-05 17:15             ` Jakub Kicinski
@ 2019-03-05 19:59               ` Parav Pandit
  2019-03-06 12:20               ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-05 19:59 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: davem, netdev, oss-drivers



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jakub Kicinski
> Sent: Tuesday, March 5, 2019 11:16 AM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; oss-
> drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Tue, 5 Mar 2019 12:06:01 +0100, Jiri Pirko wrote:
> > >> >as ports.  Can we invent a new command (say "partition"?) that'd take
> > >> >the bus info where the partition is to be spawned?
> > >>
> > >> Got it. But the question is how different this object would be from
> > >> the existing "port" we have today.
> > >
> > >They'd be where "the other side of a PCI link" is represented,
> > >restricting ports to only ASIC's forwarding plane ports.
> >
> > Basically a "host port", right? It can still be the same port object,
> > only with different flavour and attributes. So we would have:
> >
> > 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
> >                        flavour physical switch_id 00154d130d2f
> > 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
> >                            flavour pci_pf pf 0 subport 0
> >                            switch_id 00154d130d2f
> >                            peer pci/0000:05:00.0/1
> > 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
> >                            flavour pci_vf pf 0 vf 0
> >                            switch_id 00154d130d2f
> >                            peer pci/0000:05:10.1/0
> > 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1
> >                            flavour pci_pf pf 0 subport 1
> >                            switch_id 00154d130d2f
> >                            peer pci/0000:05:00.0/2
> > 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
> >                        flavour host          <----------------
> >                        peer pci/0000:05:00.0/10000
> > 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0
> >                        flavour host          <----------------
> >                        peer pci/0000:05:00.0/10001
> > 7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
> >                        flavour host          <----------------
> >                        peer pci/0000:05:00.0/10001
> >
> > I think it looks quite clear, it gives complete topology view.
> 
> Okay, I have some questions :)
> 
> What do we use for port_index?
> 
port_index is used to refer to a port for port attribute query/config.

> What are the operations one can perform on "host ports"?
> 
Set port parameters.
I see use case for mac address, rdma port guid, port speed that should be emulated.
> If we have PCI parameters, do they get set on the ASIC side of the port or the
> host side of the port?
> 
Hostport has host facing parameters.
Eswitch port has switch facing parameters (configured/queried/managed through netdev/ovs).

> How do those behave when device is passed to VM?
> 
> You have a VF devlink instance there - what ports does it show?
> 
VF devlink port in VM will show the port it got.
It will likely have only read permission for the port attributes as today. 
For example, it will not know which switchport it is connected to.

> How do those look when the PF is connected to another host?  Do they get
> spawned at all?
> 
> Will this not be confusing to DSA folks who have a CPU port?


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-05 17:15             ` Jakub Kicinski
  2019-03-05 19:59               ` Parav Pandit
@ 2019-03-06 12:20               ` Jiri Pirko
  2019-03-06 17:56                 ` Jakub Kicinski
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-06 12:20 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Tue, Mar 05, 2019 at 06:15:34PM CET, jakub.kicinski@netronome.com wrote:
>On Tue, 5 Mar 2019 12:06:01 +0100, Jiri Pirko wrote:
>> >> >as ports.  Can we invent a new command (say "partition"?) that'd take
>> >> >the bus info where the partition is to be spawned?    
>> >> 
>> >> Got it. But the question is how different this object would be from the
>> >> existing "port" we have today.  
>> >
>> >They'd be where "the other side of a PCI link" is represented,
>> >restricting ports to only ASIC's forwarding plane ports.  
>> 
>> Basically a "host port", right? It can still be the same port object,
>> only with different flavour and attributes. So we would have:
>> 
>> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
>>                        flavour physical switch_id 00154d130d2f
>> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
>>                            flavour pci_pf pf 0 subport 0
>>                            switch_id 00154d130d2f
>>                            peer pci/0000:05:00.0/1
>> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
>>                            flavour pci_vf pf 0 vf 0
>>                            switch_id 00154d130d2f
>>                            peer pci/0000:05:10.1/0
>> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1
>>                            flavour pci_pf pf 0 subport 1
>>                            switch_id 00154d130d2f
>>                            peer pci/0000:05:00.0/2
>> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
>>                        flavour host          <----------------
>>                        peer pci/0000:05:00.0/10000
>> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 
>>                        flavour host          <----------------
>>                        peer pci/0000:05:00.0/10001
>> 7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
>>                        flavour host          <----------------
>>                        peer pci/0000:05:00.0/10001
>> 
>> I think it looks quite clear, it gives complete topology view.
>
>Okay, I have some questions :)
>
>What do we use for port_index?

That is just a number totally in control of the driver. Driver can
assign it in any way.
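
Since the index is opaque to devlink, one way a driver could keep the
indices unique is to pack the port's attributes into the number. Below is
a minimal sketch in plain C, assuming a hypothetical layout (flavour in the
top bits, then the PF number, then the physical port, subport or VF number).
The enum, the field widths and the function name are all made up for
illustration; this is not what the NFP patches in this series do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flavours, for illustration only. */
enum sketch_flavour {
	SKETCH_FLAVOUR_PHYSICAL = 0,
	SKETCH_FLAVOUR_PCI_PF   = 1,
	SKETCH_FLAVOUR_PCI_VF   = 2,
};

/*
 * Pack (flavour, pf, sub) into one 32-bit index:
 *   bits 31..28  flavour
 *   bits 27..20  PF number
 *   bits 19..0   physical port, subport or VF number
 * Distinct tuples map to distinct indices as long as every field
 * fits its width, which the asserts check.
 */
static uint32_t sketch_port_index(enum sketch_flavour flavour,
				  unsigned int pf, unsigned int sub)
{
	assert((unsigned int)flavour < (1u << 4));
	assert(pf < (1u << 8));
	assert(sub < (1u << 20));

	return ((uint32_t)flavour << 28) | ((uint32_t)pf << 20) | (uint32_t)sub;
}
```

Any other injective mapping would do equally well; the only property that
matters is that two ports on the same devlink instance never share an index.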


>
>What are the operations one can perform on "host ports"?

That is a good question. I would start with *none* and extend it as
needed.


>
>If we have PCI parameters, do they get set on the ASIC side of the port
>or the host side of the port?

Could you give me an example? But I believe they get set on the
switch-port side.


>
>How do those behave when device is passed to VM?

In case of VF? VF will have separate devlink instance (separate handle,
probably "aliased" to the PF handle). So it would disappear from
baremetal and appear in VM:
$ devlink dev
pci/0000:00:10.0
$ devlink dev port
pci/0000:00:10.1/0: type eth netdev enp5s10f0
                    flavour host
That's it for the VM.

There's no linkage (peer, alias) between this and the instances on
baremetal. 


>
>You have a VF devlink instance there - what ports does it show?

See above.


>
>How do those look when the PF is connected to another host?  Do they
>get spawned at all?

What do you mean by "PF is connected to another host"?


>
>Will this not be confusing to DSA folks who have a CPU port?

Why do you think so?


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-06 12:20               ` Jiri Pirko
@ 2019-03-06 17:56                 ` Jakub Kicinski
  2019-03-07  3:56                   ` Parav Pandit
  2019-03-07  9:48                   ` Jiri Pirko
  0 siblings, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-06 17:56 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:
> Tue, Mar 05, 2019 at 06:15:34PM CET, jakub.kicinski@netronome.com wrote:
> >On Tue, 5 Mar 2019 12:06:01 +0100, Jiri Pirko wrote:  
> >> >> >as ports.  Can we invent a new command (say "partition"?) that'd take
> >> >> >the bus info where the partition is to be spawned?      
> >> >> 
> >> >> Got it. But the question is how different this object would be from the
> >> >> existing "port" we have today.    
> >> >
> >> >They'd be where "the other side of a PCI link" is represented,
> >> >restricting ports to only ASIC's forwarding plane ports.    
> >> 
> >> Basically a "host port", right? It can still be the same port object,
> >> only with different flavour and attributes. So we would have:
> >> 
> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
> >>                        flavour physical switch_id 00154d130d2f
> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
> >>                            flavour pci_pf pf 0 subport 0
> >>                            switch_id 00154d130d2f
> >>                            peer pci/0000:05:00.0/1
> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
> >>                            flavour pci_vf pf 0 vf 0
> >>                            switch_id 00154d130d2f
> >>                            peer pci/0000:05:10.1/0
> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1
> >>                            flavour pci_pf pf 0 subport 1
> >>                            switch_id 00154d130d2f
> >>                            peer pci/0000:05:00.0/2
> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
> >>                        flavour host          <----------------
> >>                        peer pci/0000:05:00.0/10000
> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 
> >>                        flavour host          <----------------
> >>                        peer pci/0000:05:00.0/10001
> >> 7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
> >>                        flavour host          <----------------
> >>                        peer pci/0000:05:00.0/10001
> >> 
> >> I think it looks quite clear, it gives complete topology view.  
> >
> >Okay, I have some questions :)
> >
> >What do we use for port_index?  
> 
> That is just a number totally in control of the driver. Driver can
> assign it in any way.
> 
> >
> >What are the operations one can perform on "host ports"?  
> 
> That is a good question. I would start with *none* and extend it upon
> needs.
> 
> 
> >
> >If we have PCI parameters, do they get set on the ASIC side of the port
> >or the host side of the port?  
> 
> Could you give me an example?

Let's take msix_vec_per_pf_min as an example.  

> But I believe that on switch-port side.

Ok.

> >How do those behave when device is passed to VM?  
> 
> In case of VF? VF will have separate devlink instance (separate handle,
> probably "aliased" to the PF handle). So it would disappear from
> baremetal and appear in VM:
> $ devlink dev
> pci/0000:00:10.0
> $ devlink dev port
> pci/0000:00:10.1/0: type eth netdev enp5s10f0
>                     flavour host
> That's it for the VM.
> 
> There's no linkage (peer, alias) between this and the instances on
> baremetal. 

Ok, I guess this is the main advantage from your perspective?
The fact that "host ports" are visible inside a VM?
Or do you believe that having both ends of a pipe as ports makes the
topology easier to understand?

For creating subdevices, I don't think the handle should ever be port.
We create new ports on a devlink instance, and configure its forwarding
with offloads of well established Linux SW constructs.  New devices are
not logically associated with other ports (see how in my patches there
are 2 "subports" but no main port on that PF - a split not a hierarchy).

How we want to model forwarding inside a VM (who configures the
underlying switching) remains unclear.

> >You have a VF devlink instance there - what ports does it show?  
> 
> See above.
> 
> 
> >
> >How do those look when the PF is connected to another host?  Do they
> >get spawned at all?  
> 
> What do you mean by "PF is connected to another host"?

Either "SmartNIC":

http://www.mellanox.com/products/smartnic/?ls=gppc&lsd=SmartNIC-gen-smartnic&gclid=EAIaIQobChMIxIrGmYju4AIVy5yzCh2SFwQJEAAYASAAEgIui_D_BwE

or

Multi-host NIC: http://www.mellanox.com/page/multihost

> >Will this not be confusing to DSA folks who have a CPU port?  
> 
> Why do you think so?

Host and CPU sound quite similar, it is unclear how they differ, and
why we have a need for both (from user's perspective).


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-06 17:56                 ` Jakub Kicinski
@ 2019-03-07  3:56                   ` Parav Pandit
  2019-03-07  9:48                   ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-07  3:56 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: davem, netdev, oss-drivers

Hi Jakub,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jakub Kicinski
> Sent: Wednesday, March 6, 2019 11:57 AM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; oss-
> drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:
> > Tue, Mar 05, 2019 at 06:15:34PM CET, jakub.kicinski@netronome.com
> wrote:
> > >On Tue, 5 Mar 2019 12:06:01 +0100, Jiri Pirko wrote:
> > >> >> >as ports.  Can we invent a new command (say "partition"?) that'd
> take
> > >> >> >the bus info where the partition is to be spawned?
> > >> >>
> > >> >> Got it. But the question is how different this object would be from
> the
> > >> >> existing "port" we have today.
> > >> >
> > >> >They'd be where "the other side of a PCI link" is represented,
> > >> >restricting ports to only ASIC's forwarding plane ports.
> > >>
> > >> Basically a "host port", right? It can still be the same port
> > >> object, only with different flavour and attributes. So we would have:
> > >>
> > >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
> > >>                        flavour physical switch_id 00154d130d2f
> > >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
> > >>                            flavour pci_pf pf 0 subport 0
> > >>                            switch_id 00154d130d2f
> > >>                            peer pci/0000:05:00.0/1
> > >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
> > >>                            flavour pci_vf pf 0 vf 0
> > >>                            switch_id 00154d130d2f
> > >>                            peer pci/0000:05:10.1/0
> > >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1
> > >>                            flavour pci_pf pf 0 subport 1
> > >>                            switch_id 00154d130d2f
> > >>                            peer pci/0000:05:00.0/2
> > >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
> > >>                        flavour host          <----------------
> > >>                        peer pci/0000:05:00.0/10000
> > >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0
> > >>                        flavour host          <----------------
> > >>                        peer pci/0000:05:00.0/10001
> > >> 7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
> > >>                        flavour host          <----------------
> > >>                        peer pci/0000:05:00.0/10001
> > >>
> > >> I think it looks quite clear, it gives complete topology view.
> > >
> > >Okay, I have some questions :)
> > >
> > >What do we use for port_index?
> >
> > That is just a number totally in control of the driver. Driver can
> > assign it in any way.
> >
> > >
> > >What are the operations one can perform on "host ports"?
> >
> > That is a good question. I would start with *none* and extend it upon
> > needs.
> >
> >
> > >
> > >If we have PCI parameters, do they get set on the ASIC side of the
> > >port or the host side of the port?
> >
> > Could you give me an example?
> 
> Let's take msix_vec_per_pf_min as an example.
> 
> > But I believe that on switch-port side.
> 
> Ok.
> 
> > >How do those behave when device is passed to VM?
> >
> > In case of VF? VF will have separate devlink instance (separate
> > handle, probably "aliased" to the PF handle). So it would disappear
> > from baremetal and appear in VM:
> > $ devlink dev
> > pci/0000:00:10.0
> > $ devlink dev port
> > pci/0000:00:10.1/0: type eth netdev enp5s10f0
> >                     flavour host
> > That's it for the VM.
> >
> > There's no linkage (peer, alias) between this and the instances on
> > baremetal.
> 
> Ok, I guess this is the main advantage from your perspective?
> The fact that "host ports" are visible inside a VM?
> Or do you believe that having both ends of a pipe as ports makes the
> topology easier to understand?
> 
> For creating subdevices, I don't think the handle should ever be port.

I updated the proposal [1], but haven't sent the updated (reduced) RFC patches yet.
Subdevices are created using the already existing 'mdev' framework,
which you also mentioned in one of the past email discussions.
These subdevices live on the 'mdev' bus.

The handle for creating a subdev is its parent PCI device.
The mdev framework exposes a bunch of sysfs files for this work.

Vendors who wish to get it connected in switchdev mode
(I guess most of the current switchdev drivers)
will create an mdev device and an associated devlink instance (and port).

This way, the created subdev can be provisioned for a VM or on the host itself using a unified scheme.
The subdev's parameters, its hostport, and its switchport (rep-netdev)
can be controlled similarly to VFs.

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1948602.html


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-06 17:56                 ` Jakub Kicinski
  2019-03-07  3:56                   ` Parav Pandit
@ 2019-03-07  9:48                   ` Jiri Pirko
  2019-03-08  2:52                     ` Jakub Kicinski
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-07  9:48 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Wed, Mar 06, 2019 at 06:56:38PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:
>> Tue, Mar 05, 2019 at 06:15:34PM CET, jakub.kicinski@netronome.com wrote:
>> >On Tue, 5 Mar 2019 12:06:01 +0100, Jiri Pirko wrote:  
>> >> >> >as ports.  Can we invent a new command (say "partition"?) that'd take
>> >> >> >the bus info where the partition is to be spawned?      
>> >> >> 
>> >> >> Got it. But the question is how different this object would be from the
>> >> >> existing "port" we have today.    
>> >> >
>> >> >They'd be where "the other side of a PCI link" is represented,
>> >> >restricting ports to only ASIC's forwarding plane ports.    
>> >> 
>> >> Basically a "host port", right? It can still be the same port object,
>> >> only with different flavour and attributes. So we would have:
>> >> 
>> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0
>> >>                        flavour physical switch_id 00154d130d2f
>> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0
>> >>                            flavour pci_pf pf 0 subport 0
>> >>                            switch_id 00154d130d2f
>> >>                            peer pci/0000:05:00.0/1
>> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0
>> >>                            flavour pci_vf pf 0 vf 0
>> >>                            switch_id 00154d130d2f
>> >>                            peer pci/0000:05:10.1/0
>> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1
>> >>                            flavour pci_pf pf 0 subport 1
>> >>                            switch_id 00154d130d2f
>> >>                            peer pci/0000:05:00.0/2
>> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0??
>> >>                        flavour host          <----------------
>> >>                        peer pci/0000:05:00.0/10000
>> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 
>> >>                        flavour host          <----------------
>> >>                        peer pci/0000:05:00.0/10001
>> >> 7) pci/0000:05:00.0/2: type eth netdev enp5s0f0??
>> >>                        flavour host          <----------------
>> >>                        peer pci/0000:05:00.0/10001
>> >> 
>> >> I think it looks quite clear, it gives complete topology view.  
>> >
>> >Okay, I have some questions :)
>> >
>> >What do we use for port_index?  
>> 
>> That is just a number totally in control of the driver. Driver can
>> assign it in any way.
>> 
>> >
>> >What are the operations one can perform on "host ports"?  
>> 
>> That is a good question. I would start with *none* and extend it upon
>> needs.
>> 
>> 
>> >
>> >If we have PCI parameters, do they get set on the ASIC side of the port
>> >or the host side of the port?  
>> 
>> Could you give me an example?
>
>Let's take msix_vec_per_pf_min as an example.  
>
>> But I believe that on switch-port side.
>
>Ok.
>
>> >How do those behave when device is passed to VM?  
>> 
>> In case of VF? VF will have separate devlink instance (separate handle,
>> probably "aliased" to the PF handle). So it would disappear from
>> baremetal and appear in VM:
>> $ devlink dev
>> pci/0000:00:10.0
>> $ devlink dev port
>> pci/0000:00:10.1/0: type eth netdev enp5s10f0
>>                     flavour host
>> That's it for the VM.
>> 
>> There's no linkage (peer, alias) between this and the instances on
>> baremetal. 
>
>Ok, I guess this is the main advantage from your perspective?
>The fact that "host ports" are visible inside a VM?

Yep. Also on baremetal.


>Or do you believe that having both ends of a pipe as ports makes the
>topology easier to understand?

That as well.
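
With both ends of the pipe shown as ports, a tool could check that every
"peer" attribute points back at the port it came from. Here is a rough
sketch in plain C, assuming a hypothetical in-memory table where "peer" is
the array slot of the other end. None of these names exist in devlink; they
are only here to illustrate the symmetry check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NO_PEER ((size_t)-1)

/* Hypothetical view of one port in a dump, for illustration only. */
struct sketch_port {
	const char *handle;	/* e.g. "pci/0000:05:00.0/10000" */
	size_t peer;		/* slot of the other end, or SKETCH_NO_PEER */
};

/* Return true if every peer link in the table points back at its origin. */
static bool sketch_peers_symmetric(const struct sketch_port *ports, size_t n)
{
	assert(ports != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		size_t p = ports[i].peer;

		if (p == SKETCH_NO_PEER)
			continue;	/* e.g. a physical port has no peer */
		if (p >= n || ports[p].peer != i)
			return false;
	}
	return true;
}
```

A table in which two switch ports claim the same host port as their peer
fails the check, which is the kind of inconsistency a topology view should
make easy to spot.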


>
>For creating subdevices, I don't think the handle should ever be port.
>We create new ports on a devlink instance, and configure its forwarding

Okay I agree. Something like:
$ devlink port add pci/0000:00:10.0 .....

It's a bit confusing because "set" accepts port handle (like
pci/0000:00:10.0/1). Probably better would be:
$ devlink dev port add pci/0000:00:10.0 .....


>with offloads of well established Linux SW constructs.  New devices are
>not logically associated with other ports (see how in my patches there
>are 2 "subports" but no main port on that PF - a split not a hierarchy).

Right, basically you have 2 equal objects. Makes sense.
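
The netdev names in the listing quoted earlier (enp5s0np0, enp5s0npf0s0,
enp5s0npf0vf0) suggest port names of the form p0, pf0s0 and pf0vf0, with the
two subports being siblings rather than parent and child. A small sketch of
composing such names in plain C follows. The formats are inferred from those
example names only, and the enum and function are hypothetical, not the
devlink implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flavours, for illustration only. */
enum sketch_name_flavour {
	SKETCH_NAME_PHYSICAL,
	SKETCH_NAME_PCI_PF,
	SKETCH_NAME_PCI_VF,
};

/*
 * Format a port name into buf. For PHYSICAL, id is the port number;
 * for PCI_PF, id is the subport; for PCI_VF, id is the VF number.
 * Returns what snprintf() returns, or -1 for an unknown flavour.
 */
static int sketch_port_name(char *buf, size_t len,
			    enum sketch_name_flavour flavour,
			    unsigned int pf, unsigned int id)
{
	assert(buf != NULL && len > 0);

	switch (flavour) {
	case SKETCH_NAME_PHYSICAL:
		return snprintf(buf, len, "p%u", id);
	case SKETCH_NAME_PCI_PF:
		return snprintf(buf, len, "pf%us%u", pf, id);
	case SKETCH_NAME_PCI_VF:
		return snprintf(buf, len, "pf%uvf%u", pf, id);
	}
	return -1;
}
```

Both subports of PF 0 go through the same PCI_PF branch and differ only in
the trailing number, which matches the "2 equal objects" view.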


>
>How we want to model forwarding inside a VM (who configures the
>underlying switching) remains unclear.

I don't understand. Could you elaborate a bit?


>
>> >You have a VF devlink instance there - what ports does it show?  
>> 
>> See above.
>> 
>> 
>> >
>> >How do those look when the PF is connected to another host?  Do they
>> >get spawned at all?  
>> 
>> What do you mean by "PF is connected to another host"?
>
>Either "SmartNIC":
>
>http://www.mellanox.com/products/smartnic/?ls=gppc&lsd=SmartNIC-gen-smartnic&gclid=EAIaIQobChMIxIrGmYju4AIVy5yzCh2SFwQJEAAYASAAEgIui_D_BwE
>
>or
>
>Multi-host NIC: http://www.mellanox.com/page/multihost

Right. So in this case, I think that the hostport on specific host
should see devlink instance and the hostport. However, the switchports
should be only on one selected host (I don't see how to do that
differently)


>
>> >Will this not be confusing to DSA folks who have a CPU port?  
>> 
>> Why do you think so?
>
>Host and CPU sound quite similar, it is unclear how they differ, and
>why we have a need for both (from user's perspective).

Hmm, dsa cpu port is something different. It does not have netdev
associated with it. It is just a port which is physically used in order
to send or receive packets on switch ports.

However in our hostport case, it has user facing netdev associated and
user actually uses it to send and receive packets directly (assigns ip
etc).



* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-07  9:48                   ` Jiri Pirko
@ 2019-03-08  2:52                     ` Jakub Kicinski
  2019-03-08 14:54                       ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-08  2:52 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Thu, 7 Mar 2019 10:48:16 +0100, Jiri Pirko wrote:
> Wed, Mar 06, 2019 at 06:56:38PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:  
> >For creating subdevices, I don't think the handle should ever be port.
> >We create new ports on a devlink instance, and configure its forwarding  
> 
> Okay I agree. Something like:
> $ devlink port add pci/0000:00:10.0 .....
> 
> It's a bit confusing because "set" accepts port handle (like
> pci/0000:00:10.0/1). Probably better would be:
> $ devlink dev port add pci/0000:00:10.0 .....
> 
> >with offloads of well established Linux SW constructs.  New devices are
> >not logically associated with other ports (see how in my patches there
> >are 2 "subports" but no main port on that PF - a split not a hierarchy).  
> 
> Right, basically you have 2 equal objects. Makes sense.
> 
> >How we want to model forwarding inside a VM (who configures the
> >underlying switching) remains unclear.  
> 
> I don't understand. Could you elaborate a bit?

If VF in a VM gets a partitioning request does the new port pop up on
the hypervisor?  With a port netdev?

Does the VF also create a port object as well as host port object?

That question is probably independent of host port discussion.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-08  2:52                     ` Jakub Kicinski
@ 2019-03-08 14:54                       ` Jiri Pirko
  2019-03-08 19:09                         ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-08 14:54 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Fri, Mar 08, 2019 at 03:52:02AM CET, jakub.kicinski@netronome.com wrote:
>On Thu, 7 Mar 2019 10:48:16 +0100, Jiri Pirko wrote:
>> Wed, Mar 06, 2019 at 06:56:38PM CET, jakub.kicinski@netronome.com wrote:
>> >On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:  
>> >For creating subdevices, I don't think the handle should ever be port.
>> >We create new ports on a devlink instance, and configure its forwarding  
>> 
>> Okay I agree. Something like:
>> $ devlink port add pci/0000:00:10.0 .....
>> 
>> It's a bit confusing because "set" accepts port handle (like
>> pci/0000:00:10.0/1). Probably better would be:
>> $ devlink dev port add pci/0000:00:10.0 .....
>> 
>> >with offloads of well established Linux SW constructs.  New devices are
>> >not logically associated with other ports (see how in my patches there
>> >are 2 "subports" but no main port on that PF - a split not a hierarchy).  
>> 
>> Right, basically you have 2 equal objects. Makes sense.
>> 
>> >How we want to model forwarding inside a VM (who configures the
>> >underlying switching) remains unclear.  
>> 
>> I don't understand. Could you elaborate a bit?
>
>If VF in a VM gets a partitioning request does the new port pop up on
>the hypervisor?  With a port netdev?

Switchport in the hypervisor with the correct switchid attribute,
hostport in the VM. Makes sense?

>
>Does the VF also create a port object as well as host port object?
>
>That question is probably independent of host port discussion.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-08 14:54                       ` Jiri Pirko
@ 2019-03-08 19:09                         ` Jakub Kicinski
  2019-03-11  8:52                           ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-08 19:09 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Fri, 8 Mar 2019 15:54:21 +0100, Jiri Pirko wrote:
> Fri, Mar 08, 2019 at 03:52:02AM CET, jakub.kicinski@netronome.com wrote:
> >On Thu, 7 Mar 2019 10:48:16 +0100, Jiri Pirko wrote:  
> >> Wed, Mar 06, 2019 at 06:56:38PM CET, jakub.kicinski@netronome.com wrote:  
> >> >On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:  
> >> >For creating subdevices, I don't think the handle should ever be port.
> >> >We create new ports on a devlink instance, and configure its forwarding    
> >> 
> >> Okay I agree. Something like:
> >> $ devlink port add pci/0000:00:10.0 .....
> >> 
> >> It's a bit confusing because "set" accepts port handle (like
> >> pci/0000:00:10.0/1). Probably better would be:
> >> $ devlink dev port add pci/0000:00:10.0 .....
> >>   
> >> >with offloads of well established Linux SW constructs.  New devices are
> >> >not logically associated with other ports (see how in my patches there
> >> >are 2 "subports" but no main port on that PF - a split not a hierarchy).    
> >> 
> >> Right, basically you have 2 equal objects. Makes sense.
> >>   
> >> >How we want to model forwarding inside a VM (who configures the
> >> >underlying switching) remains unclear.    
> >> 
> >> I don't understand. Could you elaborate a bit?  
> >
> >If VF in a VM gets a partitioning request does the new port pop up on
> >the hypervisor?  With a port netdev?  
> 
> Switchport in the hypervisor with the correct switchid attribute,
> hostport in the VM. Makes sense?

If the switchport is in the hypervisor then only the hypervisor can
control switching/forwarding, correct?

The primary use case for partitioning within a VM (of a VF) would be
containers (and DPDK)?

SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
What I'm trying to get a sense of is how we would control an SR-IOV
environment as a whole.

> >Does the VF also create a port object as well as host port object?
> >
> >That question is probably independent of host port discussion.  


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-08 19:09                         ` Jakub Kicinski
@ 2019-03-11  8:52                           ` Jiri Pirko
  2019-03-12  2:10                             ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-11  8:52 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Fri, Mar 08, 2019 at 08:09:43PM CET, jakub.kicinski@netronome.com wrote:
>On Fri, 8 Mar 2019 15:54:21 +0100, Jiri Pirko wrote:
>> Fri, Mar 08, 2019 at 03:52:02AM CET, jakub.kicinski@netronome.com wrote:
>> >On Thu, 7 Mar 2019 10:48:16 +0100, Jiri Pirko wrote:  
>> >> Wed, Mar 06, 2019 at 06:56:38PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >On Wed, 6 Mar 2019 13:20:37 +0100, Jiri Pirko wrote:  
>> >> >For creating subdevices, I don't think the handle should ever be port.
>> >> >We create new ports on a devlink instance, and configure its forwarding    
>> >> 
>> >> Okay I agree. Something like:
>> >> $ devlink port add pci/0000:00:10.0 .....
>> >> 
>> >> It's a bit confusing because "set" accepts port handle (like
>> >> pci/0000:00:10.0/1). Probably better would be:
>> >> $ devlink dev port add pci/0000:00:10.0 .....
>> >>   
>> >> >with offloads of well established Linux SW constructs.  New devices are
>> >> >not logically associated with other ports (see how in my patches there
>> >> >are 2 "subports" but no main port on that PF - a split not a hierarchy).    
>> >> 
>> >> Right, basically you have 2 equal objects. Makes sense.
>> >>   
>> >> >How we want to model forwarding inside a VM (who configures the
>> >> >underlying switching) remains unclear.    
>> >> 
>> >> I don't understand. Could you elaborate a bit?  
>> >
>> >If VF in a VM gets a partitioning request does the new port pop up on
>> >the hypervisor?  With a port netdev?  
>> 
>> Switchport in the hypervisor with the correct switchid attribute,
>> hostport in the VM. Makes sense?
>
>If the switchport is in the hypervisor then only the hypervisor can
>control switching/forwarding, correct?

Correct.


>
>The primary use case for partitioning within a VM (of a VF) would be
>containers (and DPDK)?

Makes sense.


>
>SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>What I'm trying to get a sense of is how we would control an SR-IOV
>environment as a whole.

You mean orchestration? I originally planned to implement sriov
orchestration api in devlink too.


>
>> >Does the VF also create a port object as well as host port object?
>> >
>> >That question is probably independent of host port discussion.  
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-11  8:52                           ` Jiri Pirko
@ 2019-03-12  2:10                             ` Jakub Kicinski
  2019-03-12 14:02                               ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-12  2:10 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:
> Fri, Mar 08, 2019 at 08:09:43PM CET, jakub.kicinski@netronome.com wrote:
> >If the switchport is in the hypervisor then only the hypervisor can
> >control switching/forwarding, correct?  
> 
> Correct.
> 
> >The primary use case for partitioning within a VM (of a VF) would be
> >containers (and DPDK)?  
> 
> Makes sense.
> 
> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
> >What I'm trying to get a sense of is how we would control an SR-IOV
> >environment as a whole.
> 
> You mean orchestration? 

Right, orchestration.

To be clear on where I'm going with this - if we want to allow VFs 
to partition themselves then they have to control what is effectively
a "nested" switch.  A per-VF set of rules which would then get
"flattened" into the main eswitch rule set.  If I was to choose I'd
really rather have this "flattening" be done on the (Linux) hypervisor
and not in the vendor driver and firmware.
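
To make the "flattening" idea concrete, here is a toy Python sketch. It is
purely illustrative: the Rule shape, the port names and the flatten() helper
are invented for this example and do not correspond to any kernel, driver or
firmware structure. It only shows that some layer has to rewrite VF-local
port names into names that are unique across the one big eswitch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical forwarding rule: packets from in_port go to out_port."""
    in_port: str
    out_port: str


def flatten(vf_rules: Dict[str, List[Rule]]) -> List[Rule]:
    """Merge per-VF rule sets into one list for the main eswitch.

    Each VF names its subports locally ("sub0", "sub1", ...), so the names
    are prefixed with the VF identifier to keep them unique once merged.
    """
    flat: List[Rule] = []
    for vf, rules in sorted(vf_rules.items()):
        for rule in rules:
            flat.append(Rule(f"{vf}/{rule.in_port}", f"{vf}/{rule.out_port}"))
    return flat


# Two VFs, each with its own "nested" rule between two local subports.
nested = {
    "pf0vf0": [Rule("sub0", "sub1")],
    "pf0vf1": [Rule("sub1", "sub0")],
}
flat_rules = flatten(nested)
for rule in flat_rules:
    print(rule.in_port, "->", rule.out_port)
```

Whichever component runs a step like flatten() also has to enforce policy
on what each VF may install, which is the reason to prefer doing it in
common code on the hypervisor.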

I'd much rather have the VM make a "give me another NIC" orchestration
call via some high level REST API than devlink.  This makes the
configuration strictly high level to low level:

  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW

Without round trips via firmware.  

This allows for easy policy enforcement, common code to be maintained
in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
firmware for every driver).  It can also be used with software paths
like VirtIO..

Modelling and debugging a nested switch would be a nightmare.  What
follows is that we probably shouldn't deal with partitioning of VFs,
but rather only partition via the PF devlink instance, and reassign 
the partitions to VMs.

> I originally planned to implement sriov orchestration api in devlink too.

Interesting, would you mind elaborating?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-12  2:10                             ` Jakub Kicinski
@ 2019-03-12 14:02                               ` Jiri Pirko
  2019-03-12 20:56                                 ` Jakub Kicinski
       [not found]                                 ` <7227d58e-ac58-d549-b921-ca0a0dd3f4b0@intel.com>
  0 siblings, 2 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-12 14:02 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Tue, Mar 12, 2019 at 03:10:54AM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:
>> Fri, Mar 08, 2019 at 08:09:43PM CET, jakub.kicinski@netronome.com wrote:
>> >If the switchport is in the hypervisor then only the hypervisor can
>> >control switching/forwarding, correct?  
>> 
>> Correct.
>> 
>> >The primary use case for partitioning within a VM (of a VF) would be
>> >containers (and DPDK)?  
>> 
>> Makes sense.
>> 
>> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>> >What I'm trying to get a sense of is how we would control an SR-IOV
>> >environment as a whole.
>> 
>> You mean orchestration? 
>
>Right, orchestration.
>
>To be clear on where I'm going with this - if we want to allow VFs 
>to partition themselves then they have to control what is effectively
>a "nested" switch.  A per-VF set of rules which would then get

Wait. If you allow creating VF subports (I believe that is what you meant
by VFs partitioning themselves), that does not mean they will have a
separate nested switch. They would still belong under the same one.


>"flattened" into the main eswitch rule set.  If I was to choose I'd
>really rather have this "flattening" be done on the (Linux) hypervisor
>and not in the vendor driver and firmware.

Agreed. Driver should provide one big switch. User should configure it.


>
>I'd much rather have the VM make a "give me another NIC" orchestration
>call via some high level REST API than devlink.  This makes the
>configuration strictly high level to low level:
>
>  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW
>
>Without round trips via firmware.  

Okay. So the "devlink/Linux -> FW" part is going to happen on baremetal.
Makes sense.


>
>This allows for easy policy enforcement, common code to be maintained
>in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
>firmware for every driver).  It can also be used with software paths
>like VirtIO..

Agreed.


>
>Modelling and debugging a nested switch would be a nightmare.  What
>follows is that we probably shouldn't deal with partitioning of VFs,
>but rather only partition via the PF devlink instance, and reassign 
>the partitions to VMs.

Agreed. That must be a misunderstanding, I never suggested nested
switches.


>
>> I originally planned to implement sriov orchestration api in devlink too.
>
>Interesting, would you mind elaborating?

I have to think about it. But something like this:

After bootup, you see only physical port, PF switch port and PF host leg.
$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                    switch_id 00154d130d2f peer pci/0000:05:00.0/1

To create new PF subport under PF 0:
$ devlink dev port add pci/0000:05:00.0 flavour pci_pf pf 0
$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                    switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host                            <<<<<<<<<<<<<<<<<<
                    peer pci/0000:05:00.0/10001                                        <<<<<<<<<<<<<<<<<<
pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1  <<<<<<<<<<<<<<<<<<
                    switch_id 00154d130d2f peer pci/0000:05:00.0/2                     <<<<<<<<<<<<<<<<<<

To create a new VF under PF 0:
$ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                    switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10001
pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1
                    switch_id 00154d130d2f peer pci/0000:05:00.0/2
pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host                            <<<<<<<<<<<<<<<<<<
                    peer pci/0000:05:00.0/10002                                        <<<<<<<<<<<<<<<<<<
pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0       <<<<<<<<<<<<<<<<<<
                    switch_id 00154d130d2f peer pci/0000:05:10.1/0                     <<<<<<<<<<<<<<<<<<

So new VF is created.


To delete, the user would need to use the port which is in the eswitch:
$ devlink port del pci/0000:05:00.0/2
devlink answers: Operation not permitted
$ devlink port del pci/0000:05:00.0/10001     <<<<<<<<<< this

$ devlink port del pci/0000:05:10.1/0
devlink answers: Operation not permitted
$ devlink port del pci/0000:05:00.0/10002     <<<<<<<<<< this

This actually removes VF.


For VF subports this would work too, we just have to have "subport"
attribute not only for PFs but also for VFs:

To create a new VF subport under PF 0 and VF 0:
$ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0
$ devlink port show
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10000
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
                    switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host
                    peer pci/0000:05:00.0/10001
pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1
                    switch_id 00154d130d2f peer pci/0000:05:00.0/2
pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
                    peer pci/0000:05:00.0/10002
pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0 subport 0
                    switch_id 00154d130d2f peer pci/0000:05:10.1/0
pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host                                  <<<<<<<<<<<<<<<<<<
                    peer pci/0000:05:00.0/10003                                              <<<<<<<<<<<<<<<<<<
pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0 subport 1   <<<<<<<<<<<<<<<<<<
                    switch_id 00154d130d2f peer pci/0000:05:10.1/1                           <<<<<<<<<<<<<<<<<<
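
A toy Python model of the add/del semantics sketched above may help when
reading the listings. It is an illustration only: the class and method names
are made up, it is not devlink code, and it assumes that only the
eswitch-side pci_pf/pci_vf ports may be deleted and that the host leg goes
away together with its peer.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: only these eswitch-side flavours accept "port del".
DELETABLE_FLAVOURS = {"pci_pf", "pci_vf"}


@dataclass
class Port:
    handle: str                 # e.g. "pci/0000:05:00.0/10001"
    flavour: str                # e.g. "pci_pf" or "pci_pf_host"
    peer: Optional[str] = None  # handle of the other side of the "wire"


class PortTable:
    """Hypothetical stand-in for the ports visible via 'devlink port show'."""

    def __init__(self) -> None:
        self.ports: Dict[str, Port] = {}

    def add_pair(self, switch_handle: str, host_handle: str, flavour: str) -> None:
        """Create an eswitch port and its host leg, linked as peers."""
        self.ports[switch_handle] = Port(switch_handle, flavour, host_handle)
        self.ports[host_handle] = Port(host_handle, flavour + "_host", switch_handle)

    def delete(self, handle: str) -> None:
        """Delete via the eswitch-side port only; the host leg is refused."""
        port = self.ports[handle]
        if port.flavour not in DELETABLE_FLAVOURS:
            raise PermissionError("Operation not permitted")
        self.ports.pop(port.peer, None)  # the host leg disappears with it
        del self.ports[handle]


table = PortTable()
table.add_pair("pci/0000:05:00.0/10001", "pci/0000:05:00.0/2", "pci_pf")

refused = None
try:
    table.delete("pci/0000:05:00.0/2")      # host leg: not allowed
except PermissionError as err:
    refused = str(err)
print(refused)

table.delete("pci/0000:05:00.0/10001")      # eswitch port: removes both
print(sorted(table.ports))
```

The point of the asymmetry is that the eswitch side is the one owned by
whoever administers the switch, so object lifetime is controlled from there.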



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-12 14:02                               ` Jiri Pirko
@ 2019-03-12 20:56                                 ` Jakub Kicinski
  2019-03-13  6:07                                   ` Jiri Pirko
       [not found]                                 ` <7227d58e-ac58-d549-b921-ca0a0dd3f4b0@intel.com>
  1 sibling, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-12 20:56 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:
> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:
> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:  
> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:  
> >> >If the switchport is in the hypervisor then only the hypervisor can
> >> >control switching/forwarding, correct?    
> >> 
> >> Correct.
> >>   
> >> >The primary use case for partitioning within a VM (of a VF) would be
> >> >containers (and DPDK)?    
> >> 
> >> Makes sense.
> >>   
> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
> >> >What I'm trying to get a sense of is how we would control an SR-IOV
> >> >environment as a whole.
> >> 
> >> You mean orchestration?   
> >
> >Right, orchestration.
> >
> >To be clear on where I'm going with this - if we want to allow VFs 
> >to partition themselves then they have to control what is effectively
> >a "nested" switch.  A per-VF set of rules which would then get
> 
> Wait. If you allow creating VF subports (I believe that is what you meant
> by VFs partitioning themselves), that does not mean they will have a
> separate nested switch. They would still belong under the same one.

But that existing switch is administered by the hypervisor, how would
the VF owners install forwarding rules in a switch they don't control?

> >"flattened" into the main eswitch rule set.  If I was to choose I'd
> >really rather have this "flattening" be done on the (Linux) hypervisor
> >and not in the vendor driver and firmware.  
> 
> Agreed. Driver should provide one big switch. User should configure it.

Cool, when you say user - is it the tenant or the provider?

> >I'd much rather have the VM make a "give me another NIC" orchestration
> >call via some high level REST API than devlink.  This makes the
> >configuration strictly high level to low level:
> >
> >  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW
> >
> >Without round trips via firmware.    
> 
> Okay. So the "devlink/Linux -> FW" part is going to happen on baremetal.
> Makes sense.
> 
> >This allows for easy policy enforcement, common code to be maintained
> >in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
> >firmware for every driver).  It can also be used with software paths
> >like VirtIO..  
> 
> Agreed.
> 
> >Modelling and debugging a nested switch would be a nightmare.  What
> >follows is that we probably shouldn't deal with partitioning of VFs,
> >but rather only partition via the PF devlink instance, and reassign 
> >the partitions to VMs.  
> 
> Agreed. That must be a misunderstanding, I never suggested nested
> switches.

Cool, yes, I was making sure we weren't going in that direction :)

> >> I originally planned to implement sriov orchestration api in devlink too.  
> >
> >Interesting, would you mind elaborating?  
> 
> I have to think about it. But something like this:
> [...]

I see, thanks for the examples, they make things clear!

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-12 20:56                                 ` Jakub Kicinski
@ 2019-03-13  6:07                                   ` Jiri Pirko
  2019-03-13 16:17                                     ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-13  6:07 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:
>On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:
>> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:
>> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:  
>> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:  
>> >> >If the switchport is in the hypervisor then only the hypervisor can
>> >> >control switching/forwarding, correct?    
>> >> 
>> >> Correct.
>> >>   
>> >> >The primary use case for partitioning within a VM (of a VF) would be
>> >> >containers (and DPDK)?    
>> >> 
>> >> Makes sense.
>> >>   
>> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>> >> >What I'm trying to get a sense of is how we would control an SR-IOV
>> >> >environment as a whole.
>> >> 
>> >> You mean orchestration?   
>> >
>> >Right, orchestration.
>> >
>> >To be clear on where I'm going with this - if we want to allow VFs 
>> >to partition themselves then they have to control what is effectively
>> >a "nested" switch.  A per-VF set of rules which would then get
>> 
>> Wait. If you allow creating VF subports (I believe that is what you meant
>> by VFs partitioning themselves), that does not mean they will have a
>> separate nested switch. They would still belong under the same one.
>
>But that existing switch is administered by the hypervisor, how would
>the VF owners install forwarding rules in a switch they don't control?

They won't.


>
>> >"flattened" into the main eswitch rule set.  If I was to choose I'd
>> >really rather have this "flattening" be done on the (Linux) hypervisor
>> >and not in the vendor driver and firmware.  
>> 
>> Agreed. Driver should provide one big switch. User should configure it.
>
>Cool, when you say user - is it the tenant or the provider?

Whoever gets access to the instance.


>
>> >I'd much rather have the VM make a "give me another NIC" orchestration
>> >call via some high level REST API than devlink.  This makes the
>> >configuration strictly high level to low level:
>> >
>> >  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW
>> >
>> >Without round trips via firmware.    
>> 
>> Okay. So the "devlink/Linux -> FW" part is going to happen on baremetal.
>> Makes sense.
>> 
>> >This allows for easy policy enforcement, common code to be maintained
>> >in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
>> >firmware for every driver).  It can also be used with software paths
>> >like VirtIO..  
>> 
>> Agreed.
>> 
>> >Modelling and debugging a nested switch would be a nightmare.  What
>> >follows is that we probably shouldn't deal with partitioning of VFs,
>> >but rather only partition via the PF devlink instance, and reassign 
>> >the partitions to VMs.  
>> 
>> Agreed. That must be a misunderstanding, I never suggested nested
>> switches.
>
>Cool, yes, I was making sure we weren't going in that direction :)

Okay.


>
>> >> I originally planned to implement sriov orchestration api in devlink too.  
>> >
>> >Interesting, would you mind elaborating?  
>> 
>> I have to think about it. But something like this:
>> [...]
>
>I see, thanks for the examples, they make things clear!

Okay. I will put together some documentation including this. I have some
patches that implement some of the stuff. Your patchset also does some
of that (assuming you adjust a thing or two). Let's make this right.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
       [not found]                                 ` <7227d58e-ac58-d549-b921-ca0a0dd3f4b0@intel.com>
@ 2019-03-13  7:37                                   ` Jiri Pirko
  2019-03-13 16:03                                     ` Samudrala, Sridhar
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-13  7:37 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Jakub Kicinski, davem, netdev, oss-drivers

Wed, Mar 13, 2019 at 07:17:04AM CET, sridhar.samudrala@intel.com wrote:
>
>On 3/12/2019 7:02 AM, Jiri Pirko wrote:
>
>> 
>> > 
>> > > I originally planned to implement sriov orchestration api in devlink too.
>> > 
>> > Interesting, would you mind elaborating?
>> 
>> I have to think about it. But something like this:
>> 
>> After bootup, you see only physical port, PF switch port and PF host leg.
>
>Is this after changing the eswitch mode to 'switchdev'?

I believe so. For new drivers, this should be the default and only option.


>
>> $ devlink port show
>> pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
>
>Is this the uplink port representor?

Yes


>
>> pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
>>                      peer pci/0000:05:00.0/10000
>
>I guess this is PF netdev

Yes, port


>
>> pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>and this one is PF port representor netdev

Yes, port


>
>> 
>> To create new PF subport under PF 0:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_pf pf 0
>
>Can we consider l2-fwd offload macvlan device also as a subport of PF?

What does this have to do with macvlan? Macvlan is a separate soft
driver.


>
>> $ devlink port show
>> pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
>> pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
>>                      peer pci/0000:05:00.0/10000
>> pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/1
>> pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host                            <<<<<<<<<<<<<<<<<<
>>                      peer pci/0000:05:00.0/10001                                        <<<<<<<<<<<<<<<<<<
>> pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1  <<<<<<<<<<<<<<<<<<
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/2                     <<<<<<<<<<<<<<<<<<
>> 
>> To create a new VF under PF 0:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
>
>Does this mean that this interface allows creating VFs dynamically one at a
>time?

Yes


>
>> $ devlink port show
>> pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
>> pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
>>                      peer pci/0000:05:00.0/10000
>> pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/1
>> pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host
>>                      peer pci/0000:05:00.0/10001
>> pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/2
>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host                            <<<<<<<<<<<<<<<<<<
>>                      peer pci/0000:05:00.0/10002                                        <<<<<<<<<<<<<<<<<<
>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0       <<<<<<<<<<<<<<<<<<
>>                      switch_id 00154d130d2f peer pci/0000:05:10.1/0                     <<<<<<<<<<<<<<<<<<
>> 
>> So new VF is created.
>> 
>> 
>> To delete, the user would need to use the port which is in the eswitch:
>> $ devlink port del pci/0000:05:00.0/2
>> devlink answers: Operation not permitted
>> $ devlink port del pci/0000:05:00.0/10001     <<<<<<<<<< this
>> 
>> $ devlink port del pci/0000:05:10.1/0
>> devlink answers: Operation not permitted
>> $ devlink port del pci/0000:05:00.0/10002     <<<<<<<<<< this
>> 
>> This actually removes VF.
>> 
>> 
>> For VF subports this would work too, we just have to have "subport"
>> attribute not only for PFs but also for VFs:
>> 
>> To create a new VF subport under PF 0 and VF 0:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0
>> $ devlink port show
>> pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
>> pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
>>                      peer pci/0000:05:00.0/10000
>> pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/1
>> pci/0000:05:00.0/2: type eth netdev ??? flavour pci_pf_host
>>                      peer pci/0000:05:00.0/10001
>> pci/0000:05:00.0/10001: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 1
>>                      switch_id 00154d130d2f peer pci/0000:05:00.0/2
>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>>                      peer pci/0000:05:00.0/10002
>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0 subport 0
>>                      switch_id 00154d130d2f peer pci/0000:05:10.1/0
>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host                                  <<<<<<<<<<<<<<<<<<
>>                      peer pci/0000:05:00.0/10003                                              <<<<<<<<<<<<<<<<<<
>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0 flavour pci_vf pf 0 vf 0 subport 1   <<<<<<<<<<<<<<<<<<
>>                      switch_id 00154d130d2f peer pci/0000:05:10.1/1                           <<<<<<<<<<<<<<<<<<
>> 
>> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-13  7:37                                   ` Jiri Pirko
@ 2019-03-13 16:03                                     ` Samudrala, Sridhar
  2019-03-13 16:24                                       ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Samudrala, Sridhar @ 2019-03-13 16:03 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, davem, netdev, oss-drivers

On 3/13/2019 12:37 AM, Jiri Pirko wrote:
> Wed, Mar 13, 2019 at 07:17:04AM CET, sridhar.samudrala@intel.com wrote:
>>
>> On 3/12/2019 7:02 AM, Jiri Pirko wrote:
>>
>>>
>>>>
>>>>> I originally planned to implement sriov orchestration api in devlink too.
>>>>
>>>> Interesting, would you mind elaborating?
>>>
>>> I have to think about it. But something like this:
>>>
>>> After bootup, you see only physical port, PF switch port and PF host leg.
>>
>> Is this after changing the eswitch mode to 'switchdev'
> 
> I believe so. For new drivers, this should be default and only option.
> 
> 
>>
>>> $ devlink port show
>>> pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
>>
>> Is this the uplink port representor?
> 
> Yes
> 
> 
>>
>>> pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
>>>                       peer pci/0000:05:00.0/10000
>>
>> I guess this is PF netdev
> 
> Yes, port
> 
> 
>>
>>> pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
>>>                       switch_id 00154d130d2f peer pci/0000:05:00.0/1
>>
>> and this one is PF port representor netdev
> 
> Yes, port
> 
> 
>>
>>>
>>> To create new PF subport under PF 0:
>>> $ devlink dev port add pci/0000:05:00.0 flavour pci_pf pf 0
>>
>> Can we consider l2-fwd offload macvlan device also as a subport of PF?
> 
> What does this have to with with macvlan? Macvlan is a separate soft
> driver.

ethtool -K <pf> l2-fwd-offload on
ip link add link <pf> type macvlan

will create a macvlan netdev but it is backed by a set of separate HW 
queues and switching is offloaded to HW. This can be considered as a 
subport. In i40e, it is a VMDq VSI.





^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-13  6:07                                   ` Jiri Pirko
@ 2019-03-13 16:17                                     ` Jakub Kicinski
  2019-03-13 16:22                                       ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-13 16:17 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:
> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:
> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:  
> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:  
> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:    
> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:    
> >> >> >If the switchport is in the hypervisor then only the hypervisor can
> >> >> >control switching/forwarding, correct?      
> >> >> 
> >> >> Correct.
> >> >>     
> >> >> >The primary use case for partitioning within a VM (of a VF) would be
> >> >> >containers (and DPDK)?      
> >> >> 
> >> >> Makes sense.
> >> >>     
> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
> >> >> >environment as a whole.      
> >> >> 
> >> >> You mean orchestration?     
> >> >
> >> >Right, orchestration.
> >> >
> >> >To be clear on where I'm going with this - if we want to allow VFs 
> >> >to partition themselves then they have to control what is effectively 
> >> >a "nested" switch.  A per-VF set of rules which would the get    
> >> 
> >> Wait. If you allow to make VF subports (I believe that is what you ment
> >> by VFs partition themselves), that does not mean they will have a
> >> separate nested switch. They would still belong under the same one.  
> >
> >But that existing switch is administered by the hypervisor, how would
> >the VF owners install forwarding rules in a switch they don't control?  
> 
> They won't.

Argh.  So how is forwarding configured if there are no rules?  Are you
going to assume it's switching on MACs?  We're supposed to offload
software constructs.  If it's a software port it needs to be explicitly
switched.  If it's not explicitly switched - we already have macvlan
offload.

> >> >"flattened" into the main eswitch rule set.  If I was to choose I'd
> >> >really rather have this "flattening" be done on the (Linux) hypervisor
> >> >and not in the vendor driver and firmware.    
> >> 
> >> Agreed. Driver should provide one big switch. User should configure it.  
> >
> >Cool, when you say user - is it the tenant or the provider?  
> 
> Whoever gets access to the instance.
>  
> >> >I'd much rather have the VM make a "give me another NIC" orchestration
> >> >call via some high level REST API than devlink.  This makes the
> >> >configuration strictly high level to low level:
> >> >
> >> >  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW
> >> >
> >> >Without round trips via firmware.      
> >> 
> >> Okay. So the "devlink/Linux -> FW" part is going to happen on baremetal.
> >> Makes sense.
> >>   
> >> >This allows for easy policy enforcement, common code to be maintained
> >> >in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
> >> >firmware for every driver).  It can also be used with software paths
> >> >like VirtIO..    
> >> 
> >> Agreed.
> >>   
> >> >Modelling and debugging a nested switch would be a nightmare.  What
> >> >follows is that we probably shouldn't deal with partitioning of VFs,
> >> >but rather only partition via the PF devlink instance, and reassign 
> >> >the partitions to VMs.    
> >> 
> >> Agreed. That must be misunderstanding, I never suggested nested
> >> switches.  
> >
> >Cool, yes, I was making sure we weren't going in that direction :)  
> 
> Okay.
> 
> >> >> I originally planned to implement sriov orchestration api in devlink too.    
> >> >
> >> >Interesting, would you mind elaborating?    
> >> 
> >> I have to think about it. But something like this:
> >> [...]  
> >
> >I see thanks for the examples, they makes things clear!  
> 
> Okay. I will put together some documentation including this. I have some
> patches that implement some of the stuff. Your patchset also does some
> of that (considering you adjust a thing or two). Lets make this right. 

Yeah, I feel like I'm again getting further from clarity on what you're
trying to achieve.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-13 16:17                                     ` Jakub Kicinski
@ 2019-03-13 16:22                                       ` Jiri Pirko
  2019-03-13 16:55                                         ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-13 16:22 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:
>> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:
>> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:  
>> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:  
>> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:    
>> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:    
>> >> >> >If the switchport is in the hypervisor then only the hypervisor can
>> >> >> >control switching/forwarding, correct?      
>> >> >> 
>> >> >> Correct.
>> >> >>     
>> >> >> >The primary use case for partitioning within a VM (of a VF) would be
>> >> >> >containers (and DPDK)?      
>> >> >> 
>> >> >> Makes sense.
>> >> >>     
>> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
>> >> >> >environment as a whole.      
>> >> >> 
>> >> >> You mean orchestration?     
>> >> >
>> >> >Right, orchestration.
>> >> >
>> >> >To be clear on where I'm going with this - if we want to allow VFs 
>> >> >to partition themselves then they have to control what is effectively 
>> >> >a "nested" switch.  A per-VF set of rules which would the get    
>> >> 
>> >> Wait. If you allow to make VF subports (I believe that is what you ment
>> >> by VFs partition themselves), that does not mean they will have a
>> >> separate nested switch. They would still belong under the same one.  
>> >
>> >But that existing switch is administered by the hypervisor, how would
>> >the VF owners install forwarding rules in a switch they don't control?  
>> 
>> They won't.
>
>Argh.  So how is forwarding configured if there are no rules?  Are you
>going to assume its switching on MACs?  We're supposed to offload
>software constructs.  If its a software port it needs to be explicitly
>switched.  If it's not explicitly switched - we already have macvlan
>offload.

Wait a second. You configure the switch. And for that, you have the
switchports (representors). What we are talking about are VF (VF
subport) host legs. Am I missing something?


>
>> >> >"flattened" into the main eswitch rule set.  If I was to choose I'd
>> >> >really rather have this "flattening" be done on the (Linux) hypervisor
>> >> >and not in the vendor driver and firmware.    
>> >> 
>> >> Agreed. Driver should provide one big switch. User should configure it.  
>> >
>> >Cool, when you say user - is it the tenant or the provider?  
>> 
>> Whoever gets access to the instance.
>>  
>> >> >I'd much rather have the VM make a "give me another NIC" orchestration
>> >> >call via some high level REST API than devlink.  This makes the
>> >> >configuration strictly high level to low level:
>> >> >
>> >> >  VM -> cloud net REST API -> cloud agent -> devlink/Linux -> FW -> HW
>> >> >
>> >> >Without round trips via firmware.      
>> >> 
>> >> Okay. So the "devlink/Linux -> FW" part is going to happen on baremetal.
>> >> Makes sense.
>> >>   
>> >> >This allows for easy policy enforcement, common code to be maintained
>> >> >in user space, in high level languages (no 0.5M LoC drivers and 10M LoC
>> >> >firmware for every driver).  It can also be used with software paths
>> >> >like VirtIO..    
>> >> 
>> >> Agreed.
>> >>   
>> >> >Modelling and debugging a nested switch would be a nightmare.  What
>> >> >follows is that we probably shouldn't deal with partitioning of VFs,
>> >> >but rather only partition via the PF devlink instance, and reassign 
>> >> >the partitions to VMs.    
>> >> 
>> >> Agreed. That must be misunderstanding, I never suggested nested
>> >> switches.  
>> >
>> >Cool, yes, I was making sure we weren't going in that direction :)  
>> 
>> Okay.
>> 
>> >> >> I originally planned to implement sriov orchestration api in devlink too.    
>> >> >
>> >> >Interesting, would you mind elaborating?    
>> >> 
>> >> I have to think about it. But something like this:
>> >> [...]  
>> >
>> >I see thanks for the examples, they makes things clear!  
>> 
>> Okay. I will put together some documentation including this. I have some
>> patches that implement some of the stuff. Your patchset also does some
>> of that (considering you adjust a thing or two). Lets make this right. 
>
>Yeah, I feel like I'm again getting further from clarity on what you're
>trying to achieve.

:)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-13 16:03                                     ` Samudrala, Sridhar
@ 2019-03-13 16:24                                       ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-13 16:24 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Jakub Kicinski, davem, netdev, oss-drivers

Wed, Mar 13, 2019 at 05:03:17PM CET, sridhar.samudrala@intel.com wrote:
>On 3/13/2019 12:37 AM, Jiri Pirko wrote:
>> Wed, Mar 13, 2019 at 07:17:04AM CET, sridhar.samudrala@intel.com wrote:
>> > 
>> > On 3/12/2019 7:02 AM, Jiri Pirko wrote:
>> > 
>> > > 
>> > > > 
>> > > > > I originally planned to implement sriov orchestration api in devlink too.
>> > > > 
>> > > > Interesting, would you mind elaborating?
>> > > 
>> > > I have to think about it. But something like this:
>> > > 
>> > > After bootup, you see only physical port, PF switch port and PF host leg.
>> > 
>> > Is this after changing the eswitch mode to 'switchdev'
>> 
>> I believe so. For new drivers, this should be default and only option.
>> 
>> 
>> > 
>> > > $ devlink port show
>> > > pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2
>> > 
>> > Is this the uplink port representor?
>> 
>> Yes
>> 
>> 
>> > 
>> > > pci/0000:05:00.0/1: type eth netdev ??? flavour pci_pf_host
>> > >                       peer pci/0000:05:00.0/10000
>> > 
>> > I guess this is PF netdev
>> 
>> Yes, port
>> 
>> 
>> > 
>> > > pci/0000:05:00.0/10000: type eth netdev enp5s0npf0pf0s0 flavour pci_pf pf 0 subport 0
>> > >                       switch_id 00154d130d2f peer pci/0000:05:00.0/1
>> > 
>> > and this one is PF port representor netdev
>> 
>> Yes, port
>> 
>> 
>> > 
>> > > 
>> > > To create new PF subport under PF 0:
>> > > $ devlink dev port add pci/0000:05:00.0 flavour pci_pf pf 0
>> > 
>> > Can we consider l2-fwd offload macvlan device also as a subport of PF?
>> 
>> What does this have to with with macvlan? Macvlan is a separate soft
>> driver.
>
>ethtool -K <pf> l2-fwd-offload on
>ip link add link <pf> type macvlan
>
>will create a macvlan netdev but it is backed by a set of separate HW queues
>and switching is offloaded to HW. This can be considered as a subport. In
>i40e, it is a VMDq VSI.

Oh, this one. I think that is an abuse of macvlan. We should do the
modelling correctly, including visibility of switch ports.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-13 16:22                                       ` Jiri Pirko
@ 2019-03-13 16:55                                         ` Jakub Kicinski
  2019-03-14  7:38                                           ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-13 16:55 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Wed, 13 Mar 2019 17:22:43 +0100, Jiri Pirko wrote:
> Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:  
> >> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:  
> >> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:    
> >> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:    
> >> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:      
> >> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:      
> >> >> >> >If the switchport is in the hypervisor then only the hypervisor can
> >> >> >> >control switching/forwarding, correct?        
> >> >> >> 
> >> >> >> Correct.
> >> >> >>       
> >> >> >> >The primary use case for partitioning within a VM (of a VF) would be
> >> >> >> >containers (and DPDK)?        
> >> >> >> 
> >> >> >> Makes sense.
> >> >> >>       
> >> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
> >> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
> >> >> >> >environment as a whole.        
> >> >> >> 
> >> >> >> You mean orchestration?       
> >> >> >
> >> >> >Right, orchestration.
> >> >> >
> >> >> >To be clear on where I'm going with this - if we want to allow VFs 
> >> >> >to partition themselves then they have to control what is effectively 
> >> >> >a "nested" switch.  A per-VF set of rules which would the get      
> >> >> 
> >> >> Wait. If you allow to make VF subports (I believe that is what you ment
> >> >> by VFs partition themselves), that does not mean they will have a
> >> >> separate nested switch. They would still belong under the same one.    
> >> >
> >> >But that existing switch is administered by the hypervisor, how would
> >> >the VF owners install forwarding rules in a switch they don't control?    
> >> 
> >> They won't.  
> >
> >Argh.  So how is forwarding configured if there are no rules?  Are you
> >going to assume its switching on MACs?  We're supposed to offload
> >software constructs.  If its a software port it needs to be explicitly
> >switched.  If it's not explicitly switched - we already have macvlan
> >offload.  
> 
> Wait a second. You configure the switch. And for that, you have the
> switchports (representors). What we are talking about are VF (VF
> subport) host legs. Am I missing something?

Hm :)  So when VM gets a new port, how is it connected?  Are we
assuming all ports of a VM are plugged into one big L2 switch?
The use case for those sub ports is a little murky, sorry about
the endless confusion :)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-13 16:55                                         ` Jakub Kicinski
@ 2019-03-14  7:38                                           ` Jiri Pirko
  2019-03-14 22:09                                             ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-14  7:38 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Wed, Mar 13, 2019 at 05:55:55PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 13 Mar 2019 17:22:43 +0100, Jiri Pirko wrote:
>> Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@netronome.com wrote:
>> >On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:  
>> >> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:    
>> >> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:    
>> >> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:      
>> >> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:      
>> >> >> >> >If the switchport is in the hypervisor then only the hypervisor can
>> >> >> >> >control switching/forwarding, correct?        
>> >> >> >> 
>> >> >> >> Correct.
>> >> >> >>       
>> >> >> >> >The primary use case for partitioning within a VM (of a VF) would be
>> >> >> >> >containers (and DPDK)?        
>> >> >> >> 
>> >> >> >> Makes sense.
>> >> >> >>       
>> >> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>> >> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
>> >> >> >> >environment as a whole.        
>> >> >> >> 
>> >> >> >> You mean orchestration?       
>> >> >> >
>> >> >> >Right, orchestration.
>> >> >> >
>> >> >> >To be clear on where I'm going with this - if we want to allow VFs 
>> >> >> >to partition themselves then they have to control what is effectively 
>> >> >> >a "nested" switch.  A per-VF set of rules which would the get      
>> >> >> 
>> >> >> Wait. If you allow to make VF subports (I believe that is what you ment
>> >> >> by VFs partition themselves), that does not mean they will have a
>> >> >> separate nested switch. They would still belong under the same one.    
>> >> >
>> >> >But that existing switch is administered by the hypervisor, how would
>> >> >the VF owners install forwarding rules in a switch they don't control?    
>> >> 
>> >> They won't.  
>> >
>> >Argh.  So how is forwarding configured if there are no rules?  Are you
>> >going to assume its switching on MACs?  We're supposed to offload
>> >software constructs.  If its a software port it needs to be explicitly
>> >switched.  If it's not explicitly switched - we already have macvlan
>> >offload.  
>> 
>> Wait a second. You configure the switch. And for that, you have the
>> switchports (representors). What we are talking about are VF (VF
>> subport) host legs. Am I missing something?
>
>Hm :)  So when VM gets a new port, how is it connected?  Are we
>assuming all ports of a VM are plugged into one big L2 switch?
>The use case for those sub ports is a little murky, sorry about
>the endless confusion :)

Np. When user John (on baremetal, or wherever the devlink instance
with the switch port is) creates a VF or a VF subport by:
$ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
or by:
$ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0

Then instances of flavour pci_vf are going to appear in the same devlink
instance. Those are the switch ports:
pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
                        flavour pci_vf pf 0 vf 0
                        switch_id 00154d130d2f peer pci/0000:05:10.1/0    
pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
                        flavour pci_vf pf 0 vf 0 subport 1
                        switch_id 00154d130d2f peer pci/0000:05:10.1/1

With that, peers are going to appear too, and those are the actual VF/VF
subports:
pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
                    peer pci/0000:05:00.0/10002
pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
                    peer pci/0000:05:00.0/10003

Later you can push this VF along with all subports to VM. So in VM, you
are going to see the VF like this:
$ devlink dev
pci/0000:00:08.0
$ devlink port
pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host

And back to your question of how they are connected in the eswitch.
That is totally up to the original user John, who did the creation.
He is in charge of the eswitch on baremetal, so he would configure
the forwarding however he likes.
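
To illustrate the peer relationship in the listings above, here is a
minimal Python sketch that parses text in that format and pairs each
switch port with its host leg. It assumes the hypothetical
"devlink port show" output proposed in this thread; the helper names
(parse_ports, switch_to_host) are made up for the example and are not
part of any existing tool.

```python
# Sample text copied from the hypothetical listing in this thread.
SAMPLE = """\
pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
                        flavour pci_vf pf 0 vf 0
                        switch_id 00154d130d2f peer pci/0000:05:10.1/0
pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
                        flavour pci_vf pf 0 vf 0 subport 1
                        switch_id 00154d130d2f peer pci/0000:05:10.1/1
pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
                    peer pci/0000:05:00.0/10002
pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
                    peer pci/0000:05:00.0/10003
"""


def parse_ports(text):
    """Return {port_handle: {"flavour": ..., "peer": ...}} from the listing."""
    ports = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new port entry starts at column 0: "<handle>: <attributes>".
            handle, _, rest = line.partition(": ")
            current = ports.setdefault(handle, {})
        else:
            # Indented lines continue the attributes of the current port.
            rest = line
        tokens = rest.split()
        for key, value in zip(tokens, tokens[1:]):
            if key in ("flavour", "peer"):
                current[key] = value
    return ports


def switch_to_host(ports):
    """Map each switch port (flavour not ending in _host) to its host leg."""
    pairs = {}
    for handle, attrs in ports.items():
        peer = attrs.get("peer")
        if peer is None or attrs.get("flavour", "").endswith("_host"):
            continue
        # Only accept the pair when the peer points back at this port.
        if ports.get(peer, {}).get("peer") == handle:
            pairs[handle] = peer
    return pairs


pairs = switch_to_host(parse_ports(SAMPLE))
for switch_port, host_leg in sorted(pairs.items()):
    print(switch_port, "<->", host_leg)
```

The back-pointer check mirrors the idea that a peer is "the other side
of the wire": a pair only counts when both ends name each other.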


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-14  7:38                                           ` Jiri Pirko
@ 2019-03-14 22:09                                             ` Jakub Kicinski
  2019-03-14 22:35                                               ` Parav Pandit
  2019-03-15  7:00                                               ` Jiri Pirko
  0 siblings, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-14 22:09 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, netdev, oss-drivers

On Thu, 14 Mar 2019 08:38:40 +0100, Jiri Pirko wrote:
> Wed, Mar 13, 2019 at 05:55:55PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 13 Mar 2019 17:22:43 +0100, Jiri Pirko wrote:  
> >> Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@netronome.com wrote:  
> >> >On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:    
> >> >> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:    
> >> >> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:      
> >> >> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:      
> >> >> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:        
> >> >> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:        
> >> >> >> >> >If the switchport is in the hypervisor then only the hypervisor can
> >> >> >> >> >control switching/forwarding, correct?          
> >> >> >> >> 
> >> >> >> >> Correct.
> >> >> >> >>         
> >> >> >> >> >The primary use case for partitioning within a VM (of a VF) would be
> >> >> >> >> >containers (and DPDK)?          
> >> >> >> >> 
> >> >> >> >> Makes sense.
> >> >> >> >>         
> >> >> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
> >> >> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
> >> >> >> >> >environment as a whole.          
> >> >> >> >> 
> >> >> >> >> You mean orchestration?         
> >> >> >> >
> >> >> >> >Right, orchestration.
> >> >> >> >
> >> >> >> >To be clear on where I'm going with this - if we want to allow VFs 
> >> >> >> >to partition themselves then they have to control what is effectively 
> >> >> >> >a "nested" switch.  A per-VF set of rules which would the get        
> >> >> >> 
> >> >> >> Wait. If you allow to make VF subports (I believe that is what you ment
> >> >> >> by VFs partition themselves), that does not mean they will have a
> >> >> >> separate nested switch. They would still belong under the same one.      
> >> >> >
> >> >> >But that existing switch is administered by the hypervisor, how would
> >> >> >the VF owners install forwarding rules in a switch they don't control?      
> >> >> 
> >> >> They won't.    
> >> >
> >> >Argh.  So how is forwarding configured if there are no rules?  Are you
> >> >going to assume its switching on MACs?  We're supposed to offload
> >> >software constructs.  If its a software port it needs to be explicitly
> >> >switched.  If it's not explicitly switched - we already have macvlan
> >> >offload.    
> >> 
> >> Wait a second. You configure the switch. And for that, you have the
> >> switchports (representors). What we are talking about are VF (VF
> >> subport) host legs. Am I missing something?  
> >
> >Hm :)  So when VM gets a new port, how is it connected?  Are we
> >assuming all ports of a VM are plugged into one big L2 switch?
> >The use case for those sub ports is a little murky, sorry about
> >the endless confusion :)  
> 
> Np. When user John (on baremetal, or whenever the devlink instance
> with switch port is) creates VF of VF subport by: 
> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
> or by:
> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0
> 
> Then instances of flavour pci_vf are going to appear in the same devlink
> instance. Those are the switch ports:
> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>                         flavour pci_vf pf 0 vf 0
>                         switch_id 00154d130d2f peer pci/0000:05:10.1/0    
> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>                         flavour pci_vf pf 0 vf 0 subport 1
>                         switch_id 00154d130d2f peer pci/0000:05:10.1/1
> 
> With that, peers are going to appear too, and those are the actual VF/VF
> subport:
> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>                     peer pci/0000:05:00.0/10002
> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>                     peer pci/0000:05:00.0/10003
> 
> Later you can push this VF along with all subports to VM. So in VM, you
> are going to see the VF like this:
> $ devlink dev
> pci/0000:00:08.0
> $ devlink port
> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> 
> And back to your question of how are they connected in eswitch.
> That is totally up to the original user John who did the creation.
> He is in charge of the eswitch on baremetal, he would configure
> the forwarding however he likes.

Ack, so I think you're saying VM has to communicate to the cloud
environment to have this provisioned using some service API, not 
a kernel API.  That's what I wanted to confirm.
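
A rough Python sketch of that flow, under stated assumptions: a
hypothetical cloud agent receives a "give me another port" request,
applies a per-tenant quota, and only then builds the command line for
the hypervisor. The "devlink dev port add" syntax is the one proposed
earlier in this thread, not a claim about any released devlink; the
function names, request fields and quota table are invented for the
example, and nothing is executed.

```python
def port_add_argv(dev, flavour, pf, vf=None):
    """Build argv for the `devlink dev port add` syntax proposed in this thread."""
    if flavour not in ("pci_pf", "pci_vf"):
        raise ValueError(f"unsupported flavour: {flavour}")
    if vf is not None and flavour != "pci_vf":
        raise ValueError("vf index only applies to flavour pci_vf")
    argv = ["devlink", "dev", "port", "add", dev,
            "flavour", flavour, "pf", str(pf)]
    if vf is not None:
        # Naming an existing VF asks for a subport of that VF.
        argv += ["vf", str(vf)]
    return argv


def handle_request(request, quota):
    """Hypothetical agent handler: enforce policy, then emit the command.

    `request` is a dict with "tenant", "dev", "pf" and optional "vf";
    `quota` maps tenant name to the number of ports it may still create.
    """
    tenant = request["tenant"]
    if quota.get(tenant, 0) <= 0:
        raise PermissionError(f"tenant {tenant} has no ports left")
    quota[tenant] -= 1
    return port_add_argv(request["dev"], "pci_vf",
                         request["pf"], request.get("vf"))


quota = {"tenant-a": 1}
argv = handle_request(
    {"tenant": "tenant-a", "dev": "pci/0000:05:00.0", "pf": 0, "vf": 0},
    quota,
)
print(" ".join(argv))
```

Policy lives in the agent, so the VM never touches the eswitch or the
PF devlink instance directly.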

I don't see any benefit to having the "host ports" under devlink,
as such I think it's a matter of preference.  I'll try to describe 
the two options to Netronome's FAEs and see which one they find more
intuitive.

Makes sense?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-14 22:09                                             ` Jakub Kicinski
@ 2019-03-14 22:35                                               ` Parav Pandit
  2019-03-14 23:39                                                 ` Jakub Kicinski
  2019-03-15  7:00                                               ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-14 22:35 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: davem, netdev, oss-drivers

Hi Jakub,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On Behalf Of Jakub Kicinski
> Sent: Thursday, March 14, 2019 5:10 PM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
> 
> On Thu, 14 Mar 2019 08:38:40 +0100, Jiri Pirko wrote:
> > Wed, Mar 13, 2019 at 05:55:55PM CET, jakub.kicinski@netronome.com wrote:
> > >On Wed, 13 Mar 2019 17:22:43 +0100, Jiri Pirko wrote:
> > >> Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@netronome.com wrote:
> > >> >On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:
> > >> >> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:
> > >> >> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:
> > >> >> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:
> > >> >> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:
> > >> >> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:
> > >> >> >> >> >If the switchport is in the hypervisor then only the hypervisor can
> > >> >> >> >> >control switching/forwarding, correct?
> > >> >> >> >>
> > >> >> >> >> Correct.
> > >> >> >> >>
> > >> >> >> >> >The primary use case for partitioning within a VM (of a VF) would be
> > >> >> >> >> >containers (and DPDK)?
> > >> >> >> >>
> > >> >> >> >> Makes sense.
> > >> >> >> >>
> > >> >> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
> > >> >> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
> > >> >> >> >> >environment as a whole.
> > >> >> >> >>
> > >> >> >> >> You mean orchestration?
> > >> >> >> >
> > >> >> >> >Right, orchestration.
> > >> >> >> >
> > >> >> >> >To be clear on where I'm going with this - if we want to allow VFs
> > >> >> >> >to partition themselves then they have to control what is effectively
> > >> >> >> >a "nested" switch.  A per-VF set of rules which would the get
> > >> >> >>
> > >> >> >> Wait. If you allow to make VF subports (I believe that is what you ment
> > >> >> >> by VFs partition themselves), that does not mean they will have a
> > >> >> >> separate nested switch. They would still belong under the same one.
> > >> >> >
> > >> >> >But that existing switch is administered by the hypervisor, how would
> > >> >> >the VF owners install forwarding rules in a switch they don't control?
> > >> >>
> > >> >> They won't.
> > >> >
> > >> >Argh.  So how is forwarding configured if there are no rules?  Are you
> > >> >going to assume its switching on MACs?  We're supposed to offload
> > >> >software constructs.  If its a software port it needs to be explicitly
> > >> >switched.  If it's not explicitly switched - we already have macvlan
> > >> >offload.
> > >>
> > >> Wait a second. You configure the switch. And for that, you have the
> > >> switchports (representors). What we are talking about are VF (VF
> > >> subport) host legs. Am I missing something?
> > >
> > >Hm :)  So when VM gets a new port, how is it connected?  Are we
> > >assuming all ports of a VM are plugged into one big L2 switch?
> > >The use case for those sub ports is a little murky, sorry about the
> > >endless confusion :)
> >
> > Np. When user John (on baremetal, or whenever the devlink instance
> > with switch port is) creates VF of VF subport by:
> > $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
> > or by:
> > $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0
> >
> > Then instances of flavour pci_vf are going to appear in the same
> > devlink instance. Those are the switch ports:
> > pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> >                         flavour pci_vf pf 0 vf 0
> >                         switch_id 00154d130d2f peer pci/0000:05:10.1/0
> > pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> >                         flavour pci_vf pf 0 vf 0 subport 1
> >                         switch_id 00154d130d2f peer pci/0000:05:10.1/1
> >
> > With that, peers are going to appear too, and those are the actual VF/VF
> > subport:
> > pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> >                     peer pci/0000:05:00.0/10002
> > pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> >                     peer pci/0000:05:00.0/10003
> >
> > Later you can push this VF along with all subports to VM. So in VM,
> > you are going to see the VF like this:
> > $ devlink dev
> > pci/0000:00:08.0
> > $ devlink port
> > pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> > pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> >
> > And back to your question of how are they connected in eswitch.
> > That is totally up to the original user John who did the creation.
> > He is in charge of the eswitch on baremetal, he would configure the
> > forwarding however he likes.
> 
> Ack, so I think you're saying VM has to communicate to the cloud
> environment to have this provisioned using some service API, not a kernel
> API.  That's what I wanted to confirm.
> 
> I don't see any benefit to having the "host ports" under devlink, as such I
> think it's a matter of preference. 
We need 'host ports' to configure parameters of the host port
that are not exposed by the rep-netdev, such as the MAC address.

> I'll try to describe the two options to
> Netronome's FAEs and see which one they find more intuitive.
> 
> Makes sense?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-14 22:35                                               ` Parav Pandit
@ 2019-03-14 23:39                                                 ` Jakub Kicinski
  2019-03-15  1:28                                                   ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-14 23:39 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, davem, netdev, oss-drivers

On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> > > Then instances of flavour pci_vf are going to appear in the same
> > > devlink instance. Those are the switch ports:
> > > pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> > >                         flavour pci_vf pf 0 vf 0
> > >                         switch_id 00154d130d2f peer pci/0000:05:10.1/0
> > > pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> > >                         flavour pci_vf pf 0 vf 0 subport 1
> > >                         switch_id 00154d130d2f peer pci/0000:05:10.1/1
> > >
> > > With that, peers are going to appear too, and those are the actual
> > > VF/VF
> > > subport:
> > > pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> > >                     peer pci/0000:05:00.0/10002
> > > pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> > >                     peer pci/0000:05:00.0/10003
> > >
> > > Later you can push this VF along with all subports to VM. So in VM,
> > > you are going to see the VF like this:
> > > $ devlink dev
> > > pci/0000:00:08.0
> > > $ devlink port
> > > pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> > > pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> > >
> > > And back to your question of how are they connected in eswitch.
> > > That is totally up to the original user John who did the creation.
> > > He is in charge of the eswitch on baremetal, he would configure the
> > > forwarding however he likes.  
> > 
> > Ack, so I think you're saying VM has to communicate to the cloud
> > environment to have this provisioned using some service API, not a kernel
> > API.  That's what I wanted to confirm.
> > 
> > I don't see any benefit to having the "host ports" under devlink, as such I
> > think it's a matter of preference.   
>
> We need 'host ports' to configure parameters of this 
> host port which is not exposed by the rep-netdev.
> Such as mac address.

Please look at the quote of what Jiri wrote above - the host port gets
passed to the VM, you can't use it as a handle to set the MAC.

The way to set the MAC remains:

# devlink port set pci/0000:05:00.0/10002 peer mac_addr 00:11:22:33:44:55

(using the port ids from above)
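
[Editorial illustration, not part of the original message.] The handle
relationship described above can be sketched in Python as a toy in-memory
table. This is only a model under stated assumptions: the `Port` class and
the `build_example_ports` and `set_peer_mac` helpers are hypothetical names
invented for this sketch, and nothing here talks to devlink or the kernel.
The port handles are copied from the example output quoted above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Port:
    """Toy model of one devlink port entry (hypothetical, for illustration)."""
    handle: str                     # e.g. "pci/0000:05:00.0/10002"
    flavour: str                    # "pci_vf" (switch port) or "pci_vf_host"
    peer: Optional[str] = None      # handle of the other side of the wire
    mac_addr: Optional[str] = None  # MAC as seen on this port, if set


def build_example_ports() -> Dict[str, Port]:
    """Build the four ports from the example output in the thread."""
    ports = [
        Port("pci/0000:05:00.0/10002", "pci_vf", peer="pci/0000:05:10.1/0"),
        Port("pci/0000:05:00.0/10003", "pci_vf", peer="pci/0000:05:10.1/1"),
        Port("pci/0000:05:10.1/0", "pci_vf_host", peer="pci/0000:05:00.0/10002"),
        Port("pci/0000:05:10.1/1", "pci_vf_host", peer="pci/0000:05:00.0/10003"),
    ]
    return {p.handle: p for p in ports}


def set_peer_mac(ports: Dict[str, Port], switch_handle: str, mac: str) -> Port:
    """Set the MAC of the peer, addressed through the switch port handle."""
    switch_port = ports[switch_handle]
    if switch_port.peer is None:
        raise ValueError(f"{switch_handle} has no peer")
    peer = ports[switch_port.peer]
    peer.mac_addr = mac
    return peer


ports = build_example_ports()
peer = set_peer_mac(ports, "pci/0000:05:00.0/10002", "00:11:22:33:44:55")
print(peer.handle, peer.mac_addr)
```

The point of the sketch is that the caller only ever names the switch port
handle, which stays in the hypervisor's devlink instance; the host port is
reached through the peer link.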

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-14 23:39                                                 ` Jakub Kicinski
@ 2019-03-15  1:28                                                   ` Parav Pandit
  2019-03-15  1:31                                                     ` Parav Pandit
  2019-03-15  2:15                                                     ` Samudrala, Sridhar
  0 siblings, 2 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-15  1:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Thursday, March 14, 2019 6:39 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> > > > Then instances of flavour pci_vf are going to appear in the same
> > > > devlink instance. Those are the switch ports:
> > > > pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> > > >                         flavour pci_vf pf 0 vf 0
> > > >                         switch_id 00154d130d2f peer
> > > > pci/0000:05:10.1/0
> > > > pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> > > >                         flavour pci_vf pf 0 vf 0 subport 1
> > > >                         switch_id 00154d130d2f peer
> > > > pci/0000:05:10.1/1
> > > >
> > > > With that, peers are going to appear too, and those are the actual
> > > > VF/VF
> > > > subport:
> > > > pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> > > >                     peer pci/0000:05:00.0/10002
> > > > pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> > > >                     peer pci/0000:05:00.0/10003
> > > >
> > > > Later you can push this VF along with all subports to VM. So in
> > > > VM, you are going to see the VF like this:
> > > > $ devlink dev
> > > > pci/0000:00:08.0
> > > > $ devlink port
> > > > pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> > > > pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> > > >
> > > > And back to your question of how are they connected in eswitch.
> > > > That is totally up to the original user John who did the creation.
> > > > He is in charge of the eswitch on baremetal, he would configure
> > > > the forwarding however he likes.
> > >
> > > Ack, so I think you're saying VM has to communicate to the cloud
> > > environment to have this provisioned using some service API, not a
> > > kernel API.  That's what I wanted to confirm.
> > >
> > > I don't see any benefit to having the "host ports" under devlink, as such I
> > > think it's a matter of preference.
> >
> > We need 'host ports' to configure parameters of this host port which
> > is not exposed by the rep-netdev.
> > Such as mac address.
> 
> Please look at the quote of what Jiri wrote above - the host port gets passed
> to the VM, you can't use it as a handle to set the MAC.
> 
> The way to set the MAC remains:
> 
> # devlink port set pci/0000:05:00.0/10002 peer mac_addr 00:11:22:33:44:55
> 
Even though it can be done, I think programming the hostport MAC address through the eswitch port is the wrong model.
All devlink objects are control objects, so what is passed to the VM is what is represented by devlink.
The VF in the VM will create its own devlink object anyway.
What is wrong with programming the hostport?
It gives users a very clear view of the topology and objects.

Also, the eswitch is flat. There is no need for a pf/vf flavour for a port.
It doesn't make sense to define an 'mdev' flavour, which we are already working on.
At the eswitch level it is just a port; it happens to be connected to a vf or pf or other objects, and that doesn't matter.
A port should be flavoured as 'hostport' or 'switchport'.


> (using the port ids from above)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15  1:28                                                   ` Parav Pandit
@ 2019-03-15  1:31                                                     ` Parav Pandit
  2019-03-15  2:15                                                     ` Samudrala, Sridhar
  1 sibling, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-15  1:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, davem, netdev, oss-drivers



> -----Original Message-----
> From: Parav Pandit
> Sent: Thursday, March 14, 2019 8:29 PM
> To: 'Jakub Kicinski' <jakub.kicinski@netronome.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> 
> 
> > -----Original Message-----
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Sent: Thursday, March 14, 2019 6:39 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> > netdev@vger.kernel.org; oss-drivers@netronome.com
> > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > devlink PCI ports
> >
> > On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> > > > > Then instances of flavour pci_vf are going to appear in the same
> > > > > devlink instance. Those are the switch ports:
> > > > > pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> > > > >                         flavour pci_vf pf 0 vf 0
> > > > >                         switch_id 00154d130d2f peer
> > > > > pci/0000:05:10.1/0
> > > > > pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> > > > >                         flavour pci_vf pf 0 vf 0 subport 1
> > > > >                         switch_id 00154d130d2f peer
> > > > > pci/0000:05:10.1/1
> > > > >
> > > > > With that, peers are going to appear too, and those are the
> > > > > actual VF/VF
> > > > > subport:
> > > > > pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> > > > >                     peer pci/0000:05:00.0/10002
> > > > > pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> > > > >                     peer pci/0000:05:00.0/10003
> > > > >
> > > > > Later you can push this VF along with all subports to VM. So in
> > > > > VM, you are going to see the VF like this:
> > > > > $ devlink dev
> > > > > pci/0000:00:08.0
> > > > > $ devlink port
> > > > > pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> > > > > pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> > > > >
> > > > > And back to your question of how are they connected in eswitch.
> > > > > That is totally up to the original user John who did the creation.
> > > > > He is in charge of the eswitch on baremetal, he would configure
> > > > > the forwarding however he likes.
> > > >
> > > > Ack, so I think you're saying VM has to communicate to the cloud
> > > > environment to have this provisioned using some service API, not a
> > > > kernel API.  That's what I wanted to confirm.
> > > >
> > > > I don't see any benefit to having the "host ports" under devlink,
> > > > as such I think it's a matter of preference.
> > >
> > > We need 'host ports' to configure parameters of this host port which
> > > is not exposed by the rep-netdev.
> > > Such as mac address.
> >
> > Please look at the quote of what Jiri wrote above - the host port gets
> > passed to the VM, you can't use it as a handle to set the MAC.
> >
> > The way to set the MAC remains:
> >
> > # devlink port set pci/0000:05:00.0/10002 peer mac_addr
> > 00:11:22:33:44:55
> >
> Even though it can be done, I think this is wrong model to program hostport
> mac address using eswitch port.
> All devlink objects are control objects, so what is passed to VM is what is
> represented by devlink.
> VF in the VM will anyway create its devlink object.
> What is wrong in programming hostport?
> It gives a very clear view to users of topology and objects.
> 

This also makes sense for RDMA, where there are only hostports and no peer switch ports.
The user needs to configure node guid and port_guid.
Programming port_guid on the 'hostport' also aligns with this requirement.
We had better not program hostport params in this convoluted, indirect way.

> Also eswitch is flat. There is no need of pf/vf flavour for port.
> It doesn't make sense to define 'mdev' flavour which we are already working.
> At eswitch level it is just a port, it happen to be connected to vf or pf or
> other objects, it doesn't matter.
> Port should be flavoured as 'hostport' or 'switchport'.
> 
> 
> > (using the port ids from above)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15  1:28                                                   ` Parav Pandit
  2019-03-15  1:31                                                     ` Parav Pandit
@ 2019-03-15  2:15                                                     ` Samudrala, Sridhar
  2019-03-15  2:40                                                       ` Parav Pandit
  1 sibling, 1 reply; 100+ messages in thread
From: Samudrala, Sridhar @ 2019-03-15  2:15 UTC (permalink / raw)
  To: Parav Pandit, Jakub Kicinski; +Cc: Jiri Pirko, davem, netdev, oss-drivers



On 3/14/2019 6:28 PM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Sent: Thursday, March 14, 2019 6:39 PM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>>
>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
>>>>> Then instances of flavour pci_vf are going to appear in the same
>>>>> devlink instance. Those are the switch ports:
>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>>>>>                          flavour pci_vf pf 0 vf 0
>>>>>                          switch_id 00154d130d2f peer
>>>>> pci/0000:05:10.1/0
>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>>>>>                          flavour pci_vf pf 0 vf 0 subport 1
>>>>>                          switch_id 00154d130d2f peer
>>>>> pci/0000:05:10.1/1
>>>>>
>>>>> With that, peers are going to appear too, and those are the actual
>>>>> VF/VF
>>>>> subport:
>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>>>>>                      peer pci/0000:05:00.0/10002
>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>>>>>                      peer pci/0000:05:00.0/10003
>>>>>
>>>>> Later you can push this VF along with all subports to VM. So in
>>>>> VM, you are going to see the VF like this:
>>>>> $ devlink dev
>>>>> pci/0000:00:08.0
>>>>> $ devlink port
>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>>>>>
>>>>> And back to your question of how are they connected in eswitch.
>>>>> That is totally up to the original user John who did the creation.
>>>>> He is in charge of the eswitch on baremetal, he would configure
>>>>> the forwarding however he likes.
>>>>
>>>> Ack, so I think you're saying VM has to communicate to the cloud
>>>> environment to have this provisioned using some service API, not a
>>>> kernel API.  That's what I wanted to confirm.
>>>>
>>>> I don't see any benefit to having the "host ports" under devlink, as such I
>>>> think it's a matter of preference.
>>>
>>> We need 'host ports' to configure parameters of this host port which
>>> is not exposed by the rep-netdev.
>>> Such as mac address.
>>
>> Please look at the quote of what Jiri wrote above - the host port gets passed
>> to the VM, you can't use it as a handle to set the MAC.
>>
>> The way to set the MAC remains:
>>
>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr 00:11:22:33:44:55
>>
> Even though it can be done, I think this is wrong model to program hostport mac address using eswitch port.
> All devlink objects are control objects, so what is passed to VM is what is represented by devlink.
> VF in the VM will anyway create its devlink object.
> What is wrong in programming hostport?
> It gives a very clear view to users of topology and objects.

The VF or any subport MAC address should be configured by the
orchestration layer that is running on the hypervisor, and when a VF is
assigned to a VM, the host port is not visible to the hypervisor.
Currently we have the ndo_set_vf_mac_addr API that works with the PF netdev,
but I think we are trying to move away from that API and do all the
configuration via the port representor netdevs. As the MAC address
cannot be configured using this netdev, I think Jakub is suggesting
creating a devlink object for each port representor and using that
interface to set the peer MAC address. We should be able to use this to
configure port VLAN too.

Also, instead of subport, can we call it vport and support different types
of vports - sr-iov, siov, vmdq etc.?
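
[Editorial illustration, not part of the original message.] The visibility
argument above can be sketched as a toy Python model: once the VF is handed
to the VM, its host port leaves the hypervisor's view, while the port
representor stays. The `Namespace` class and the `can_configure` and
`assign_vf_to_vm` helpers are hypothetical names made up for this sketch;
the handles come from the example output quoted earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Namespace:
    """Toy stand-in for one devlink view (hypervisor or VM)."""
    name: str
    visible: Set[str] = field(default_factory=set)  # port handles seen here


def can_configure(ns: Namespace, handle: str) -> bool:
    """A port can only be configured from a view in which it is visible."""
    return handle in ns.visible


def assign_vf_to_vm(hypervisor: Namespace, vm: Namespace,
                    handle_map: Dict[str, str]) -> None:
    """Move host ports into the VM; handle_map maps old handle -> VM handle."""
    for host_handle, vm_handle in handle_map.items():
        hypervisor.visible.discard(host_handle)
        vm.visible.add(vm_handle)


hyp = Namespace("hypervisor", {"pci/0000:05:00.0/10002", "pci/0000:05:10.1/0"})
vm = Namespace("vm")
assign_vf_to_vm(hyp, vm, {"pci/0000:05:10.1/0": "pci/0000:00:08.0/0"})

print(can_configure(hyp, "pci/0000:05:00.0/10002"))  # representor stays
print(can_configure(hyp, "pci/0000:05:10.1/0"))      # host port is gone
print(can_configure(vm, "pci/0000:00:08.0/0"))       # now visible in the VM
```

In this model the only handle the orchestration layer can still use after
assignment is the representor, which is why the peer MAC is set through it.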

> 
> Also eswitch is flat. There is no need of pf/vf flavour for port.
> It doesn't make sense to define 'mdev' flavour which we are already working.
> At eswitch level it is just a port, it happen to be connected to vf or pf or other objects, it doesn't matter.
> Port should be flavoured as 'hostport' or 'switchport'.
> 
> 
>> (using the port ids from above)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15  2:15                                                     ` Samudrala, Sridhar
@ 2019-03-15  2:40                                                       ` Parav Pandit
       [not found]                                                         ` <ae938b4f-5fa9-3c33-8ae6-eab2d3d9f1ec@intel.com>
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-15  2:40 UTC (permalink / raw)
  To: Samudrala, Sridhar, Jakub Kicinski; +Cc: Jiri Pirko, davem, netdev, oss-drivers



> -----Original Message-----
> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> Sent: Thursday, March 14, 2019 9:16 PM
> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> 
> 
> On 3/14/2019 6:28 PM, Parav Pandit wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> Sent: Thursday, March 14, 2019 6:39 PM
> >> To: Parav Pandit <parav@mellanox.com>
> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> >>>>> Then instances of flavour pci_vf are going to appear in the same
> >>>>> devlink instance. Those are the switch ports:
> >>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> >>>>>                          flavour pci_vf pf 0 vf 0
> >>>>>                          switch_id 00154d130d2f peer
> >>>>> pci/0000:05:10.1/0
> >>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> >>>>>                          flavour pci_vf pf 0 vf 0 subport 1
> >>>>>                          switch_id 00154d130d2f peer
> >>>>> pci/0000:05:10.1/1
> >>>>>
> >>>>> With that, peers are going to appear too, and those are the actual
> >>>>> VF/VF
> >>>>> subport:
> >>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> >>>>>                      peer pci/0000:05:00.0/10002
> >>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> >>>>>                      peer pci/0000:05:00.0/10003
> >>>>>
> >>>>> Later you can push this VF along with all subports to VM. So in
> >>>>> VM, you are going to see the VF like this:
> >>>>> $ devlink dev
> >>>>> pci/0000:00:08.0
> >>>>> $ devlink port
> >>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> >>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> >>>>>
> >>>>> And back to your question of how are they connected in eswitch.
> >>>>> That is totally up to the original user John who did the creation.
> >>>>> He is in charge of the eswitch on baremetal, he would configure
> >>>>> the forwarding however he likes.
> >>>>
> >>>> Ack, so I think you're saying VM has to communicate to the cloud
> >>>> environment to have this provisioned using some service API, not a
> >>>> kernel API.  That's what I wanted to confirm.
> >>>>
> >>>> I don't see any benefit to having the "host ports" under devlink,
> >>>> as such I think it's a matter of preference.
> >>>
> >>> We need 'host ports' to configure parameters of this host port which
> >>> is not exposed by the rep-netdev.
> >>> Such as mac address.
> >>
> >> Please look at the quote of what Jiri wrote above - the host port
> >> gets passed to the VM, you can't use it as a handle to set the MAC.
> >>
> >> The way to set the MAC remains:
> >>
> >> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
> >> 00:11:22:33:44:55
> >>
> > Even though it can be done, I think this is wrong model to program
> hostport mac address using eswitch port.
> > All devlink objects are control objects, so what is passed to VM is what is
> represented by devlink.
> > VF in the VM will anyway create its devlink object.
> > What is wrong in programming hostport?
> > It gives a very clear view to users of topology and objects.
> 
> The VF or any subport MAC address should be configured by the
> orchestration layer that is running on the hypervisor and when a VF is
> assigned to a VM, the host port is not visible to the hypervisor.
What prevents creation of the hostport, so that it would not be visible?
The hostport is a control port for programming host-side parameters.
It should be created when the user wants to program those parameters.

The model is really straightforward.
Program host port params using the hostport object.
Program switchport params using the rep-netdev.

> Currently we have ndo_set_vf_mac_addr api that works with PF netdev, but i
> think we are trying to move away from that API and do all the configuration
> via the port representor netdevs.
This is fine; the rep-netdev represents the eswitch port.
You normally don't go to the switch to program host port params.

> As the mac address cannot be configured
> using this netdev, i think Jakub is suggesting creating a devlink object for
> each port representor and use that interface to set peer mac address. 

I understand, but it is a convoluted interface.
When you program a host NIC MAC address, you talk to iLO or the BIOS.
When you program a switch-side MAC address, you go to the switch/router/modem.

Also, programming host params on the host side doesn't assume that the port is connected to an eswitch.
It also doesn't assume the same connectivity for its whole life.

If you model around how physical devices are configured, it will almost never go wrong and still provides the same level of flexibility.

> We should be able use this to configure port vlan too.
> 
> Also, instead of subport, can we call vport and support different types of
> vports - sr-iov, siov, vmdq etc.
> 
At the switch level there are just ports.
sriov, siov, mdev, vmdq are their counterparts (peers) to which they are connected.

> >
> > Also eswitch is flat. There is no need of pf/vf flavour for port.
> > It doesn't make sense to define 'mdev' flavour which we are already
> working.
> > At eswitch level it is just a port, it happen to be connected to vf or pf or
> other objects, it doesn't matter.
> > Port should be flavoured as 'hostport' or 'switchport'.
> >
> >
> >> (using the port ids from above)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-14 22:09                                             ` Jakub Kicinski
  2019-03-14 22:35                                               ` Parav Pandit
@ 2019-03-15  7:00                                               ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-15  7:00 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Thu, Mar 14, 2019 at 11:09:45PM CET, jakub.kicinski@netronome.com wrote:
>On Thu, 14 Mar 2019 08:38:40 +0100, Jiri Pirko wrote:
>> Wed, Mar 13, 2019 at 05:55:55PM CET, jakub.kicinski@netronome.com wrote:
>> >On Wed, 13 Mar 2019 17:22:43 +0100, Jiri Pirko wrote:  
>> >> Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:    
>> >> >> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@netronome.com wrote:    
>> >> >> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:      
>> >> >> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:      
>> >> >> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:        
>> >> >> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:        
>> >> >> >> >> >If the switchport is in the hypervisor then only the hypervisor can
>> >> >> >> >> >control switching/forwarding, correct?          
>> >> >> >> >> 
>> >> >> >> >> Correct.
>> >> >> >> >>         
>> >> >> >> >> >The primary use case for partitioning within a VM (of a VF) would be
>> >> >> >> >> >containers (and DPDK)?          
>> >> >> >> >> 
>> >> >> >> >> Makes sense.
>> >> >> >> >>         
>> >> >> >> >> >SR-IOV makes things harder.  Splitting a PF is reasonably easy to grasp.
>> >> >> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
>> >> >> >> >> >environment as a whole.          
>> >> >> >> >> 
>> >> >> >> >> You mean orchestration?         
>> >> >> >> >
>> >> >> >> >Right, orchestration.
>> >> >> >> >
>> >> >> >> >To be clear on where I'm going with this - if we want to allow VFs 
>> >> >> >> >to partition themselves then they have to control what is effectively 
>> >> >> >> >a "nested" switch.  A per-VF set of rules which would the get        
>> >> >> >> 
>> >> >> >> Wait. If you allow to make VF subports (I believe that is what you meant
>> >> >> >> by VFs partition themselves), that does not mean they will have a
>> >> >> >> separate nested switch. They would still belong under the same one.      
>> >> >> >
>> >> >> >But that existing switch is administered by the hypervisor, how would
>> >> >> >the VF owners install forwarding rules in a switch they don't control?      
>> >> >> 
>> >> >> They won't.    
>> >> >
>> >> >Argh.  So how is forwarding configured if there are no rules?  Are you
>> >> >going to assume it's switching on MACs?  We're supposed to offload
>> >> >software constructs.  If it's a software port it needs to be explicitly
>> >> >switched.  If it's not explicitly switched - we already have macvlan
>> >> >offload.    
>> >> 
>> >> Wait a second. You configure the switch. And for that, you have the
>> >> switchports (representors). What we are talking about are VF (VF
>> >> subport) host legs. Am I missing something?  
>> >
>> >Hm :)  So when VM gets a new port, how is it connected?  Are we
>> >assuming all ports of a VM are plugged into one big L2 switch?
>> >The use case for those sub ports is a little murky, sorry about
>> >the endless confusion :)  
>> 
>> Np. When user John (on baremetal, or wherever the devlink instance
>> with switch port is) creates VF or VF subport by:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
>> or by:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0
>> 
>> Then instances of flavour pci_vf are going to appear in the same devlink
>> instance. Those are the switch ports:
>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>>                         flavour pci_vf pf 0 vf 0
>>                         switch_id 00154d130d2f peer pci/0000:05:10.1/0    
>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>>                         flavour pci_vf pf 0 vf 0 subport 1
>>                         switch_id 00154d130d2f peer pci/0000:05:10.1/1
>> 
>> With that, peers are going to appear too, and those are the actual VF/VF
>> subport:
>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>>                     peer pci/0000:05:00.0/10002
>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>>                     peer pci/0000:05:00.0/10003
>> 
>> Later you can push this VF along with all subports to VM. So in VM, you
>> are going to see the VF like this:
>> $ devlink dev
>> pci/0000:00:08.0
>> $ devlink port
>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>> 
>> And back to your question of how are they connected in eswitch.
>> That is totally up to the original user John who did the creation.
>> He is in charge of the eswitch on baremetal, he would configure
>> the forwarding however he likes.
>
>Ack, so I think you're saying VM has to communicate to the cloud
>environment to have this provisioned using some service API, not 
>a kernel API.  That's what I wanted to confirm.

Okay.

>
>I don't see any benefit to having the "host ports" under devlink,
>as such I think it's a matter of preference.  I'll try to describe 
>the two options to Netronome's FAEs and see which one they find more
>intuitive.

Yeah, the "host ports" are probably not a must. I just like to have them
for visibility purposes. No big deal to implement them.

>
>Makes sense?

Okay. Thanks!

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
       [not found]                                                         ` <ae938b4f-5fa9-3c33-8ae6-eab2d3d9f1ec@intel.com>
@ 2019-03-15 15:32                                                           ` Parav Pandit
  2019-03-15 20:08                                                             ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-15 15:32 UTC (permalink / raw)
  To: Samudrala, Sridhar, Jakub Kicinski; +Cc: Jiri Pirko, davem, netdev, oss-drivers



> -----Original Message-----
> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> Sent: Friday, March 15, 2019 12:58 AM
> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> 
> On 3/14/2019 7:40 PM, Parav Pandit wrote:
> >
> >
> >> -----Original Message-----
> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> >> Sent: Thursday, March 14, 2019 9:16 PM
> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >> <jakub.kicinski@netronome.com>
> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >>
> >>
> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>>> Sent: Thursday, March 14, 2019 6:39 PM
> >>>> To: Parav Pandit <parav@mellanox.com>
> >>>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >>>> devlink PCI ports
> >>>>
> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> >>>>>>> Then instances of flavour pci_vf are going to appear in the same
> >>>>>>> devlink instance. Those are the switch ports:
> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> >>>>>>>                           flavour pci_vf pf 0 vf 0
> >>>>>>>                           switch_id 00154d130d2f peer
> >>>>>>> pci/0000:05:10.1/0
> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> >>>>>>>                           flavour pci_vf pf 0 vf 0 subport 1
> >>>>>>>                           switch_id 00154d130d2f peer
> >>>>>>> pci/0000:05:10.1/1
> >>>>>>>
> >>>>>>> With that, peers are going to appear too, and those are the
> >>>>>>> actual VF/VF
> >>>>>>> subport:
> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> >>>>>>>                       peer pci/0000:05:00.0/10002
> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> >>>>>>>                       peer pci/0000:05:00.0/10003
> >>>>>>>
> >>>>>>> Later you can push this VF along with all subports to VM. So in
> >>>>>>> VM, you are going to see the VF like this:
> >>>>>>> $ devlink dev
> >>>>>>> pci/0000:00:08.0
> >>>>>>> $ devlink port
> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> >>>>>>>
> >>>>>>> And back to your question of how are they connected in eswitch.
> >>>>>>> That is totally up to the original user John who did the creation.
> >>>>>>> He is in charge of the eswitch on baremetal, he would configure
> >>>>>>> the forwarding however he likes.
> >>>>>>
> >>>>>> Ack, so I think you're saying VM has to communicate to the cloud
> >>>>>> environment to have this provisioned using some service API, not
> >>>>>> a kernel API.  That's what I wanted to confirm.
> >>>>>>
> >>>>>> I don't see any benefit to having the "host ports" under devlink,
> >>>>>> as such I think it's a matter of preference.
> >>>>>
> >>>>> We need 'host ports' to configure parameters of this host port
> >>>>> which is not exposed by the rep-netdev.
> >>>>> Such as mac address.
> >>>>
> >>>> Please look at the quote of what Jiri wrote above - the host port
> >>>> gets passed to the VM, you can't use it as a handle to set the MAC.
> >>>>
> >>>> The way to set the MAC remains:
> >>>>
> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
> >>>> 00:11:22:33:44:55
> >>>>
> >>> Even though it can be done, I think this is wrong model to program
> >> hostport mac address using eswitch port.
> >>> All devlink objects are control objects, so what is passed to VM is
> >>> what is
> >> represented by devlink.
> >>> VF in the VM will anyway create its devlink object.
> >>> What is wrong in programming hostport?
> >>> It gives a very clear view to users of topology and objects.
> >>
> >> The VF or any subport MAC address should be configured by the
> >> orchestration layer that is running on the hypervisor and when a VF
> >> is assigned to a VF, the host port is not visible to the hypervisor.
> > What prevents  creation of hostport due to which is not visible?
> > Hostport is control port to program host side of parameters.
> > It should be created when user wants to program the parameters.
> >
> > Model is really straight forward.
> > Program host port params using hostport object.
> > Program switchport params using rep-netdev.
> 
> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each port -
> host facing ports and switch facing ports. This is in addition to the netdevs
> that are created today.
> 
I am not proposing anything different.
I am proposing only two changes.
1. Control hostport params by referring to the hostport directly (not indirectly via its peer).
2. Flavour should not be vf/pf; flavour should be hostport or switchport,
because the switch is flat and agnostic of pf/vf/mdev.

> Are you suggesting that all the devlink objects should be visible only at the
> hypervisor layer?
> 
Of course not.

Ports and params controlled by the hypervisor should be exposed at the hypervisor/eswitch level, wherever their parent devlink instance exists.
Ports which should be visible inside a VM should be exposed inside a VM.
So for a given VF,

If eswitch is at hypervisor level,
$ devlink port show
pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:10.1/0
pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

where VF is enumerated,
$ devlink port show
pci/0000:05:10.1/0 eth netdev flavour hostport
This is because an unprivileged VF has no visibility into the eswitch and its links.
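
A minimal sketch of the visibility rule described above, in Python. The `Port` type and the `visible_ports` helper are hypothetical illustrations, not devlink APIs, and the `hostport`/`switchport` flavour names are the ones proposed in this message rather than existing devlink flavours.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Port:
    handle: str                       # e.g. "pci/0000:05:00.0/10002"
    flavour: str                      # "switchport" or "hostport" (proposed names)
    switch_id: Optional[str] = None   # only known where the eswitch is visible
    peer: Optional[str] = None        # handle of the port on the other side


# The two ports from the hypervisor-level listing above.
PORTS = [
    Port("pci/0000:05:00.0/10002", "switchport",
         "00154d130d2f", "pci/0000:05:10.1/0"),
    Port("pci/0000:05:10.1/0", "hostport",
         "00154d130d2f", "pci/0000:05:00.0/10002"),
]


def visible_ports(ports: List[Port], privileged: bool) -> List[Port]:
    """Return the ports a viewer would see under the proposed model.

    A privileged viewer (the hypervisor owning the eswitch) sees every
    port with its links. An unprivileged viewer (a VF inside a VM) sees
    only hostports, without switch_id or peer.
    """
    if privileged:
        return list(ports)
    return [Port(p.handle, p.flavour) for p in ports if p.flavour == "hostport"]
```

Calling `visible_ports(PORTS, privileged=False)` yields a single hostport entry with no `switch_id` or `peer`, matching the second listing above.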

> I think the terminology need to be defined clearly so that we are all on the
> same page.
> 
> >
> >> Currently we have ndo_set_vf_mac_addr api that works with PF netdev,
> >> but i think we are trying to move away from that API and do all the
> >> configuration via the port representor netdevs.
> > This is fine rep-netdev represents eswitch port.
> > You normally don't go to switch to program host port params.
> >
> >> As the mac address cannot be configured using this netdev, i think
> >> Jakub is suggesting creating a devlink object for each port
> >> representor and use that interface to set peer mac address.
> >
> > I understand but is convoluted interface.
> > When you program host NIC mac address you talk to iLo or BIOS.
> > When you program switch side mac address, you go switch/router/modem.
> >
> > Also programming host params on host side, also doesn't make
> assumption that its connected to eswitch.
> > It also doesn't assume that same connectivity for its life.
> >
> > If you model around how physical devices are configured, it will almost
> never go wrong and still provides same level of flexibility.
> >
> >> We should be able use this to configure port vlan too.
> >>
> >> Also, instead of subport, can we call vport and support different
> >> types of vports - sr-iov, siov, vmdq etc.
> >>
> > At switch level there are just ports.
> > sriov, siov, mdev, vmdq are their counterpart (peer) where it is connected.
> >
> >>>
> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
> >>> It doesn't make sense to define 'mdev' flavour which we are already
> >> working.
> >>> At eswitch level it is just a port, it happen to be connected to vf
> >>> or pf or
> >> other objects, it doesn't matter.
> >>> Port should be flavoured as 'hostport' or 'switchport'.
> >>>
> >>>
> >>>> (using the port ids from above)


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 15:32                                                           ` Parav Pandit
@ 2019-03-15 20:08                                                             ` Jiri Pirko
  2019-03-15 20:44                                                               ` Jakub Kicinski
  2019-03-15 21:59                                                               ` Parav Pandit
  0 siblings, 2 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-15 20:08 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Samudrala, Sridhar, Jakub Kicinski, davem, netdev, oss-drivers

Fri, Mar 15, 2019 at 04:32:24PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
>> Sent: Friday, March 15, 2019 12:58 AM
>> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
>> <jakub.kicinski@netronome.com>
>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> 
>> On 3/14/2019 7:40 PM, Parav Pandit wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
>> >> Sent: Thursday, March 14, 2019 9:16 PM
>> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
>> >> <jakub.kicinski@netronome.com>
>> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >>
>> >>
>> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
>> >>>
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >>>> Sent: Thursday, March 14, 2019 6:39 PM
>> >>>> To: Parav Pandit <parav@mellanox.com>
>> >>>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >>>> devlink PCI ports
>> >>>>
>> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
>> >>>>>>> Then instances of flavour pci_vf are going to appear in the same
>> >>>>>>> devlink instance. Those are the switch ports:
>> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>> >>>>>>>                           flavour pci_vf pf 0 vf 0
>> >>>>>>>                           switch_id 00154d130d2f peer
>> >>>>>>> pci/0000:05:10.1/0
>> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>> >>>>>>>                           flavour pci_vf pf 0 vf 0 subport 1
>> >>>>>>>                           switch_id 00154d130d2f peer
>> >>>>>>> pci/0000:05:10.1/1
>> >>>>>>>
>> >>>>>>> With that, peers are going to appear too, and those are the
>> >>>>>>> actual VF/VF
>> >>>>>>> subport:
>> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>> >>>>>>>                       peer pci/0000:05:00.0/10002
>> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>> >>>>>>>                       peer pci/0000:05:00.0/10003
>> >>>>>>>
>> >>>>>>> Later you can push this VF along with all subports to VM. So in
>> >>>>>>> VM, you are going to see the VF like this:
>> >>>>>>> $ devlink dev
>> >>>>>>> pci/0000:00:08.0
>> >>>>>>> $ devlink port
>> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>> >>>>>>>
>> >>>>>>> And back to your question of how are they connected in eswitch.
>> >>>>>>> That is totally up to the original user John who did the creation.
>> >>>>>>> He is in charge of the eswitch on baremetal, he would configure
>> >>>>>>> the forwarding however he likes.
>> >>>>>>
>> >>>>>> Ack, so I think you're saying VM has to communicate to the cloud
>> >>>>>> environment to have this provisioned using some service API, not
>> >>>>>> a kernel API.  That's what I wanted to confirm.
>> >>>>>>
>> >>>>>> I don't see any benefit to having the "host ports" under devlink,
>> >>>>>> as such I think it's a matter of preference.
>> >>>>>
>> >>>>> We need 'host ports' to configure parameters of this host port
>> >>>>> which is not exposed by the rep-netdev.
>> >>>>> Such as mac address.
>> >>>>
>> >>>> Please look at the quote of what Jiri wrote above - the host port
>> >>>> gets passed to the VM, you can't use it as a handle to set the MAC.
>> >>>>
>> >>>> The way to set the MAC remains:
>> >>>>
>> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
>> >>>> 00:11:22:33:44:55
>> >>>>
>> >>> Even though it can be done, I think this is wrong model to program
>> >> hostport mac address using eswitch port.
>> >>> All devlink objects are control objects, so what is passed to VM is
>> >>> what is
>> >> represented by devlink.
>> >>> VF in the VM will anyway create its devlink object.
>> >>> What is wrong in programming hostport?
>> >>> It gives a very clear view to users of topology and objects.
>> >>
>> >> The VF or any subport MAC address should be configured by the
>> >> orchestration layer that is running on the hypervisor and when a VF
>> >> is assigned to a VF, the host port is not visible to the hypervisor.
>> > What prevents  creation of hostport due to which is not visible?
>> > Hostport is control port to program host side of parameters.
>> > It should be created when user wants to program the parameters.
>> >
>> > Model is really straight forward.
>> > Program host port params using hostport object.
>> > Program switchport params using rep-netdev.
>> 
>> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each port -
>> host facing ports and switch facing ports. This is in addition to the netdevs
>> that are created today.
>> 
>I am not proposing any different.
>I am proposing only two changes.
>1. control hostport params via referring hostport (not via indirect peer)

Not really possible. If you pass a VF through into a VM, the hostport goes
along with it.


>2. flavour should not be vf/pf, flavour should be hostport, switchport.
>Because switch is flat and agnostic of pf/vf/mdev.

Not sure. It's good to have this kind of visibility.


>
>> Are you suggesting that all the devlink objects should be visible only at the
>> hypervisor layer?
>> 
>Of course not.
>
>Ports and params controlled by hypervisor should be exposed at hypervisor/eswitch wherever its parent devlink instance exist.
>Ports which should be visible inside a VM should be exposed inside a VM.
>So for a given VF,
>
>If eswitch is at hypervisor level,
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:10.1/0
>pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
>
>where VF is enumerated,
>$ devlink port show
>pci/0000:05:10.1/0 eth netdev flavour hostport

So this is what it looks like in the VM, right?


>This is because unprivileged VF doesn't have visibility to eswitch and its links.
>
>> I think the terminology need to be defined clearly so that we are all on the
>> same page.
>> 
>> >
>> >> Currently we have ndo_set_vf_mac_addr api that works with PF netdev,
>> >> but i think we are trying to move away from that API and do all the
>> >> configuration via the port representor netdevs.
>> > This is fine rep-netdev represents eswitch port.
>> > You normally don't go to switch to program host port params.
>> >
>> >> As the mac address cannot be configured using this netdev, i think
>> >> Jakub is suggesting creating a devlink object for each port
>> >> representor and use that interface to set peer mac address.
>> >
>> > I understand but is convoluted interface.
>> > When you program host NIC mac address you talk to iLo or BIOS.
>> > When you program switch side mac address, you go switch/router/modem.
>> >
>> > Also programming host params on host side, also doesn't make
>> assumption that its connected to eswitch.
>> > It also doesn't assume that same connectivity for its life.
>> >
>> > If you model around how physical devices are configured, it will almost
>> never go wrong and still provides same level of flexibility.
>> >
>> >> We should be able use this to configure port vlan too.
>> >>
>> >> Also, instead of subport, can we call vport and support different
>> >> types of vports - sr-iov, siov, vmdq etc.
>> >>
>> > At switch level there are just ports.
>> > sriov, siov, mdev, vmdq are their counterpart (peer) where it is connected.
>> >
>> >>>
>> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
>> >>> It doesn't make sense to define 'mdev' flavour which we are already
>> >> working.
>> >>> At eswitch level it is just a port, it happen to be connected to vf
>> >>> or pf or
>> >> other objects, it doesn't matter.
>> >>> Port should be flavoured as 'hostport' or 'switchport'.
>> >>>
>> >>>
>> >>>> (using the port ids from above)


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 20:08                                                             ` Jiri Pirko
@ 2019-03-15 20:44                                                               ` Jakub Kicinski
  2019-03-15 22:12                                                                 ` Parav Pandit
  2019-03-18 12:11                                                                 ` Jiri Pirko
  2019-03-15 21:59                                                               ` Parav Pandit
  1 sibling, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-15 20:44 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

On Fri, 15 Mar 2019 21:08:14 +0100, Jiri Pirko wrote:
> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each port -
> >> host facing ports and switch facing ports. This is in addition to the netdevs
> >> that are created today.

To be clear I'm not in favour of the dual-object proposal.

> >I am not proposing any different.
> >I am proposing only two changes.
> >1. control hostport params via referring hostport (not via indirect peer)  
> 
> Not really possible. If you passthrough VF into VM, the hostport goes
> along with it.
> 
> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> >Because switch is flat and agnostic of pf/vf/mdev.  
> 
> Not sure. It's good to have this kind of visibility.

Yes, this subthread honestly makes me go from 60% sure to 95% sure we
shouldn't do the dual object thing :(  Seems like Parav is already
confused by it and suggests host port can exist without switch port :(

> >> Are you suggesting that all the devlink objects should be visible only at the
> >> hypervisor layer?
> >>   
> >Of course not.
> >
> >Ports and params controlled by hypervisor should be exposed at hypervisor/eswitch wherever its parent devlink instance exist.
> >Ports which should be visible inside a VM should be exposed inside a VM.
> >So for a given VF,
> >
> >If eswitch is at hypervisor level,
> >$ devlink port show
> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:10.1/0
> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
> >
> >where VF is enumerated,
> >$ devlink port show
> >pci/0000:05:10.1/0 eth netdev flavour hostport  
> 
> So this is how it looks like in VM, right?
> 
> >This is because unprivileged VF doesn't have visibility to eswitch and its links.



* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 20:08                                                             ` Jiri Pirko
  2019-03-15 20:44                                                               ` Jakub Kicinski
@ 2019-03-15 21:59                                                               ` Parav Pandit
  2019-03-18 12:21                                                                 ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-15 21:59 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Samudrala, Sridhar, Jakub Kicinski, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Friday, March 15, 2019 3:08 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Fri, Mar 15, 2019 at 04:32:24PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> >> Sent: Friday, March 15, 2019 12:58 AM
> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >> <jakub.kicinski@netronome.com>
> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >>
> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> >> >> Sent: Thursday, March 14, 2019 9:16 PM
> >> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >> >> <jakub.kicinski@netronome.com>
> >> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> devlink PCI ports
> >> >>
> >> >>
> >> >>
> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
> >> >>>
> >> >>>
> >> >>>> -----Original Message-----
> >> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
> >> >>>> To: Parav Pandit <parav@mellanox.com>
> >> >>>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >>>> devlink PCI ports
> >> >>>>
> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> >> >>>>>>> Then instances of flavour pci_vf are going to appear in the
> >> >>>>>>> same devlink instance. Those are the switch ports:
> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> >> >>>>>>>                           flavour pci_vf pf 0 vf 0
> >> >>>>>>>                           switch_id 00154d130d2f peer
> >> >>>>>>> pci/0000:05:10.1/0
> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> >> >>>>>>>                           flavour pci_vf pf 0 vf 0 subport 1
> >> >>>>>>>                           switch_id 00154d130d2f peer
> >> >>>>>>> pci/0000:05:10.1/1
> >> >>>>>>>
> >> >>>>>>> With that, peers are going to appear too, and those are the
> >> >>>>>>> actual VF/VF
> >> >>>>>>> subport:
> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> >> >>>>>>>                       peer pci/0000:05:00.0/10002
> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> >> >>>>>>>                       peer pci/0000:05:00.0/10003
> >> >>>>>>>
> >> >>>>>>> Later you can push this VF along with all subports to VM. So
> >> >>>>>>> in VM, you are going to see the VF like this:
> >> >>>>>>> $ devlink dev
> >> >>>>>>> pci/0000:00:08.0
> >> >>>>>>> $ devlink port
> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
> >> >>>>>>>
> >> >>>>>>> And back to your question of how are they connected in eswitch.
> >> >>>>>>> That is totally up to the original user John who did the creation.
> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
> >> >>>>>>> configure the forwarding however he likes.
> >> >>>>>>
> >> >>>>>> Ack, so I think you're saying VM has to communicate to the
> >> >>>>>> cloud environment to have this provisioned using some service
> >> >>>>>> API, not a kernel API.  That's what I wanted to confirm.
> >> >>>>>>
> >> >>>>>> I don't see any benefit to having the "host ports" under
> >> >>>>>> devlink, as such I think it's a matter of preference.
> >> >>>>>
> >> >>>>> We need 'host ports' to configure parameters of this host port
> >> >>>>> which is not exposed by the rep-netdev.
> >> >>>>> Such as mac address.
> >> >>>>
> >> >>>> Please look at the quote of what Jiri wrote above - the host
> >> >>>> port gets passed to the VM, you can't use it as a handle to set the
> MAC.
> >> >>>>
> >> >>>> The way to set the MAC remains:
> >> >>>>
> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
> >> >>>> 00:11:22:33:44:55
> >> >>>>
> >> >>> Even though it can be done, I think this is wrong model to
> >> >>> program
> >> >> hostport mac address using eswitch port.
> >> >>> All devlink objects are control objects, so what is passed to VM
> >> >>> is what is
> >> >> represented by devlink.
> >> >>> VF in the VM will anyway create its devlink object.
> >> >>> What is wrong in programming hostport?
> >> >>> It gives a very clear view to users of topology and objects.
> >> >>
> >> >> The VF or any subport MAC address should be configured by the
> >> >> orchestration layer that is running on the hypervisor and when a
> >> >> VF is assigned to a VF, the host port is not visible to the hypervisor.
> >> > What prevents  creation of hostport due to which is not visible?
> >> > Hostport is control port to program host side of parameters.
> >> > It should be created when user wants to program the parameters.
> >> >
> >> > Model is really straight forward.
> >> > Program host port params using hostport object.
> >> > Program switchport params using rep-netdev.
> >>
> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each
> >> port - host facing ports and switch facing ports. This is in addition
> >> to the netdevs that are created today.
> >>
> >I am not proposing any different.
> >I am proposing only two changes.
> >1. control hostport params via referring hostport (not via indirect
> >peer)
> 
> Not really possible. If you passthrough VF into VM, the hostport goes along
> with it.
> 
No.
I am sorry, the enumeration I showed earlier is the source of the confusion.

Below is the right enumeration.

When the VF is initially enumerated in the host, where the eswitch devlink instance is located,
the enumeration below is seen.

The first two entries show the link between hostport and switchport.
$ devlink port show
pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1

pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

pci/0000:05:10.1/0 eth netdev flavour hostport
This last entry won't be seen if VF auto-probing is disabled, because then the VF is not enumerated.

As a user, I will program the MAC address of the VF through its hostport:
pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
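
To make the corrected enumeration concrete, here is a small Python sketch that parses lines in the listing format used above and checks that every `peer` link is mirrored by the port it points to. The format is the one proposed in this message, not actual `devlink port show` output, and `parse_port_line` and `unpaired_ports` are hypothetical helper names.

```python
def parse_port_line(line):
    """Parse '<handle> <type> netdev <key> <value> ...' into a dict."""
    tokens = line.split()
    handle, port_type = tokens[0], tokens[1]
    # tokens[2] is the literal word "netdev"; the listings give no netdev name.
    attrs = dict(zip(tokens[3::2], tokens[4::2]))
    return {"handle": handle, "type": port_type, **attrs}


def unpaired_ports(ports):
    """Return handles whose 'peer' does not point back at them."""
    by_handle = {p["handle"]: p for p in ports}
    broken = []
    for port in ports:
        peer = port.get("peer")
        if peer is None:
            continue  # a port without a peer link is allowed in this model
        other = by_handle.get(peer)
        if other is None or other.get("peer") != port["handle"]:
            broken.append(port["handle"])
    return broken


LISTING = """\
pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
pci/0000:05:10.1/0 eth netdev flavour hostport
"""

ports = [parse_port_line(l) for l in LISTING.splitlines() if l.strip()]
```

In this listing the switchport and its hostport reference each other, while the VF's own hostport entry carries no `peer` attribute at all.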


> 
> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> >Because switch is flat and agnostic of pf/vf/mdev.
> 
> Not sure. It's good to have this kind of visibility.
> 
A port can have a label/attribute indicating that it belongs to VF-1 or to an mdev, as long as you agree to have an mdev attribute on the host port
(and do not ask for it to be abstracted, because mdev is a well-defined kernel object).

> 
> >
> >> Are you suggesting that all the devlink objects should be visible
> >> only at the hypervisor layer?
> >>
> >Of course not.
> >
> >Ports and params controlled by hypervisor should be exposed at
> hypervisor/eswitch wherever its parent devlink instance exist.
> >Ports which should be visible inside a VM should be exposed inside a VM.
> >So for a given VF,
> >
> >If eswitch is at hypervisor level,
> >$ devlink port show
> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> >00154d130d2f peer pci/0000:05:10.1/0
> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f
> >peer pci/0000:05:00.0/10002
> >
> >where VF is enumerated,
> >$ devlink port show
> >pci/0000:05:10.1/0 eth netdev flavour hostport
> 
> So this is how it looks like in VM, right?
> 
Yep.
Once the VF is mapped to a VM, only two entries are seen, and the hostport can still be controlled.

$ devlink port show
pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1

pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

This also addresses the InfiniBand case, where there is no eswitch but hostports exist and should be managed.
We shouldn't invent new devlink APIs or create a fake software eswitch object that doesn't exist in hardware.

> 
> >This is because unprivileged VF doesn't have visibility to eswitch and its
> links.
> >
> >> I think the terminology need to be defined clearly so that we are all
> >> on the same page.
> >>
> >> >
> >> >> Currently we have ndo_set_vf_mac_addr api that works with PF
> >> >> netdev, but i think we are trying to move away from that API and
> >> >> do all the configuration via the port representor netdevs.
> >> > This is fine rep-netdev represents eswitch port.
> >> > You normally don't go to switch to program host port params.
> >> >
> >> >> As the mac address cannot be configured using this netdev, i think
> >> >> Jakub is suggesting creating a devlink object for each port
> >> >> representor and use that interface to set peer mac address.
> >> >
> >> > I understand but is convoluted interface.
> >> > When you program host NIC mac address you talk to iLo or BIOS.
> >> > When you program switch side mac address, you go
> switch/router/modem.
> >> >
> >> > Also programming host params on host side, also doesn't make
> >> assumption that its connected to eswitch.
> >> > It also doesn't assume that same connectivity for its life.
> >> >
> >> > If you model around how physical devices are configured, it will
> >> > almost
> >> never go wrong and still provides same level of flexibility.
> >> >
> >> >> We should be able use this to configure port vlan too.
> >> >>
> >> >> Also, instead of subport, can we call vport and support different
> >> >> types of vports - sr-iov, siov, vmdq etc.
> >> >>
> >> > At switch level there are just ports.
> >> > sriov, siov, mdev, vmdq are their counterpart (peer) where it is
> connected.
> >> >
> >> >>>
> >> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
> >> >>> It doesn't make sense to define 'mdev' flavour which we are
> >> >>> already
> >> >> working.
> >> >>> At eswitch level it is just a port, it happen to be connected to
> >> >>> vf or pf or
> >> >> other objects, it doesn't matter.
> >> >>> Port should be flavoured as 'hostport' or 'switchport'.
> >> >>>
> >> >>>
> >> >>>> (using the port ids from above)


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 20:44                                                               ` Jakub Kicinski
@ 2019-03-15 22:12                                                                 ` Parav Pandit
  2019-03-16  1:16                                                                   ` Jakub Kicinski
  2019-03-18 12:11                                                                 ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-15 22:12 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Friday, March 15, 2019 3:45 PM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Fri, 15 Mar 2019 21:08:14 +0100, Jiri Pirko wrote:
> > >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for
> > >> each port - host facing ports and switch facing ports. This is in
> > >> addition to the netdevs that are created today.
> 
> To be clear I'm not in favour of the dual-object proposal.
> 
> > >I am not proposing any different.
> > >I am proposing only two changes.
> > >1. control hostport params via referring hostport (not via indirect
> > >peer)
> >
> > Not really possible. If you passthrough VF into VM, the hostport goes
> > along with it.
> >
> > >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> > >Because switch is flat and agnostic of pf/vf/mdev.
> >
> > Not sure. It's good to have this kind of visibility.
> 
> Yes, this subthread honestly makes me go from 60% sure to 95% sure we
> shouldn't do the dual object thing :(  Seems like Parav is already confused by
> it and suggests host port can exist without switch port :(
>
I am almost sure that I am not confused.
I am clear that hostports should be configured by the devlink
instance which has the capability to program them.
When a hostport is in a VF, that VF usually won't have the privilege
to program it and won't have visibility into the eswitch either.

Why would you like to start with a restrictive, peer-view-only model?
Hostports exist for InfiniBand HCAs without a switchport.
We should be able to manage hostport objects without creating a fake software eswitch object.

Jakub,
Can you please point to some example, other than a veth pair, where you configure a host param (such as the MAC address) through a switch?
An existing example will help me map it to the devlink eswitch proposal.
If we go the peer-programming route, what are your thoughts on
how we should program InfiniBand hostports, which don't have peer ports?

> > >> Are you suggesting that all the devlink objects should be visible
> > >> only at the hypervisor layer?
> > >>
> > >Of course not.
> > >
> > >Ports and params controlled by hypervisor should be exposed at
> hypervisor/eswitch wherever its parent devlink instance exist.
> > >Ports which should be visible inside a VM should be exposed inside a VM.
> > >So for a given VF,
> > >
> > >If eswitch is at hypervisor level,
> > >$ devlink port show
> > >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> > >00154d130d2f peer pci/0000:05:10.1/0
> > >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f
> > >peer pci/0000:05:00.0/10002
> > >
> > >where VF is enumerated,
> > >$ devlink port show
> > >pci/0000:05:10.1/0 eth netdev flavour hostport
> >
> > So this is how it looks like in VM, right?
> >
> > >This is because unprivileged VF doesn't have visibility to eswitch and its
> links.



* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 22:12                                                                 ` Parav Pandit
@ 2019-03-16  1:16                                                                   ` Jakub Kicinski
  2019-03-18 15:43                                                                     ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-16  1:16 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

On Fri, 15 Mar 2019 22:12:13 +0000, Parav Pandit wrote:
> > On Fri, 15 Mar 2019 21:08:14 +0100, Jiri Pirko wrote:  
> > > >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for
> > > >> each port - host facing ports and switch facing ports. This is in
> > > >> addition to the netdevs that are created today.  
> > 
> > To be clear I'm not in favour of the dual-object proposal.
> >   
> > > >I am not proposing any different.
> > > >I am proposing only two changes.
> > > >1. control hostport params via referring hostport (not via indirect
> > > >peer)  
> > >
> > > Not really possible. If you passthrough VF into VM, the hostport goes
> > > along with it.
> > >  
> > > >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> > > >Because switch is flat and agnostic of pf/vf/mdev.  
> > >
> > > Not sure. It's good to have this kind of visibility.  
> > 
> > Yes, this subthread honestly makes me go from 60% sure to 95% sure we
> > shouldn't do the dual object thing :(  Seems like Parav is already confused by
> > it and suggests host port can exist without switch port :(
> >  
> I am almost sure that I am not confused.
> I am clear that hostports should be configured by devlink
> instance which has the capability to program it.

Right now a devlink port is something that the datapath of an ASIC can
address.  All flavours we have presently are basically various MACs -
physical (front panel ports), DSA - for ASIC interconnects on a
multi-ASIC board, CPU - for connecting to a MAC of a NIC.

Jiri's flavour proposal was strictly extending the same logic to
SR-IOV.  Each object addressable within the datapath gets a port.  
The datapath's ID can be used as port_index.
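
To make the "one port per datapath-addressable object" rule concrete, here is
a minimal sketch. It is a toy model, not kernel code: the class and method
names (DevlinkInstance, register) are hypothetical, and the flavour names are
simply the ones mentioned in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Flavour(Enum):
    # Flavours named in this thread; this is not the kernel's enum.
    PHYSICAL = "physical"
    DSA = "dsa"
    CPU = "cpu"
    PCI_PF = "pci_pf"
    PCI_VF = "pci_vf"


@dataclass(frozen=True)
class Port:
    port_index: int  # assumed here to be the datapath ID itself
    flavour: Flavour


class DevlinkInstance:
    """Toy registry (hypothetical): one port object per datapath ID."""

    def __init__(self, handle):
        self.handle = handle
        self.ports = {}

    def register(self, datapath_id, flavour):
        # A second object for the same datapath ID is rejected, which is
        # the semantic the dual-object proposal would depart from.
        if datapath_id in self.ports:
            raise ValueError(f"datapath ID {datapath_id} already has a port")
        port = Port(port_index=datapath_id, flavour=flavour)
        self.ports[datapath_id] = port
        return port


dl = DevlinkInstance("pci/0000:05:00.0")
dl.register(0, Flavour.PHYSICAL)
dl.register(10002, Flavour.PCI_VF)
```

Under this model a VF is just another flavour of the same port object, keyed
by the ID the datapath already uses for it.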

I just reimplemented his patches here and added the subports which I
think he wasn't aware of as they are a quirk of old NFP ASICs.

Having 3 objects for the same datapath ID is a significant departure
from the existing devlink port semantics.

> When hostport is in VF, that VF usually won't have privilege
> to program it and won't have visibility to eswitch either.

If VM has no visibility into the eswitch and no permission to configure
things, what use does the object serve?

> Why would you like to start with restrictive model of peer view only?

"Restrictive model" is one way of putting it.  I'd rather say that we
are not adding objects which:
 (a) do not adhere to current semantics;
 (b) have no distinct function.

We can make the "add MAC address" command not use the word peer:

devlink port addr_pool add pci/0000:05:00.0/10003 type eth 00:11:22:33:44:55
devlink port addr_pool del pci/0000:05:00.0/10003 type eth 00:11:22:33:44:55

if the "peer" doesn't sit right.

> Hostports exist for infiniband HCA without switchport.
> We should be able to manage hostport objects without creating fake eswitch sw object.

It sounds like the RDMA subsystem is lacking a model to represent all
its objects, but that's RDMA's problem to solve..

In netdev world we have netdevs for ports which are used for the bulk of
the configuration, most importantly - forwarding.

> Jakub,
> Can you please point to some example other than veth-pair where you
> configure host param (such as mac address) through a switch?

Existing "legacy" SR-IOV NDOs.

> An existing example will help me to map it to devlink eswitch proposal.
> If we go peer programming route, what are your thoughts on
> how should we program infiniband hostports which doesn't have peer ports?

Again, you may be trying to fix RDMA's lack of control objects, which
may be better fixed elsewhere..


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 20:44                                                               ` Jakub Kicinski
  2019-03-15 22:12                                                                 ` Parav Pandit
@ 2019-03-18 12:11                                                                 ` Jiri Pirko
  2019-03-18 19:16                                                                   ` Jakub Kicinski
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-18 12:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

Fri, Mar 15, 2019 at 09:44:54PM CET, jakub.kicinski@netronome.com wrote:
>On Fri, 15 Mar 2019 21:08:14 +0100, Jiri Pirko wrote:
>> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each port -
>> >> host facing ports and switch facing ports. This is in addition to the netdevs
>> >> that are created today.
>
>To be clear I'm not in favour of the dual-object proposal.
>
>> >I am not proposing any different.
>> >I am proposing only two changes.
>> >1. control hostport params via referring hostport (not via indirect peer)  
>> 
>> Not really possible. If you passthrough VF into VM, the hostport goes
>> along with it.
>> 
>> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >Because switch is flat and agnostic of pf/vf/mdev.  
>> 
>> Not sure. It's good to have this kind of visibility.
>
>Yes, this subthread honestly makes me go from 60% sure to 95% sure we
>shouldn't do the dual object thing :(  Seems like Parav is already
>confused by it and suggests host port can exist without switch port :(

Although I understand your hesitation, the host ports are also
associated with the ASIC and should be under the devlink instance.
It is just a matter of proper documentation and clear code to avoid
confusion.


>
>> >> Are you suggesting that all the devlink objects should be visible only at the
>> >> hypervisor layer?
>> >>   
>> >Of course not.
>> >
>> >Ports and params controlled by hypervisor should be exposed at hypervisor/eswitch wherever its parent devlink instance exist.
>> >Ports which should be visible inside a VM should be exposed inside a VM.
>> >So for a given VF,
>> >
>> >If eswitch is at hypervisor level,
>> >$ devlink port show
>> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:10.1/0
>> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
>> >
>> >where VF is enumerated,
>> >$ devlink port show
>> >pci/0000:05:10.1/0 eth netdev flavour hostport  
>> 
>> So this is how it looks like in VM, right?
>> 
>> >This is because unprivileged VF doesn't have visibility to eswitch and its links.
>


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-15 21:59                                                               ` Parav Pandit
@ 2019-03-18 12:21                                                                 ` Jiri Pirko
  2019-03-18 15:56                                                                   ` Parav Pandit
  2019-03-18 19:19                                                                   ` Jakub Kicinski
  0 siblings, 2 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-18 12:21 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Samudrala, Sridhar, Jakub Kicinski, davem, netdev, oss-drivers

Fri, Mar 15, 2019 at 10:59:33PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Friday, March 15, 2019 3:08 PM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub Kicinski
>> <jakub.kicinski@netronome.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Fri, Mar 15, 2019 at 04:32:24PM CET, parav@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
>> >> Sent: Friday, March 15, 2019 12:58 AM
>> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
>> >> <jakub.kicinski@netronome.com>
>> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >>
>> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
>> >> >> Sent: Thursday, March 14, 2019 9:16 PM
>> >> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
>> >> >> <jakub.kicinski@netronome.com>
>> >> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >> devlink PCI ports
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
>> >> >>>
>> >> >>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
>> >> >>>> To: Parav Pandit <parav@mellanox.com>
>> >> >>>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
>> >> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >>>> devlink PCI ports
>> >> >>>>
>> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
>> >> >>>>>>> Then instances of flavour pci_vf are going to appear in the
>> >> >>>>>>> same devlink instance. Those are the switch ports:
>> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>>                           flavour pci_vf pf 0 vf 0
>> >> >>>>>>>                           switch_id 00154d130d2f peer
>> >> >>>>>>> pci/0000:05:10.1/0
>> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>>                           flavour pci_vf pf 0 vf 0 subport 1
>> >> >>>>>>>                           switch_id 00154d130d2f peer
>> >> >>>>>>> pci/0000:05:10.1/1
>> >> >>>>>>>
>> >> >>>>>>> With that, peers are going to appear too, and those are the
>> >> >>>>>>> actual VF/VF
>> >> >>>>>>> subport:
>> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>                       peer pci/0000:05:00.0/10002
>> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>                       peer pci/0000:05:00.0/10003
>> >> >>>>>>>
>> >> >>>>>>> Later you can push this VF along with all subports to VM. So
>> >> >>>>>>> in VM, you are going to see the VF like this:
>> >> >>>>>>> $ devlink dev
>> >> >>>>>>> pci/0000:00:08.0
>> >> >>>>>>> $ devlink port
>> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>
>> >> >>>>>>> And back to your question of how are they connected in eswitch.
>> >> >>>>>>> That is totally up to the original user John who did the creation.
>> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
>> >> >>>>>>> configure the forwarding however he likes.
>> >> >>>>>>
>> >> >>>>>> Ack, so I think you're saying VM has to communicate to the
>> >> >>>>>> cloud environment to have this provisioned using some service
>> >> >>>>>> API, not a kernel API.  That's what I wanted to confirm.
>> >> >>>>>>
>> >> >>>>>> I don't see any benefit to having the "host ports" under
>> >> >>>>>> devlink, as such I think it's a matter of preference.
>> >> >>>>>
>> >> >>>>> We need 'host ports' to configure parameters of this host port
>> >> >>>>> which is not exposed by the rep-netdev.
>> >> >>>>> Such as mac address.
>> >> >>>>
>> >> >>>> Please look at the quote of what Jiri wrote above - the host
>> >> >>>> port gets passed to the VM, you can't use it as a handle to set the
>> MAC.
>> >> >>>>
>> >> >>>> The way to set the MAC remains:
>> >> >>>>
>> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
>> >> >>>> 00:11:22:33:44:55
>> >> >>>>
>> >> >>> Even though it can be done, I think this is wrong model to
>> >> >>> program
>> >> >> hostport mac address using eswitch port.
>> >> >>> All devlink objects are control objects, so what is passed to VM
>> >> >>> is what is
>> >> >> represented by devlink.
>> >> >>> VF in the VM will anyway create its devlink object.
>> >> >>> What is wrong in programming hostport?
>> >> >>> It gives a very clear view to users of topology and objects.
>> >> >>
>> >> >> The VF or any subport MAC address should be configured by the
>> >> >> orchestration layer that is running on the hypervisor and when a
>> >> >> VF is assigned to a VF, the host port is not visible to the hypervisor.
>> >> > What prevents  creation of hostport due to which is not visible?
>> >> > Hostport is control port to program host side of parameters.
>> >> > It should be created when user wants to program the parameters.
>> >> >
>> >> > Model is really straight forward.
>> >> > Program host port params using hostport object.
>> >> > Program switchport params using rep-netdev.
>> >>
>> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each
>> >> port - host facing ports and switch facing ports. This is in addition
>> >> to the netdevs that are created today.
>> >>
>> >I am not proposing any different.
>> >I am proposing only two changes.
>> >1. control hostport params via referring hostport (not via indirect
>> >peer)
>> 
>> Not really possible. If you passthrough VF into VM, the hostport goes along
>> with it.
>> 
>No.
>I am sorry in showing the enumeration which is the source of confusion.
>
>Below is the right enumeration.
>
>When VF is enumerated initially in the host, where eswitch devlink instance is located.
>Below enumeration is seen.
>
>First two entries shows the link between hostport and switchport.
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

Hostport should not have switch_id.

>
>pci/0000:05:10.1/0 eth netdev flavour hostport
>This entry won't be seen if VF auto probing is disabled. Because than VF is not enumerated.
>
>As a user, I will be programming the mac address of hostport for a VF.
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

Hmm, so you are going to have 2 hostports for the VF:
1) pci/0000:05:10.1/0
   the real one, which is going to go to the VM - with a separate PCI
   address and devlink instance.
2) pci/0000:05:00.0/1
   a dummy one, which is not really a hostport, as there is no netdev
   created for it. It only models the other side of the cable, which is
   away in the VM.
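
For illustration only, the split between the two kinds of hostport can be
sketched by parsing the enumeration quoted above. The line layout is the one
proposed in this thread, not the output of any released devlink tool, and
parse_port() is a hypothetical helper.

```python
# Lines copied from the enumeration proposed in this thread.
LINES = [
    "pci/0000:05:00.0/10002 eth netdev flavour switchport "
    "switch_id 00154d130d2f peer pci/0000:05:00.0/1",
    "pci/0000:05:00.0/1 eth netdev flavour hostport "
    "switch_id 00154d130d2f peer pci/0000:05:00.0/10002",
    "pci/0000:05:10.1/0 eth netdev flavour hostport",
]


def parse_port(line):
    """Parse one line: handle, type, kind, then key/value pairs."""
    tokens = line.split()
    handle, port_type, kind = tokens[0], tokens[1], tokens[2]
    attrs = dict(zip(tokens[3::2], tokens[4::2]))
    dev, _, index = handle.rpartition("/")
    return {"handle": handle, "dev": dev, "index": int(index),
            "type": port_type, "kind": kind, **attrs}


ports = {p["handle"]: p for p in map(parse_port, LINES)}

# Devlink instances that own at least one switchport, i.e. the eswitch side.
switch_devs = {p["dev"] for p in ports.values()
               if p["flavour"] == "switchport"}

hostports = [p for p in ports.values() if p["flavour"] == "hostport"]
# "Dummy" hostports sit on the eswitch's devlink instance; "real" ones sit
# on the VF's own PCI address and move into the VM with it.
dummy = [p["handle"] for p in hostports if p["dev"] in switch_devs]
real = [p["handle"] for p in hostports if p["dev"] not in switch_devs]

print("dummy:", dummy)
print("real:", real)
```

The sketch only classifies by owning devlink instance; it says nothing about
which of the two objects should carry the MAC address.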

>
>
>> 
>> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >Because switch is flat and agnostic of pf/vf/mdev.
>> 
>> Not sure. It's good to have this kind of visibility.
>> 
>port can have label/attribute indicating that this belong to VF-1 or mdev as long as you are agreeing to have mdev attribute on host port.
>(and not ask for abstracting it, because mdev is well defined kernel object).

Why can't mdev be another flavour?

>
>> 
>> >
>> >> Are you suggesting that all the devlink objects should be visible
>> >> only at the hypervisor layer?
>> >>
>> >Of course not.
>> >
>> >Ports and params controlled by hypervisor should be exposed at
>> hypervisor/eswitch wherever its parent devlink instance exist.
>> >Ports which should be visible inside a VM should be exposed inside a VM.
>> >So for a given VF,
>> >
>> >If eswitch is at hypervisor level,
>> >$ devlink port show
>> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
>> >00154d130d2f peer pci/0000:05:10.1/0
>> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f
>> >peer pci/0000:05:00.0/10002
>> >
>> >where VF is enumerated,
>> >$ devlink port show
>> >pci/0000:05:10.1/0 eth netdev flavour hostport
>> 
>> So this is how it looks like in VM, right?
>> 
>Yep.
>Once VF is mapped to VM only two entries are seen and hostport can be still controlled.
>
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
>
>This addresses the case for Infiniband where there is no eswitch, but hostports exists and should be managed.
>We shouldn't be inventing new devlink APIs or create a fake sw eswitch object which doesn't exist in hw.
>
>> 
>> >This is because unprivileged VF doesn't have visibility to eswitch and its
>> links.
>> >
>> >> I think the terminology need to be defined clearly so that we are all
>> >> on the same page.
>> >>
>> >> >
>> >> >> Currently we have ndo_set_vf_mac_addr api that works with PF
>> >> >> netdev, but i think we are trying to move away from that API and
>> >> >> do all the configuration via the port representor netdevs.
>> >> > This is fine rep-netdev represents eswitch port.
>> >> > You normally don't go to switch to program host port params.
>> >> >
>> >> >> As the mac address cannot be configured using this netdev, i think
>> >> >> Jakub is suggesting creating a devlink opject for each port
>> >> >> representor and use that interface to set peer mac address.
>> >> >
>> >> > I understand but is convoluted interface.
>> >> > When you program host NIC mac address you talk to iLo or BIOS.
>> >> > When you program switch side mac address, you go
>> switch/router/modem.
>> >> >
>> >> > Also programming host params on host side, also doesn't make
>> >> assumption that its connected to eswitch.
>> >> > It also doesn't assume that same connectivity for its life.
>> >> >
>> >> > If you model around how physical devices are configured, it will
>> >> > almost
>> >> never go wrong and still provides same level of flexibility.
>> >> >
>> >> >> We should be able use this to configure port vlan too.
>> >> >>
>> >> >> Also, instead of subport, can we call vport and support different
>> >> >> types of vports - sr-iov, siov, vmdq etc.
>> >> >>
>> >> > At switch level there are just ports.
>> >> > sriov, siov, mdev, vmdq are their couter part (peer) where it is
>> connected.
>> >> >
>> >> >>>
>> >> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
>> >> >>> It doesn't make sense to define 'mdev' flavour which we are
>> >> >>> already
>> >> >> working.
>> >> >>> At eswitch level it is just a port, it happen to be connected to
>> >> >>> vf or pf or
>> >> >> other objects, it doesn't matter.
>> >> >>> Port should be flavoured as 'hostport' or 'switchport'.
>> >> >>>
>> >> >>>
>> >> >>>> (using the port ids from above)


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-16  1:16                                                                   ` Jakub Kicinski
@ 2019-03-18 15:43                                                                     ` Parav Pandit
  2019-03-18 19:29                                                                       ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 15:43 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Friday, March 15, 2019 8:16 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Fri, 15 Mar 2019 22:12:13 +0000, Parav Pandit wrote:
> > > On Fri, 15 Mar 2019 21:08:14 +0100, Jiri Pirko wrote:
> > > > >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects
> > > > >> for each port - host facing ports and switch facing ports. This
> > > > >> is in addition to the netdevs that are created today.
> > >
> > > To be clear I'm not in favour of the dual-object proposal.
> > >
> > > > >I am not proposing any different.
> > > > >I am proposing only two changes.
> > > > >1. control hostport params via referring hostport (not via
> > > > >indirect
> > > > >peer)
> > > >
> > > > Not really possible. If you passthrough VF into VM, the hostport
> > > > goes along with it.
> > > >
> > > > >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> > > > >Because switch is flat and agnostic of pf/vf/mdev.
> > > >
> > > > Not sure. It's good to have this kind of visibility.
> > >
> > > Yes, this subthread honestly makes me go from 60% sure to 95% sure
> > > we shouldn't do the dual object thing :(  Seems like Parav is
> > > already confused by it and suggests host port can exist without
> > > switch port :(
> > >
> > I am almost sure that I am not confused.
> > I am clear that hostports should be configured by devlink instance
> > which has the capability to program it.
> 
> Right now a devlink port is something that the datapath of an ASIC can
> address.  All flavours we have presently are basically various MACs - physical
> (front panel ports), DSA - for ASIC interconnects on a multi-ASIC board, CPU -
> for connecting to a MAC of a NIC.
> 
The devlink port implementation commit doesn't say that it is for the ASIC datapath or limited to an ASIC datapath ID.
It is not right to say that the 'whole datapath' should be represented by just a single 'port' object.
The datapath involves various stages in the ASIC, each of which does different processing.
These datapath objects are interconnected, i.e. a hostport is connected to a switchport.
Commit [1] says a devlink port is a physical port. However, we already have 3 flavours of port.

> Jiri's flavour proposal was strictly extending the same logic to SR-IOV.  Each
> object addressable within the datapath gets a port.
> The datapath's ID can be used as port_index.
> 
And as I said, it is already restrictive.
A port is a port; it can be labeled for vf/pf, but the flavour is not really vf/pf.
Also, the label applies more to the hostport side than to the switchport side.

> I just reimplemented his patches here and added the subports which I think
> he wasn't aware of as they are a quirk of old NFP ASICs.
> 
> Having 3 objects for the same datapath ID is a significant departure from the
> existing devlink port semantics.
> 
It is really not the same datapath ID.
If that were the case, we would be programming the MAC address on the rep-netdev itself.
But we are not doing that, because the rep-netdev represents only the 'eswitch port'.

> > When hostport is in VF, that VF usually won't have privilege to
> > program it and won't have visibility to eswitch either.
> 
> If VM has no visibility into the eswitch and no permission to configure
> things, what use does the object serve?
> 
To view device properties, health, RO registers, and more importantly its port details.
Yonatan is working on grouping these devlink ports, and those are controlled through devlink APIs.
Jiri has been actively reviewing those patches internally for the last 3+ weeks; the review is not finished yet.
So this visibility is needed anyway.

> > Why would you like to start with restrictive model of peer view only?
> 
> "Restrictive model" is one way of putting it.  I'd rather say that we are not
> adding objects which:
>  (a) do not adhere to current semantics;
>  (b) have no distinct function.
> 
A hostport certainly has a distinct function from a switchport,
i.e. to program host-side parameters (eth.mac, rdma.port_guid and more in future).

> We can make the "add MAC address" command not use the word peer:
> 
> devlink port addr_pool add pci/0000:05:00.0/10003 type eth 00:11:22:33:44:55
> devlink port addr_pool del pci/0000:05:00.0/10003 type eth 00:11:22:33:44:55
> 
> if the "peer" doesn't sit right.
> 
> > Hostports exist for infiniband HCA without switchport.
> > We should be able to manage hostport objects without creating fake
> eswitch sw object.
> 
> It sounds like the RDMA subsystem is lacking a model to represent all its
> objects, but that's RDMA's problem to solve..
> 
The devlink framework is not limited to Ethernet; it operates on a bus/device notion.
So for Ethernet, vendors program the MAC address.
For RDMA, vendors program the port_guid (which is the equivalent of a MAC address).
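
As a rough sketch of that bus/device notion: a port handle such as
pci/0000:05:00.0/10002 splits into bus, device and port index, and the
host-side address attribute depends only on the link type. The names
parse_handle() and ADDRESS_ATTR are hypothetical, chosen for this
illustration; the attribute names follow the eth.mac / rdma.port_guid wording
used above.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    bus: str
    device: str
    port_index: int


def parse_handle(text):
    """Split 'bus/device/index' into its three parts."""
    bus, device, index = text.split("/")
    return Handle(bus, device, int(index))


# Host-side address attribute per link type (illustrative mapping only).
ADDRESS_ATTR = {"eth": "mac", "ib": "port_guid"}

h = parse_handle("pci/0000:05:00.0/10002")
print(h.bus, h.device, h.port_index)
print(ADDRESS_ATTR["eth"], ADDRESS_ATTR["ib"])
```

Nothing in the handle itself is Ethernet-specific; only the attribute lookup
differs by link type.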

devlink also publishes RDMA device info today.
net/core/devlink.c has had well-established IB device info exposed via devlink_nl_port_fill() for more than 3 years now, since commit [2].
It is not fair to say: solve it somewhere else.

> In netdev world we have netdevs for ports which a used for bulk of the
> configuration, most importantly - forwarding.
> 
> > Jakub,
> > Can you please point to some example other than veth-pair where you
> > configure host param (such as mac address) through a switch?
> 
> Existing "legacy" SR-IOV NDOs.
> 
That is a perfect example of programming hostport parameters without an eswitch.
At a high level, I was looking for a case where you open a switch GUI/CLI or something equivalent that programs the host's MAC address.
So far we don't have such an equivalent good example.

> > An existing example will help me to map it to devlink eswitch proposal.
> > If we go peer programming route, what are your thoughts on how should
> > we program infiniband hostports which doesn't have peer ports?
> 
> Again, you may be trying to fix RDMA's lack of control objects, which may be
> better fixed elsewhere..

A devlink port is a link-agnostic control object.

[1] commit bfcd3a46617209454cfc0947ab093e37fd1e84ef
[2] commit bfcd3a466


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 12:21                                                                 ` Jiri Pirko
@ 2019-03-18 15:56                                                                   ` Parav Pandit
  2019-03-18 16:22                                                                     ` Parav Pandit
  2019-03-18 19:19                                                                   ` Jakub Kicinski
  1 sibling, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 15:56 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Samudrala, Sridhar, Jakub Kicinski, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Monday, March 18, 2019 7:21 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Fri, Mar 15, 2019 at 10:59:33PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Friday, March 15, 2019 3:08 PM
> >> To: Parav Pandit <parav@mellanox.com>
> >> Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub Kicinski
> >> <jakub.kicinski@netronome.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Fri, Mar 15, 2019 at 04:32:24PM CET, parav@mellanox.com wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> >> >> Sent: Friday, March 15, 2019 12:58 AM
> >> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >> >> <jakub.kicinski@netronome.com>
> >> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> devlink PCI ports
> >> >>
> >> >>
> >> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
> >> >> >
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> >> >> >> Sent: Thursday, March 14, 2019 9:16 PM
> >> >> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> >> >> >> <jakub.kicinski@netronome.com>
> >> >> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> >> devlink PCI ports
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
> >> >> >>>
> >> >> >>>
> >> >> >>>> -----Original Message-----
> >> >> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
> >> >> >>>> To: Parav Pandit <parav@mellanox.com>
> >> >> >>>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> >> >> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports
> >> >> >>>> on devlink PCI ports
> >> >> >>>>
> >> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> >> >> >>>>>>> Then instances of flavour pci_vf are going to appear in
> >> >> >>>>>>> the same devlink instance. Those are the switch ports:
> >> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> >> >> >>>>>>>                           flavour pci_vf pf 0 vf 0
> >> >> >>>>>>>                           switch_id 00154d130d2f peer
> >> >> >>>>>>> pci/0000:05:10.1/0
> >> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> >> >> >>>>>>>                           flavour pci_vf pf 0 vf 0 subport 1
> >> >> >>>>>>>                           switch_id 00154d130d2f peer
> >> >> >>>>>>> pci/0000:05:10.1/1
> >> >> >>>>>>>
> >> >> >>>>>>> With that, peers are going to appear too, and those are
> >> >> >>>>>>> the actual VF/VF
> >> >> >>>>>>> subport:
> >> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> >> >> >>>>>>>                       peer pci/0000:05:00.0/10002
> >> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> >> >> >>>>>>>                       peer pci/0000:05:00.0/10003
> >> >> >>>>>>>
> >> >> >>>>>>> Later you can push this VF along with all subports to VM.
> >> >> >>>>>>> So in VM, you are going to see the VF like this:
> >> >> >>>>>>> $ devlink dev
> >> >> >>>>>>> pci/0000:00:08.0
> >> >> >>>>>>> $ devlink port
> >> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour
> >> >> >>>>>>> pci_vf_host
> >> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour
> >> >> >>>>>>> pci_vf_host
> >> >> >>>>>>>
> >> >> >>>>>>> And back to your question of how are they connected in
> >> >> >>>>>>> eswitch.
> >> >> >>>>>>> That is totally up to the original user John who did the
> >> >> >>>>>>> creation.
> >> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
> >> >> >>>>>>> configure the forwarding however he likes.
> >> >> >>>>>>
> >> >> >>>>>> Ack, so I think you're saying VM has to communicate to the
> >> >> >>>>>> cloud environment to have this provisioned using some
> >> >> >>>>>> service API, not a kernel API.  That's what I wanted to confirm.
> >> >> >>>>>>
> >> >> >>>>>> I don't see any benefit to having the "host ports" under
> >> >> >>>>>> devlink, as such I think it's a matter of preference.
> >> >> >>>>>
> >> >> >>>>> We need 'host ports' to configure parameters of this host
> >> >> >>>>> port which is not exposed by the rep-netdev.
> >> >> >>>>> Such as mac address.
> >> >> >>>>
> >> >> >>>> Please look at the quote of what Jiri wrote above - the host
> >> >> >>>> port gets passed to the VM, you can't use it as a handle to
> >> >> >>>> set the MAC.
> >> >> >>>>
> >> >> >>>> The way to set the MAC remains:
> >> >> >>>>
> >> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
> >> >> >>>> 00:11:22:33:44:55
> >> >> >>>>
> >> >> >>> Even though it can be done, I think this is wrong model to
> >> >> >>> program
> >> >> >> hostport mac address using eswitch port.
> >> >> >>> All devlink objects are control objects, so what is passed to
> >> >> >>> VM is what is
> >> >> >> represented by devlink.
> >> >> >>> VF in the VM will anyway create its devlink object.
> >> >> >>> What is wrong in programming hostport?
> >> >> >>> It gives a very clear view to users of topology and objects.
> >> >> >>
> >> >> >> The VF or any subport MAC address should be configured by the
> >> >> >> orchestration layer that is running on the hypervisor and when
> >> >> >> a VF is assigned to a VM, the host port is not visible to the
> >> >> >> hypervisor.
> >> >> > What prevents  creation of hostport due to which is not visible?
> >> >> > Hostport is control port to program host side of parameters.
> >> >> > It should be created when user wants to program the parameters.
> >> >> >
> >> >> > Model is really straight forward.
> >> >> > Program host port params using hostport object.
> >> >> > Program switchport params using rep-netdev.
> >> >>
> >> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for
> >> >> each port - host facing ports and switch facing ports. This is in
> >> >> addition to the netdevs that are created today.
> >> >>
> >> >I am not proposing any different.
> >> >I am proposing only two changes.
> >> >1. control hostport params via referring hostport (not via indirect
> >> >peer)
> >>
> >> Not really possible. If you passthrough VF into VM, the hostport goes
> >> along with it.
> >>
> >No.
> >I am sorry in showing the enumeration which is the source of confusion.
> >
> >Below is the right enumeration.
> >
> >When VF is enumerated initially in the host, where eswitch devlink instance
> is located.
> >Below enumeration is seen.
> >
> >First two entries shows the link between hostport and switchport.
> >$ devlink port show
> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> >00154d130d2f peer pci/0000:05:00.0/1
> >
> >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f
> >peer pci/0000:05:00.0/10002
> 
> Hostport should not have switch_id.
> 
> >
> >pci/0000:05:10.1/0 eth netdev flavour hostport
> >This entry won't be seen if VF auto probing is disabled, because then
> >the VF is not enumerated.
> >
> >As a user, I will be programming the mac address of hostport for a VF.
> >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f
> >peer pci/0000:05:00.0/10002
> 
> Hmm, so you are going to have 2 hostports for VF:
> 1) pci/0000:05:10.1/0
>    real one, that is going to go to VM - with a separate pci address
>    and devlink instance.

Yep. This is the one that Yonatan's port grouping APIs work on.

> 2) pci/0000:05:00.0/1
>    dummy one, which is not really a hostport, as there is no netdev
>    created for it. It only models the other side of cable, which is away
>    in VM.
> 
Right. This is the control object which the hypervisor typically programs.

> >
> >
> >>
> >> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> >> >Because switch is flat and agnostic of pf/vf/mdev.
> >>
> >> Not sure. It's good to have this kind of visibility.
> >>
> >port can have label/attribute indicating that this belong to VF-1 or mdev as
> long as you are agreeing to have mdev attribute on host port.
> >(and not ask for abstracting it, because mdev is well defined kernel object).
> 
> Why mdev cannot be another flavour?
> 

A hostport is of type pf/vf/mdev and is connected to some switchport.

So the proposal is to have:
port flavour = hostport/switchport
port type/label = pf/vf/mdev


> >
> >>
> >> >
> >> >> Are you suggesting that all the devlink objects should be visible
> >> >> only at the hypervisor layer?
> >> >>
> >> >Of course not.
> >> >
> >> >Ports and params controlled by hypervisor should be exposed at
> >> hypervisor/eswitch wherever its parent devlink instance exist.
> >> >Ports which should be visible inside a VM should be exposed inside a
> VM.
> >> >So for a given VF,
> >> >
> >> >If eswitch is at hypervisor level,
> >> >$ devlink port show
> >> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> >> >00154d130d2f peer pci/0000:05:10.1/0
> >> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id
> >> >00154d130d2f peer pci/0000:05:00.0/10002
> >> >
> >> >where VF is enumerated,
> >> >$ devlink port show
> >> >pci/0000:05:10.1/0 eth netdev flavour hostport
> >>
> >> So this is how it looks like in VM, right?
> >>
> >Yep.
> >Once VF is mapped to VM only two entries are seen and hostport can be
> still controlled.
> >
> >$ devlink port show
> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> >00154d130d2f peer pci/0000:05:00.0/1
> >
> >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f
> >peer pci/0000:05:00.0/10002
> >
> >This addresses the case for Infiniband where there is no eswitch, but
> hostports exists and should be managed.
> >We shouldn't be inventing new devlink APIs or create a fake sw eswitch
> object which doesn't exist in hw.
> >
> >>
> >> >This is because unprivileged VF doesn't have visibility to eswitch
> >> >and its
> >> links.
> >> >
> >> >> I think the terminology need to be defined clearly so that we are
> >> >> all on the same page.
> >> >>
> >> >> >
> >> >> >> Currently we have ndo_set_vf_mac_addr api that works with PF
> >> >> >> netdev, but i think we are trying to move away from that API
> >> >> >> and do all the configuration via the port representor netdevs.
> >> >> > This is fine rep-netdev represents eswitch port.
> >> >> > You normally don't go to switch to program host port params.
> >> >> >
> >> >> >> As the mac address cannot be configured using this netdev, i
> >> >> >> think Jakub is suggesting creating a devlink opject for each
> >> >> >> port representor and use that interface to set peer mac address.
> >> >> >
> >> >> > I understand but is convoluted interface.
> >> >> > When you program host NIC mac address you talk to iLo or BIOS.
> >> >> > When you program switch side mac address, you go
> >> switch/router/modem.
> >> >> >
> >> >> > Also programming host params on host side, also doesn't make
> >> >> assumption that its connected to eswitch.
> >> >> > It also doesn't assume that same connectivity for its life.
> >> >> >
> >> >> > If you model around how physical devices are configured, it will
> >> >> > almost
> >> >> never go wrong and still provides same level of flexibility.
> >> >> >
> >> >> >> We should be able use this to configure port vlan too.
> >> >> >>
> >> >> >> Also, instead of subport, can we call vport and support
> >> >> >> different types of vports - sr-iov, siov, vmdq etc.
> >> >> >>
> >> >> > At switch level there are just ports.
> >> >> > sriov, siov, mdev, vmdq are their couter part (peer) where it is
> >> connected.
> >> >> >
> >> >> >>>
> >> >> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
> >> >> >>> It doesn't make sense to define 'mdev' flavour which we are
> >> >> >>> already
> >> >> >> working.
> >> >> >>> At eswitch level it is just a port, it happen to be connected
> >> >> >>> to vf or pf or
> >> >> >> other objects, it doesn't matter.
> >> >> >>> Port should be flavoured as 'hostport' or 'switchport'.
> >> >> >>>
> >> >> >>>
> >> >> >>>> (using the port ids from above)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 15:56                                                                   ` Parav Pandit
@ 2019-03-18 16:22                                                                     ` Parav Pandit
  2019-03-18 19:36                                                                       ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 16:22 UTC (permalink / raw)
  To: Parav Pandit, Jiri Pirko
  Cc: Samudrala, Sridhar, Jakub Kicinski, davem, netdev, oss-drivers



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> Sent: Monday, March 18, 2019 10:57 AM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub Kicinski
> <jakub.kicinski@netronome.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> 
> 
> > -----Original Message-----
> > From: Jiri Pirko <jiri@resnulli.us>
> > Sent: Monday, March 18, 2019 7:21 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub Kicinski
> > <jakub.kicinski@netronome.com>; davem@davemloft.net;
> > netdev@vger.kernel.org; oss-drivers@netronome.com
> > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > devlink PCI ports
> >
> > Fri, Mar 15, 2019 at 10:59:33PM CET, parav@mellanox.com wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Jiri Pirko <jiri@resnulli.us>
> > >> Sent: Friday, March 15, 2019 3:08 PM
> > >> To: Parav Pandit <parav@mellanox.com>
> > >> Cc: Samudrala, Sridhar <sridhar.samudrala@intel.com>; Jakub
> > >> Kicinski <jakub.kicinski@netronome.com>; davem@davemloft.net;
> > >> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > >> devlink PCI ports
> > >>
> > >> Fri, Mar 15, 2019 at 04:32:24PM CET, parav@mellanox.com wrote:
> > >> >
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> > >> >> Sent: Friday, March 15, 2019 12:58 AM
> > >> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> > >> >> <jakub.kicinski@netronome.com>
> > >> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> > >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > >> >> devlink PCI ports
> > >> >>
> > >> >>
> > >> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
> > >> >> >
> > >> >> >
> > >> >> >> -----Original Message-----
> > >> >> >> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> > >> >> >> Sent: Thursday, March 14, 2019 9:16 PM
> > >> >> >> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
> > >> >> >> <jakub.kicinski@netronome.com>
> > >> >> >> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> > >> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports
> > >> >> >> on devlink PCI ports
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
> > >> >> >>>
> > >> >> >>>
> > >> >> >>>> -----Original Message-----
> > >> >> >>>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > >> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
> > >> >> >>>> To: Parav Pandit <parav@mellanox.com>
> > >> >> >>>> Cc: Jiri Pirko <jiri@resnulli.us>; davem@davemloft.net;
> > >> >> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow
> > >> >> >>>> subports on devlink PCI ports
> > >> >> >>>>
> > >> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
> > >> >> >>>>>>> Then instances of flavour pci_vf are going to appear in
> > >> >> >>>>>>> the same devlink instance. Those are the switch ports:
> > >> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
> > >> >> >>>>>>>                           flavour pci_vf pf 0 vf 0
> > >> >> >>>>>>>                           switch_id 00154d130d2f peer
> > >> >> >>>>>>> pci/0000:05:10.1/0
> > >> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
> > >> >> >>>>>>>                           flavour pci_vf pf 0 vf 0 subport 1
> > >> >> >>>>>>>                           switch_id 00154d130d2f peer
> > >> >> >>>>>>> pci/0000:05:10.1/1
> > >> >> >>>>>>>
> > >> >> >>>>>>> With that, peers are going to appear too, and those are
> > >> >> >>>>>>> the actual VF/VF
> > >> >> >>>>>>> subport:
> > >> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
> > >> >> >>>>>>>                       peer pci/0000:05:00.0/10002
> > >> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
> > >> >> >>>>>>>                       peer pci/0000:05:00.0/10003
> > >> >> >>>>>>>
> > >> >> >>>>>>> Later you can push this VF along with all subports to VM.
> > >> >> >>>>>>> So in VM, you are going to see the VF like this:
> > >> >> >>>>>>> $ devlink dev
> > >> >> >>>>>>> pci/0000:00:08.0
> > >> >> >>>>>>> $ devlink port
> > >> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour
> > >> >> >>>>>>> pci_vf_host
> > >> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour
> > >> >> >>>>>>> pci_vf_host
> > >> >> >>>>>>>
> > >> >> >>>>>>> And back to your question of how are they connected in
> > >> >> >>>>>>> eswitch.
> > >> >> >>>>>>> That is totally up to the original user John who did the
> > >> >> >>>>>>> creation.
> > >> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
> > >> >> >>>>>>> configure the forwarding however he likes.
> > >> >> >>>>>>
> > >> >> >>>>>> Ack, so I think you're saying VM has to communicate to
> > >> >> >>>>>> the cloud environment to have this provisioned using some
> > >> >> >>>>>> service API, not a kernel API.  That's what I wanted to
> confirm.
> > >> >> >>>>>>
> > >> >> >>>>>> I don't see any benefit to having the "host ports" under
> > >> >> >>>>>> devlink, as such I think it's a matter of preference.
> > >> >> >>>>>
> > >> >> >>>>> We need 'host ports' to configure parameters of this host
> > >> >> >>>>> port which is not exposed by the rep-netdev.
> > >> >> >>>>> Such as mac address.
> > >> >> >>>>
> > >> >> >>>> Please look at the quote of what Jiri wrote above - the
> > >> >> >>>> host port gets passed to the VM, you can't use it as a
> > >> >> >>>> handle to set the MAC.
> > >> >> >>>>
> > >> >> >>>> The way to set the MAC remains:
> > >> >> >>>>
> > >> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
> > >> >> >>>> 00:11:22:33:44:55
> > >> >> >>>>
> > >> >> >>> Even though it can be done, I think this is wrong model to
> > >> >> >>> program
> > >> >> >> hostport mac address using eswitch port.
> > >> >> >>> All devlink objects are control objects, so what is passed
> > >> >> >>> to VM is what is
> > >> >> >> represented by devlink.
> > >> >> >>> VF in the VM will anyway create its devlink object.
> > >> >> >>> What is wrong in programming hostport?
> > >> >> >>> It gives a very clear view to users of topology and objects.
> > >> >> >>
> > >> >> >> The VF or any subport MAC address should be configured by the
> > >> >> >> orchestration layer that is running on the hypervisor and
> > >> >> >> when a VF is assigned to a VM, the host port is not visible
> > >> >> >> to the hypervisor.
> > >> >> > What prevents  creation of hostport due to which is not visible?
> > >> >> > Hostport is control port to program host side of parameters.
> > >> >> > It should be created when user wants to program the parameters.
> > >> >> >
> > >> >> > Model is really straight forward.
> > >> >> > Program host port params using hostport object.
> > >> >> > Program switchport params using rep-netdev.
> > >> >>
> > >> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for
> > >> >> each port - host facing ports and switch facing ports. This is
> > >> >> in addition to the netdevs that are created today.
> > >> >>
> > >> >I am not proposing any different.
> > >> >I am proposing only two changes.
> > >> >1. control hostport params via referring hostport (not via
> > >> >indirect
> > >> >peer)
> > >>
> > >> Not really possible. If you passthrough VF into VM, the hostport
> > >> goes along with it.
> > >>
> > >No.
> > >I am sorry in showing the enumeration which is the source of confusion.
> > >
> > >Below is the right enumeration.
> > >
> > >When VF is enumerated initially in the host, where eswitch devlink
> > >instance
> > is located.
> > >Below enumeration is seen.
> > >
> > >First two entries shows the link between hostport and switchport.
> > >$ devlink port show
> > >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> > >00154d130d2f peer pci/0000:05:00.0/1
> > >
> > >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f
> > >peer pci/0000:05:00.0/10002
> >
> > Hostport should not have switch_id.
> >
> > >
> > > >pci/0000:05:10.1/0 eth netdev flavour hostport
> > > >This entry won't be seen if VF auto probing is disabled, because then
> > > >the VF is not enumerated.
> > >
> > >As a user, I will be programming the mac address of hostport for a VF.
> > >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f
> > >peer pci/0000:05:00.0/10002
> >
> > Hmm, so you are going to have 2 hostports for VF:
> > 1) pci/0000:05:10.1/0
> >    real one, that is going to go to VM - with a separate pci address
> >    and devlink instance.
> 
> Yep. This is the one where Yonatan's port grouping APIs work on.
> 
> > 2) pci/0000:05:00.0/1
> >    dummy one, which is not really a hostport, as there is no netdev
> >    created for it. It only models the other side of cable, which is away
> >    in VM.
> >
> Right. This is the control object which typically hypervisor programs.
> 
> > >
> > >
> > >>
> > >> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> > >> >Because switch is flat and agnostic of pf/vf/mdev.
> > >>
> > >> Not sure. It's good to have this kind of visibility.
> > >>
> > >port can have label/attribute indicating that this belong to VF-1 or
> > >mdev as
> > long as you are agreeing to have mdev attribute on host port.
> > >(and not ask for abstracting it, because mdev is well defined kernel
> object).
> >
> > Why mdev cannot be another flavour?
> >
> 
> hostport is of type pf/vf/mdev connected to some switchport.
> 
> So proposal is to have,
> port flavour = hostport/switchport
> port type/label = pf/vf/mdev
> 
Instead of having two attributes per port, how about having a single one:
port flavour = physical/cpu/dsa/pf/vf/mdev/switchport

physical and pf have some overlapping definitions.
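
The two labelling schemes discussed in this thread can be compared with a minimal Python sketch. It is only an illustration: the names `Side`, `Label`, `Flavour` and `to_single_flavour` are made up for this example and are not devlink API. Mapping every switch-side port to `switchport` is one reading of the single-attribute idea, not something the patches implement.

```python
from enum import Enum


class Side(Enum):
    """Flavour in the two-attribute scheme (hypothetical names)."""
    HOSTPORT = "hostport"
    SWITCHPORT = "switchport"


class Label(Enum):
    """Type/label in the two-attribute scheme (hypothetical names)."""
    PF = "pf"
    VF = "vf"
    MDEV = "mdev"


class Flavour(Enum):
    """Single-attribute alternative: one flat flavour per port."""
    PHYSICAL = "physical"
    CPU = "cpu"
    DSA = "dsa"
    PF = "pf"
    VF = "vf"
    MDEV = "mdev"
    SWITCHPORT = "switchport"


def to_single_flavour(side: Side, label: Label) -> Flavour:
    """Collapse (flavour, label) into one flat flavour.

    Assumption: switch-side ports all become "switchport", so the
    pf/vf/mdev label survives only on the host side.
    """
    if side is Side.SWITCHPORT:
        return Flavour.SWITCHPORT
    # Host-side ports keep their label as the flavour.
    return Flavour(label.value)
```

Under this mapping a VF hostport becomes flavour `vf`, while its switch-side peer becomes plain `switchport` and no longer says which function it serves. That lost label is the visibility concern raised earlier in the thread.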

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 12:11                                                                 ` Jiri Pirko
@ 2019-03-18 19:16                                                                   ` Jakub Kicinski
  2019-03-21  8:45                                                                     ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-18 19:16 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
> >> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> >> >Because switch is flat and agnostic of pf/vf/mdev.    
> >> 
> >> Not sure. It's good to have this kind of visibility.  
> >
> >Yes, this subthread honestly makes me go from 60% sure to 95% sure we
> >shouldn't do the dual object thing :(  Seems like Parav is already
> >confused by it and suggests host port can exist without switch port :(  
> 
> Although I understand your hesitation, the host ports are also
> associated with the asic and should be under the devlink instance.
> It is just a matter of proper documentation and clear code to avoid
> confusions.

They are certainly a part of the ASIC and belong to it; the question in
my mind is more along the lines of: do we want "one pipe/one port", or
is it okay to have multiple software objects of the same kind for those
objects?

To put it differently - do we want a port object for each port of the
ASIC, or do we want a port object for each netdev?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 12:21                                                                 ` Jiri Pirko
  2019-03-18 15:56                                                                   ` Parav Pandit
@ 2019-03-18 19:19                                                                   ` Jakub Kicinski
  2019-03-18 19:38                                                                     ` Parav Pandit
  2019-03-21  9:09                                                                     ` Jiri Pirko
  1 sibling, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-18 19:19 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

On Mon, 18 Mar 2019 13:21:05 +0100, Jiri Pirko wrote:
> >First two entries shows the link between hostport and switchport.
> >$ devlink port show
> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
> >
> >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002  
> 
> Hostport should not have switch_id.

Isn't a concept of a port of a switch without a switch ID a red flag?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 15:43                                                                     ` Parav Pandit
@ 2019-03-18 19:29                                                                       ` Jakub Kicinski
  0 siblings, 0 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-18 19:29 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

On Mon, 18 Mar 2019 15:43:20 +0000, Parav Pandit wrote:
> > -----Original Message-----
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Sent: Friday, March 15, 2019 8:16 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > netdev@vger.kernel.org; oss-drivers@netronome.com
> > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> > ports
> > 
> > On Fri, 15 Mar 2019 22:12:13 +0000, Parav Pandit wrote:  
> > > > On Fri, 15 Mar 2019 21:08:14 +0100, Jiri Pirko wrote:  
> > > > > >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects
> > > > > >> for each port - host facing ports and switch facing ports. This
> > > > > >> is in addition to the netdevs that are created today.  
> > > >
> > > > To be clear I'm not in favour of the dual-object proposal.
> > > >  
> > > > > >I am not proposing any different.
> > > > > >I am proposing only two changes.
> > > > > >1. control hostport params via referring hostport (not via
> > > > > >indirect
> > > > > >peer)  
> > > > >
> > > > > Not really possible. If you passthrough VF into VM, the hostport
> > > > > goes along with it.
> > > > >  
> > > > > >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> > > > > >Because switch is flat and agnostic of pf/vf/mdev.  
> > > > >
> > > > > Not sure. It's good to have this kind of visibility.  
> > > >
> > > > Yes, this subthread honestly makes me go from 60% sure to 95% sure
> > > > we shouldn't do the dual object thing :(  Seems like Parav is
> > > > already confused by it and suggests host port can exist without
> > > > switch port :(
> > > >  
> > > I am almost sure that I am not confused.
> > > I am clear that hostports should be configured by devlink instance
> > > which has the capability to program it.  
> > 
> > Right now a devlink port is something that the datapath of an ASIC can
> > address.  All flavours we have presently are basically various MACs - physical
> > (front panel ports), DSA - for ASIC interconnects on a multi-ASIC board, CPU -
> > for connecting to a MAC of a NIC.
> >   
> Devlink port implementation in commit doesn't say that it is for ASIC datapath or limited to ASIC datapath id.
> It is not right to say that 'whole datapath' object should be represented with just single object 'port'.
> Datapath involves various stages in ASIC each does different processing.
> These datapath objects are interconnected, i.e. hostport is connected to switchport.
> Commit [1] says devlink port is physical port. However we already have 3 flavours of port.
> 
> > Jiri's flavour proposal was strictly extending the same logic to SR-IOV.  Each
> > object addressable within the datapath gets a port.
> > The datapath's ID can be used as port_index.
> >   
> And as I said, it is already restrictive.
> Port is a port, it can be labeled for vf/pf, but flavour is not really vf/pf.
> Also label applies more on the hostport side vs switchport.
> 
> > I just reimplemented his patches here and added the subports which I think
> > he wasn't aware of as they are a quirk of old NFP ASICs.
> > 
> > Having 3 objects for the same datapath ID is a significant departure from the
> > existing devlink port semantics.
> >   
> It is really not same datapath ID.
> Because if that is the case, we should be programming mac address on the rep-netdev itself.
> But we are not doing that because rep-netdev represents only 'eswitch port'.

Okay, I explained the history to you here, you can write your own if
you want.

> > > When hostport is in VF, that VF usually won't have privilege to
> > > program it and won't have visibility to eswitch either.  
> > 
> > If VM has no visibility into the eswitch and no permission to configure
> > things, what use does the object serve?
> >   
> To view device properties, health, RO registers, more importantly its port details.

Device != port.

> Yonatan is working on grouping these devlink ports and those are control through devlink APIs.
> Jiri is actively internally reviewing those patches since last 3+ weeks, not finished yet.
> So this visibility is needed anyway.

No idea what "grouping devlink ports" may refer to, but I'd be
surprised if it's relevant to VMs.

> > > Why would you like to start with restrictive model of peer view only?  
> > 
> > "Restrictive model" is one way of putting it.  I'd rather say that we are not
> > adding objects which:
> >  (a) do not adhere to current semantics;
> >  (b) have no distinct function.
> >   
> hostport certainly has distinct function than switchport.
> i.e. to program host side parameters. (eth.mac, rdma.port_guid and more in future).

Yeah, Ethernet or IB address, and so many other things (we just can't
happen to think about any right now)...

> > We can make the "add MAC address" command not use the word peer:
> > 
> > devlink port addr_pool add pci/0000:05:00.0/10003 type eth 00:11:22:33:44:55
> > devlink port addr_pool del pci/0000:05:00.0/10003 type eth 00:11:22:33:44:55
> > 
> > if the "peer" doesn't sit right.
> >   
> > > Hostports exist for infiniband HCA without switchport.
> > > We should be able to manage hostport objects without creating fake  
> > eswitch sw object.
> > 
> > It sounds like the RDMA subsystem is lacking a model to represent all its
> > objects, but that's RDMA's problem to solve..
> >   
> devlink framework is not limited to Ethernet, it operates on bus/device notion.
> So for Ethernet vendors program mac address.
> For rdma vendor programs port_guid (which is equivalent of mac address).
> 
> devlink also publishes rdma device info today.
> net/core/devlink.c has very well established IB device info exposed via devlink_nl_port_fill() for more than 3 years now in commit [2].
> It is not fair to say create, solve it somewhere else.
> 
> > In netdev world we have netdevs for ports which a used for bulk of the
> > configuration, most importantly - forwarding.
> >   
> > > Jakub,
> > > Can you please point to some example other than veth-pair where you
> > > configure host param (such as mac address) through a switch?  
> > 
> > Existing "legacy" SR-IOV NDOs.
> >   
> That is perfect example of programming hostport parameters, without a eswitch..
> At high level, I was looking where you open switch GUI/cli or something equivalent that program's host's mac address..
> So far we don't have such equivalent good example yet..
> 
> > > An existing example will help me to map it to devlink eswitch proposal.
> > > If we go peer programming route, what are your thoughts on how should
> > > we program infiniband hostports which doesn't have peer ports?  
> > 
> > Again, you may be trying to fix RDMA's lack of control objects, which may be
> > better fixed elsewhere..  
> 
> devlink port is link agnostic control object.
> 
> [1] bfcd3a46617209454cfc0947ab093e37fd1e84ef
> [2] commit id bfcd3a466


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 16:22                                                                     ` Parav Pandit
@ 2019-03-18 19:36                                                                       ` Jakub Kicinski
  2019-03-18 19:44                                                                         ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-18 19:36 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

On Mon, 18 Mar 2019 16:22:33 +0000, Parav Pandit wrote:
>>>>>>2. flavour should not be vf/pf, flavour should be hostport, switchport.
>>>>>>Because switch is flat and agnostic of pf/vf/mdev.
>>>>>
>>>>> Not sure. It's good to have this kind of visibility.
>>>>>  
>>>> port can have label/attribute indicating that this belong to VF-1 or
>>>> mdev as long as you are agreeing to have mdev attribute on host port.  
>>>> (and not ask for abstracting it, because mdev is well defined kernel object).  
>>>
>>> Why mdev cannot be another flavour?
>>>  
>> 
>> hostport is of type pf/vf/mdev connected to some switchport.
>> 
>> So proposal is to have,
>> port flavour = hostport/switchport
>> port type/label = pf/vf/mdev
>>   
> Instead of having two attributes per port, how about having,
> port flavour= physical/cpu/dsa/pf/vf/mdev/switchport.
> 
> physical and pf has some overlapping definitions.

What "overlapping definitions" do physical and PF have?
Sounds like you're referring to limitations of Mellanox HW.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 19:19                                                                   ` Jakub Kicinski
@ 2019-03-18 19:38                                                                     ` Parav Pandit
  2019-03-21  9:09                                                                     ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 19:38 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Monday, March 18, 2019 2:19 PM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Mon, 18 Mar 2019 13:21:05 +0100, Jiri Pirko wrote:
> > >First two entries shows the link between hostport and switchport.
> > >$ devlink port show
> > >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
> 00154d130d2f peer pci/0000:05:00.0/1
> > >
> > >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f
> peer pci/0000:05:00.0/10002
> >
> > Hostport should not have switch_id.
> 
> Isn't a concept of a port of a switch without a switch ID a red flag?
It shows that a given hostport is connected to a given switch_id at a given switchport.
But I believe this info can be dropped, because it is visible on the switchport side.
So when showing a hostport, the output can be just:

pci/0000:05:00.0/1 eth netdev flavour hostport


^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 19:36                                                                       ` Jakub Kicinski
@ 2019-03-18 19:44                                                                         ` Parav Pandit
  2019-03-18 19:59                                                                           ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 19:44 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Monday, March 18, 2019 2:37 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Mon, 18 Mar 2019 16:22:33 +0000, Parav Pandit wrote:
> >>>>>>2. flavour should not be vf/pf, flavour should be hostport, switchport.
> >>>  >Because switch is flat and agnostic of pf/vf/mdev.
> >>>>>
> >>>>> Not sure. It's good to have this kind of visibility.
> >>>>>
> >>>> port can have label/attribute indicating that this belong to VF-1
> >>>> or mdev as long as you are agreeing to have mdev attribute on host
> port.
> >>>> (and not ask for abstracting it, because mdev is well defined kernel
> object).
> >>>
> >>> Why mdev cannot be another flavour?
> >>>
> >>
> >> hostport is of type pf/vf/mdev connected to some switchport.
> >>
> >> So proposal is to have,
> >> port flavour = hostport/switchport
> >> port type/label = pf/vf/mdev
> >>
> > Instead of having two attributes per port, how about having, port
> > flavour= physical/cpu/dsa/pf/vf/mdev/switchport.
> >
> > physical and pf has some overlapping definitions.
> 
> What "overlapping definitions" do physical and PF have?
PF has physically user facing port.
And physical port in include/uapi/linux/devlink.h also describe that.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 19:44                                                                         ` Parav Pandit
@ 2019-03-18 19:59                                                                           ` Jakub Kicinski
  2019-03-18 20:35                                                                             ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-18 19:59 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

On Mon, 18 Mar 2019 19:44:21 +0000, Parav Pandit wrote:
> > -----Original Message-----
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Sent: Monday, March 18, 2019 2:37 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > netdev@vger.kernel.org; oss-drivers@netronome.com
> > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> > ports
> > 
> > On Mon, 18 Mar 2019 16:22:33 +0000, Parav Pandit wrote:  
> > >>>>>>2. flavour should not be vf/pf, flavour should be hostport, switchport.  
> > >>>  >Because switch is flat and agnostic of pf/vf/mdev.  
> > >>>>>
> > >>>>> Not sure. It's good to have this kind of visibility.
> > >>>>>  
> > >>>> port can have label/attribute indicating that this belong to VF-1
> > >>>> or mdev as long as you are agreeing to have mdev attribute on host port.  
> > >>>> (and not ask for abstracting it, because mdev is well defined kernel object).  
> > >>>
> > >>> Why mdev cannot be another flavour?
> > >>>  
> > >>
> > >> hostport is of type pf/vf/mdev connected to some switchport.
> > >>
> > >> So proposal is to have,
> > >> port flavour = hostport/switchport
> > >> port type/label = pf/vf/mdev
> > >>  
> > > Instead of having two attributes per port, how about having, port
> > > flavour= physical/cpu/dsa/pf/vf/mdev/switchport.
> > >
> > > physical and pf has some overlapping definitions.  
> > 
> > What "overlapping definitions" do physical and PF have?  
> PF has physically user facing port.

PF doesn't "have a user facing port" in switchdev mode.  It's a
limitation of Mellanox HW that you have some strong association 
there.

> And physical port in include/uapi/linux/devlink.h also describe that.

By "that" you must mean that the physical is a user facing port.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 19:59                                                                           ` Jakub Kicinski
@ 2019-03-18 20:35                                                                             ` Parav Pandit
  2019-03-18 21:29                                                                               ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 20:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Monday, March 18, 2019 3:00 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Mon, 18 Mar 2019 19:44:21 +0000, Parav Pandit wrote:
> > > -----Original Message-----
> > > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > Sent: Monday, March 18, 2019 2:37 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> > > <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > > netdev@vger.kernel.org; oss-drivers@netronome.com
> > > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > > devlink PCI ports
> > >
> > > On Mon, 18 Mar 2019 16:22:33 +0000, Parav Pandit wrote:
> > > >>>>>>2. flavour should not be vf/pf, flavour should be hostport,
> switchport.
> > > >>>  >Because switch is flat and agnostic of pf/vf/mdev.
> > > >>>>>
> > > >>>>> Not sure. It's good to have this kind of visibility.
> > > >>>>>
> > > >>>> port can have label/attribute indicating that this belong to
> > > >>>> VF-1 or mdev as long as you are agreeing to have mdev attribute on
> host port.
> > > >>>> (and not ask for abstracting it, because mdev is well defined kernel
> object).
> > > >>>
> > > >>> Why mdev cannot be another flavour?
> > > >>>
> > > >>
> > > >> hostport is of type pf/vf/mdev connected to some switchport.
> > > >>
> > > >> So proposal is to have,
> > > >> port flavour = hostport/switchport port type/label = pf/vf/mdev
> > > >>
> > > > Instead of having two attributes per port, how about having, port
> > > > flavour= physical/cpu/dsa/pf/vf/mdev/switchport.
> > > >
> > > > physical and pf has some overlapping definitions.
> > >
> > > What "overlapping definitions" do physical and PF have?
> > PF has physically user facing port.
> 
> PF doesn't "have a user facing port" in switchdev mode.  
The physical port described in include/uapi/linux/devlink.h as DEVLINK_PORT_FLAVOUR_PHYSICAL is not related to switchdev or legacy mode.
As the comment block describes, it is 'any kind of port physical facing user'.
The current mlx5 driver doesn't expose ports as physical regardless of switchdev/legacy mode.

> It's a limitation of Mellanox HW that you have some strong association there.
>
Not sure why you keep saying that. Is there any code reference that I should look at?
Or maybe you can explain what that limitation is, because I am not aware of any.

> > And physical port in include/uapi/linux/devlink.h also describe that.
> 
> By "that" you must mean that the physical is a user facing port.

Can you please describe the difference between 'PF port' and 'physical port of include/uapi/linux/devlink.h'?
I must have missed this crisp definition in the discussion between you and Jiri.
In the meantime I am checking the thread.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 20:35                                                                             ` Parav Pandit
@ 2019-03-18 21:29                                                                               ` Jakub Kicinski
  2019-03-18 22:11                                                                                 ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-18 21:29 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

On Mon, 18 Mar 2019 20:35:02 +0000, Parav Pandit wrote:
> > > > > physical and pf has some overlapping definitions.  
> > > >
> > > > What "overlapping definitions" do physical and PF have?  
> > > PF has physically user facing port.  
> > 
> > PF doesn't "have a user facing port" in switchdev mode.  
>  
> Physical port described in include/uapi/linux/devlink.h as
> DEVLINK_PORT_FLAVOUR_PHYSICAL is not related to switchdev or legacy
> mode. 

I said "PF doesn't ...", you're now talking about physical?

> As the comment block describe it is 'any kind of port physical
> facing user'. 

Are you saying PCI function is physical?  Just because PF stands for
Physical Function?  

Physical port in devlink means a port in the front panel where
networking cable goes.

> Current mlx5 driver doesn't expose ports as physical regardless of
> switchdev/legacy mode.

Today mlx5 doesn't expose devlink ports at all.

> > It's a limitation of Mellanox HW that you have some strong
> > association there. 
> Not sure why you keep saying that. Any code reference that I should
> look at? Or maybe you can explain what is that limitation, because I
> am not aware of any.

NIC designs originating from traditional NICs were built as pipelines
from PCI to wire or from wire to PCI.  Reportedly it makes it hard to
completely divorce the PCI PF from the wire port (physical port).

Which is why you may think that "PF has physically user facing port".

> > > And physical port in include/uapi/linux/devlink.h also describe
> > > that.  
> > 
> > By "that" you must mean that the physical is a user facing port.  
> 
> Can you please describe the difference between 'PF port' and
> 'physical port of include/uapi/linux/devlink.h'? I must have missed
> this crisp definition in discussion between you and Jiri. I am in
> meantime checking the thread.

Perhaps start with the cover letter which includes an ASCII drawing?

Using Mellanox nomenclature - PF port is a "representor" for the PF
which may be on another Host (SmartNIC or multihost).  It's pretty 
much the same thing as a VF port/"representor".

Physical port is the hole on the panel of the adapter where cable goes.
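
To make the distinction concrete, here is a rough sketch in C. It is
illustration only: the sketch_* names are hypothetical and are not taken
from include/uapi/linux/devlink.h or from this series. It assumes the
flavour list discussed in this thread (physical/cpu/dsa plus PCI PF and
PCI VF) and encodes the two definitions above: only a physical port takes
a cable, while PF and VF ports face a PCIe function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flavour list for illustration only. */
enum sketch_port_flavour {
	SKETCH_FLAVOUR_PHYSICAL,	/* hole on the panel where the cable goes */
	SKETCH_FLAVOUR_CPU,
	SKETCH_FLAVOUR_DSA,
	SKETCH_FLAVOUR_PCI_PF,		/* eswitch port facing a PCI PF */
	SKETCH_FLAVOUR_PCI_VF,		/* eswitch port facing a PCI VF */
	SKETCH_FLAVOUR_COUNT
};

/* True only for front panel ports. */
static bool sketch_port_takes_cable(enum sketch_port_flavour f)
{
	assert(f < SKETCH_FLAVOUR_COUNT);
	return f == SKETCH_FLAVOUR_PHYSICAL;
}

/* True for ports whose other side is a PCIe function (PF or VF). */
static bool sketch_port_faces_pcie(enum sketch_port_flavour f)
{
	assert(f < SKETCH_FLAVOUR_COUNT);
	return f == SKETCH_FLAVOUR_PCI_PF || f == SKETCH_FLAVOUR_PCI_VF;
}
```

With this split a PF port never counts as a front panel port, even on
devices where each PF happens to map to exactly one wire port.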

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 21:29                                                                               ` Jakub Kicinski
@ 2019-03-18 22:11                                                                                 ` Parav Pandit
  2019-03-20 18:24                                                                                   ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-18 22:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Monday, March 18, 2019 4:30 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Mon, 18 Mar 2019 20:35:02 +0000, Parav Pandit wrote:
> > > > > > physical and pf has some overlapping definitions.
> > > > >
> > > > > What "overlapping definitions" do physical and PF have?
> > > > PF has physically user facing port.
> > >
> > > PF doesn't "have a user facing port" in switchdev mode.
> >
> > Physical port described in include/uapi/linux/devlink.h as
> > DEVLINK_PORT_FLAVOUR_PHYSICAL is not related to switchdev or legacy
> > mode.
> 
> I said "PF doesn't ...", you're now talking about physical?
> 
> > As the comment block describe it is 'any kind of port physical facing
> > user'.
> 
> Are you saying PCI function is physical?  Just because PF stands for Physical
> Function?
> 
> Physical port in devlink means a port in the front panel where networking
> cable goes.
> 
> > Current mlx5 driver doesn't expose ports as physical regardless of
> > switchdev/legacy mode.
> 
> Today mlx5 doesn't expose devlink ports at all.
> 
> > > It's a limitation of Mellanox HW that you have some strong
> > > association there.
> > Not sure why you keep saying that. Any code reference that I should
> > look at? Or maybe you can explain what is that limitation, because I
> > am not aware of any.
> 
> NIC designs originating from traditional NICs were built as pipelines from PCI
> to wire or from wire to PCI.  Reportedly it makes it hard to completely
> divorce the PCI PF from the wire port (physical port).
> 
> Which is why you may think that "PF has physically user facing port".
> 
> > > > And physical port in include/uapi/linux/devlink.h also describe
> > > > that.
> > >
> > > By "that" you must mean that the physical is a user facing port.
> >
> > Can you please describe the difference between 'PF port' and 'physical
> > port of include/uapi/linux/devlink.h'? I must have missed this crisp
> > definition in discussion between you and Jiri. I am in meantime
> > checking the thread.
> 
> Perhaps start with the cover letter which includes an ASCII drawing?
> 
> Using Mellanox nomenclature - PF port is a "representor" for the PF which
> may be on another Host (SmartNIC or multihost).  It's pretty much the same
> thing as a VF port/"representor".
> 
Yes. We are aligned here. :-)
I see your point: in a multi-host scenario there may be 1 physical port but 4 PF ports, because there are 4 PFs for 4 hosts
(just an example of 4 hosts, each with its own MAC address, sharing 1 physical port).

When there is no multihost and there is a one-to-one mapping between a PF and a physical link,
there is some overlap between PF port and physical port attributes.
I believe such overlap is fine as long as we have unique indices for the ports.

So I am OK with having the flavours physical/cpu/dsa/pf/vf/mdev/switchport
(the last 4 being new port flavours).
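
A minimal sketch of what "unique indices" could look like, assuming a
made-up encoding in which PF ports are offset by a fixed base so they can
never collide with physical port numbers. The base value and the sketch_*
helpers are hypothetical; this is not the encoding used by the NFP driver
or by this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offset that keeps PF port indices apart from physical ones. */
#define SKETCH_PF_INDEX_BASE 1000u

/* Physical ports keep their own number as the devlink port index. */
static uint32_t sketch_physical_port_index(uint32_t phys_port)
{
	assert(phys_port < SKETCH_PF_INDEX_BASE);
	return phys_port;
}

/* PF ports are shifted past the range reserved for physical ports. */
static uint32_t sketch_pf_port_index(uint32_t pf)
{
	return SKETCH_PF_INDEX_BASE + pf;
}
```

For the multi-host example above, the single physical port would get
index 0 and the four PF ports would get indices 1000 to 1003, so the
overlap in attributes does not lead to overlapping indices.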

> Physical port is the hole on the panel of the adapter where cable goes.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 22:11                                                                                 ` Parav Pandit
@ 2019-03-20 18:24                                                                                   ` Parav Pandit
  2019-03-20 20:22                                                                                     ` Jakub Kicinski
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-20 18:24 UTC (permalink / raw)
  To: Parav Pandit, Jakub Kicinski
  Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

Hi Jiri, Jakub, Samudrala Sridhar,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> Sent: Monday, March 18, 2019 5:12 PM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> 
> 
> > -----Original Message-----
> > From: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Sent: Monday, March 18, 2019 4:30 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > netdev@vger.kernel.org; oss-drivers@netronome.com
> > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > devlink PCI ports
> >
> > On Mon, 18 Mar 2019 20:35:02 +0000, Parav Pandit wrote:
> > > > > > > physical and pf has some overlapping definitions.
> > > > > >
> > > > > > What "overlapping definitions" do physical and PF have?
> > > > > PF has physically user facing port.
> > > >
> > > > PF doesn't "have a user facing port" in switchdev mode.
> > >
> > > Physical port described in include/uapi/linux/devlink.h as
> > > DEVLINK_PORT_FLAVOUR_PHYSICAL is not related to switchdev or legacy
> > > mode.
> >
> > I said "PF doesn't ...", you're now talking about physical?
> >
> > > As the comment block describe it is 'any kind of port physical
> > > facing user'.
> >
> > Are you saying PCI function is physical?  Just because PF stands for
> > Physical Function?
> >
> > Physical port in devlink means a port in the front panel where
> > networking cable goes.
> >
> > > Current mlx5 driver doesn't expose ports as physical regardless of
> > > switchdev/legacy mode.
> >
> > Today mlx5 doesn't expose devlink ports at all.
> >
> > > > It's a limitation of Mellanox HW that you have some strong
> > > > association there.
> > > Not sure why you keep saying that. Any code reference that I should
> > > look at? Or maybe you can explain what is that limitation, because I
> > > am not aware of any.
> >
> > NIC designs originating from traditional NICs were built as pipelines
> > from PCI to wire or from wire to PCI.  Reportedly it makes it hard to
> > completely divorce the PCI PF from the wire port (physical port).
> >
> > Which is why you may think that "PF has physically user facing port".
> >
> > > > > And physical port in include/uapi/linux/devlink.h also describe
> > > > > that.
> > > >
> > > > By "that" you must mean that the physical is a user facing port.
> > >
> > > Can you please describe the difference between 'PF port' and
> > > 'physical port of include/uapi/linux/devlink.h'? I must have missed
> > > this crisp definition in discussion between you and Jiri. I am in
> > > meantime checking the thread.
> >
> > Perhaps start with the cover letter which includes an ASCII drawing?
> >
> > Using Mellanox nomenclature - PF port is a "representor" for the PF
> > which may be on another Host (SmartNIC or multihost).  It's pretty
> > much the same thing as a VF port/"representor".
> >
> Yes. We are aligned here. :-)
> I see your point, where in multi-host scenario, a physical port may be 1, but
> PF ports are 4, because of 4 PFs for 4 hosts.
> (just an example of 4 hosts with their own mac address sharing 1 physical
> port).
> 
> When there is no multihost and one to one mapping between a PF and
> physical links, there is some overlap between PF port and physical port
> attributes.
> I believe, such overlap is fine as long as we have unique indices for the ports.
> 
> So I am ok to have flavours as physical/cpu/dsa/pf/vf/mdev/switchport.
> (last 4 as new port flavours).
> 
> > Physical port is the hole on the panel of the adapter where cable goes.

So my takeaways from the above discussion are:
1. The following new port flavours should be added: pci_pf/pci_vf/mdev/switchport.
a. Switchport indicates a port on the eswitch. Normally this port has a rep-netdev attached to it.
b. Host side port flavours are pci_pf/pci_vf/mdev, which may be connected to a switchport.

2. Host side port flavours are not limited to Ethernet, as is already the case for devlink's port instance.

3. Each port continues to be accessed using a unique port index.

4. Host side ports and switchports are control objects.
a. Switch side ports reside where the current eswitch object of the devlink instance resides.
b. For a given VF/PF/mdev, such host side ports may be in the hypervisor, the VM, or both,
depending on the privilege.

5. eth.mac_address and rdma.port_guid can be programmed on
host port flavours by adding a command such as $ devlink port param set...
(similar to devlink dev param set).

6. More host port params can be added in the future when a user need arises.

7. The rep-netdev continues to be the eswitch (switchport) representor on the switch side.
a. Hence the rep-netdev cannot be used for programming the host port's parameters.

8. The eswitch devlink instance knows when a VF/PF/mdev's switchport is created/removed.
Hence, those will be created/deleted by the eswitch.
The same applies to host port flavours.

Does this look fine? Did I miss something?
We would like to make progress on incremental patches for item-4 and
any prep work needed to reach item-4.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-20 18:24                                                                                   ` Parav Pandit
@ 2019-03-20 20:22                                                                                     ` Jakub Kicinski
  2019-03-20 23:39                                                                                       ` Parav Pandit
  2019-03-21  9:08                                                                                       ` Jiri Pirko
  0 siblings, 2 replies; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-20 20:22 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers

On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> Hi Jiri, Jakub, Samudrala Sridhar,
> > > > > > And physical port in include/uapi/linux/devlink.h also describe
> > > > > > that.  
> > > > >
> > > > > By "that" you must mean that the physical is a user facing port.  
> > > >
> > > > Can you please describe the difference between 'PF port' and
> > > > 'physical port of include/uapi/linux/devlink.h'? I must have missed
> > > > this crisp definition in discussion between you and Jiri. I am in
> > > > meantime checking the thread.  
> > >
> > > Perhaps start with the cover letter which includes an ASCII drawing?
> > >
> > > Using Mellanox nomenclature - PF port is a "representor" for the PF
> > > which may be on another Host (SmartNIC or multihost).  It's pretty
> > > much the same thing as a VF port/"representor".
> > >  
> > Yes. We are aligned here. :-)
> > I see your point, where in multi-host scenario, a physical port may be 1, but
> > PF ports are 4, because of 4 PFs for 4 hosts.
> > (just an example of 4 hosts with their own mac address sharing 1 physical
> > port).
> > 
> > When there is no multihost and one to one mapping between a PF and
> > physical links, there is some overlap between PF port and physical port
> > attributes.
> > I believe, such overlap is fine as long as we have unique indices for the ports.
> > 
> > So I am ok to have flavours as physical/cpu/dsa/pf/vf/mdev/switchport.
> > (last 4 as new port flavours).
> >   
> > > Physical port is the hole on the panel of the adapter where cable goes.  
> 
> So my take away from above discussion are:
> 1. Following new port flavours should be added pci_pf/pci_vf/mdev/switchport.
> a. Switchport indicates port on the eswitch. Normally this port has rep-netdev attached to it.

I don't understand the "switchport".  Surely physical ports are also
attached to the eswitch?  And one of the main purposes of adding the
pci_pf/pci_vf flavours was to generate phys_port_name for the port
netdevs.

Please don't use the term representor if possible.  Representor for
most developers describes the way the netdev is implemented in the
driver, so for Mellanox and Netronome different ports will be
representors and non-representors.  That's why I prefer port netdev
(attached to eswitch, has switch_id) and host netdev (PF/VF netdev,
vNIC, VSI, etc).

> b. host side port flavours are pci_pf/pci_vf/mdev which may be connected to switchport

See above, pci_pf/pci_vf are needed for phys_port_name generation.

> 2. host side port flavours are not limited to Ethernet, as it is for devlink's port instance.
> 
> 3. Each port is continue to be accessed using unique port index.
> 
> 4. host side ports and switchport are control objects.
> a. switch side ports reside where current eswitch object of devlink instance reside
> b. for a given VF/PF/mdev such host side ports may be in hypervisor or VM or both 
> depending on the privilege
> 
> 5. eth.mac_address, rdma.port_guid can be programmed at 
> host port flavours by extending as $ devlink port param set...
> (similar to devlink dev param set)

You can keep restating that's your position, but I have *not* conceded
to that.

> 6. more host port params can be added in future when user need arise
> 
> 7. rep-netdev continue to be eswitch (switchport) representor at the switch side.
> a. Hence rep-netdev cannot be used for programming host port's parameters.
> 
> 8. eswitch devlink instance knows when VF/PF/mdev's switchport are created/removed.
> Hence, those will be created/deleted by eswitch.
> Similarly for host port flavours too.
> 
> Does it look fine? Did I miss something?
> We would like to progress on incremental patches for item-4 and 
> any prep work needed to reach to item-4.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
  2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
                   ` (8 preceding siblings ...)
  2019-03-04 18:22 ` David Miller
@ 2019-03-20 20:25 ` Jakub Kicinski
  2019-03-21  9:11   ` Jiri Pirko
  9 siblings, 1 reply; 100+ messages in thread
From: Jakub Kicinski @ 2019-03-20 20:25 UTC (permalink / raw)
  To: jiri; +Cc: davem, netdev, oss-drivers

On Fri,  1 Mar 2019 10:04:46 -0800, Jakub Kicinski wrote:
> Hi!
> 
> This series is a long overdue follow up to Jiri's work on providing
> a common .ndo_phys_port_name implementation based on devlink ports.

Hi Jiri,

unfortunately I need to focus on some urgent work, so I won't have time
to work on this.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-20 20:22                                                                                     ` Jakub Kicinski
@ 2019-03-20 23:39                                                                                       ` Parav Pandit
  2019-03-21  9:08                                                                                       ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-20 23:39 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Wednesday, March 20, 2019 3:23 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> > Hi Jiri, Jakub, Samudrala Sridhar,
> > > > > > > And physical port in include/uapi/linux/devlink.h also
> > > > > > > describe that.
> > > > > >
> > > > > > By "that" you must mean that the physical is a user facing port.
> > > > >
> > > > > Can you please describe the difference between 'PF port' and
> > > > > 'physical port of include/uapi/linux/devlink.h'? I must have
> > > > > missed this crisp definition in discussion between you and Jiri.
> > > > > I am in meantime checking the thread.
> > > >
> > > > Perhaps start with the cover letter which includes an ASCII drawing?
> > > >
> > > > Using Mellanox nomenclature - PF port is a "representor" for the
> > > > PF which may be on another Host (SmartNIC or multihost).  It's
> > > > pretty much the same thing as a VF port/"representor".
> > > >
> > > Yes. We are aligned here. :-)
> > > I see your point, where in multi-host scenario, a physical port may
> > > be 1, but PF ports are 4, because of 4 PFs for 4 hosts.
> > > (just an example of 4 hosts with their own mac address sharing 1
> > > physical port).
> > >
> > > When there is no multihost and one to one mapping between a PF and
> > > physical links, there is some overlap between PF port and physical
> > > port attributes.
> > > I believe, such overlap is fine as long as we have unique indices for the
> ports.
> > >
> > > So I am ok to have flavours as physical/cpu/dsa/pf/vf/mdev/switchport.
> > > (last 4 as new port flavours).
> > >
> > > > Physical port is the hole on the panel of the adapter where cable goes.
> >
> > So my take away from above discussion are:
> > 1. Following new port flavours should be added
> pci_pf/pci_vf/mdev/switchport.
> > a. Switchport indicates port on the eswitch. Normally this port has rep-
> netdev attached to it.
> 
> I don't understand the "switchport".  Surely physical ports are also attached
> to the eswitch?  
Yes, the physical port is attached to the eswitch too.
A switchport is one which is not directly visible to the user on the physical panel.
Jiri captured that in the diagram visible here.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg289477.html

> And one of the main purpose of adding the pci_pf/pci_vf
> flavours was to generate phys_port_name for the port netdevs.
> 
Yes, this is surely needed given the current direction of placing pf/vf info in the rep-netdev phys_port_name.
This information is available in the earlier example in this thread.
I copied it here and extended some parts of it.

The example below is for one PCI function that has two physical ports, one VF, and one mdev.
It enumerates 6 devlink ports.
VF 1 is physical port 1 of PF 0.
mdev uuidX's port 0 is connected to PF 0, port 0.

$ devlink port show
pci/0000:05:00.0/0 eth netdev repndev_pf0_p0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1 eth netdev repndev_pf0_p1 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/10001 eth netdev repndev_pf0_vf_1 flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/10002 eth netdev repndev_pf0_p0_mdev_8000 flavour switchport switch_id 00154d130d2f peer mdev/uuidX/0

pci/0000:05:00.0/1 eth netdev flavour vf_ctrl vf 1
mdev/uuidX/0 eth netdev flavour mdev_ctrl

Here the eswitch side creates phys_port_name from the peer port's information.
Something like:

struct devlink_peer_info {
	struct devlink *dev;
	struct devlink_port *port;
};

struct devlink_port {
	/* [...] existing fields */
	struct devlink_peer_info peer_info;
};

Depending on the peer port flavour, it can construct the appropriate phys_port_name return value for the ndo.
I changed the port flavour names to vf_ctrl and mdev_ctrl to make them more explicit.
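
As a rough illustration only, here is a userspace C sketch of how a name could be built from the peer port's flavour. It is not kernel code: the enum, the struct, the function name and the name formats ("pf%u", "pf%uvf%u", "pf%umdev%u") are all hypothetical and chosen just to show the dispatch on flavour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical peer flavours, named after the vf_ctrl/mdev_ctrl idea. */
enum peer_flavour {
	PEER_FLAVOUR_PF_CTRL,
	PEER_FLAVOUR_VF_CTRL,
	PEER_FLAVOUR_MDEV_CTRL,
};

/* Hypothetical stand-in for what the eswitch side knows about its peer. */
struct peer_port {
	enum peer_flavour flavour;
	unsigned int pf;	/* PCI physical function number */
	unsigned int vf;	/* VF number, used only for VF_CTRL */
	unsigned int mdev;	/* mdev id, used only for MDEV_CTRL */
};

/*
 * Format a port name for the eswitch-side port from its peer.
 * Returns 0 on success, -1 if the flavour is unknown or the name
 * does not fit in the buffer. The formats are assumptions.
 */
static int peer_phys_port_name(const struct peer_port *peer,
			       char *buf, size_t len)
{
	int n;

	assert(peer && buf && len > 0);

	switch (peer->flavour) {
	case PEER_FLAVOUR_PF_CTRL:
		n = snprintf(buf, len, "pf%u", peer->pf);
		break;
	case PEER_FLAVOUR_VF_CTRL:
		n = snprintf(buf, len, "pf%uvf%u", peer->pf, peer->vf);
		break;
	case PEER_FLAVOUR_MDEV_CTRL:
		n = snprintf(buf, len, "pf%umdev%u", peer->pf, peer->mdev);
		break;
	default:
		return -1;
	}

	/* snprintf reports the length it wanted; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The point is only that the switch side needs nothing beyond the peer's flavour and its pf/vf/mdev identifiers to produce the name.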

Let's discuss what is missing in this model, or which part is incorrect.

> Please don't use the term representor if possible.  Representor for most
> developers describes the way the netdev is implemented in the driver, so for
> Mellanox and Netronome different ports will be representors and non-
> representors.  That's why I prefer port netdev (attached to eswitch, has
> switch_id) and host netdev (PF/VF netdev, vNIC, VSI, etc).
> 
Frankly I don't see much difference between the two proposals.
No matter which way we go, peer_info is needed anyway for vf/pf/mdev.
The only exception is the introduction of an explicit host_port_side object (instead of the indirect peer way).
This gives clear visibility to users and addresses the rdma non_sriov case too.

> > b. host side port flavours are pci_pf/pci_vf/mdev which may be
> > connected to switchport
> 
> See above, pci_pf/pci_vf are needed for phys_port_name generation.
> 
> > 2. host side port flavours are not limited to Ethernet, as it is for
> > devlink's port instance.
> >
> > 3. Each port is continue to be accessed using unique port index.
> >
> > 4. host side ports and switchport are control objects.
> > a. switch side ports reside where current eswitch object of devlink
> > instance reside
> > b. for a given VF/PF/mdev such host side ports may be in hypervisor or
> > VM or both depending on the privilege
> >
> > 5. eth.mac_address, rdma.port_guid can be programmed at host port
> > flavours by extending as $ devlink port param set...
> > (similar to devlink dev param set)
> 
> You can keep restating that's your position, but I have *not* conceded to
> that.
> 
I understand. Let's discuss its shortcomings (if any) because of which switch side flavours would have to be defined via indirect peer programming.

> > 6. more host port params can be added in future when user need arise
> >
> > 7. rep-netdev continue to be eswitch (switchport) representor at the
> > switch side.
> > a. Hence rep-netdev cannot be used for programming host port's
> > parameters.
> >
> > 8. eswitch devlink instance knows when VF/PF/mdev's switchport are
> > created/removed.
> > Hence, those will be created/deleted by eswitch.
> > Similarly for host port flavours too.
> >
> > Does it look fine? Did I miss something?
> > We would like to progress on incremental patches for item-4 and any
> > prep work needed to reach to item-4.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 19:16                                                                   ` Jakub Kicinski
@ 2019-03-21  8:45                                                                     ` Jiri Pirko
  2019-03-21 15:14                                                                       ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21  8:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
>> >> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >> >Because switch is flat and agnostic of pf/vf/mdev.    
>> >> 
>> >> Not sure. It's good to have this kind of visibility.  
>> >
>> >Yes, this subthread honestly makes me go from 60% sure to 95% sure we
>> >shouldn't do the dual object thing :(  Seems like Parav is already
>> >confused by it and suggests host port can exist without switch port :(  
>> 
>> Although I understand your hesitation, the host ports are also
>> associated with the asic and should be under the devlink instance.
>> It is just a matter of proper documentation and clear code to avoid
>> confusions.
>
>They are certainly a part and belong to the ASIC, the question in my
>mind is more along the lines of do we want "one pipe/one port" or is
>it okay to have multiple software objects of the same kind for those
>objects.
>
>To put it differently - do want a port object for each port of the ASIC
>or do we want a port object for each netdev..

Perhaps the "port" name of the object is misleading. From the beginning, I
meant to have it for both switch ports and host ports. I admit that "host
port" is a bit misleading, as it is not really a port of the eswitch, but
its counterpart. But if we introduce another object for that purpose in
devlink (like "partition"), it would be a lot of duplication I think.

The question is, do we need the "host port"? Can't we just put a relation
to the host netdev in the eswitch port?

So as you suggest, we would have
devlink_port -+-- switch netdev/ibdev
              |
              +-- host netdev/ibdev

So the "weights" of both switch/host netdev/ibdev to devlink_port
relations would be equivalent.

Then, the devlink_port would represent the whole "pipe" with both ends.

The more I think about it, the more it makes sense to me...
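
A rough sketch of that "one pipe" layout, written as plain userspace C rather than kernel code. The struct and function names (pipe_end, pipe_port, and the helpers) are hypothetical and only illustrate a single port object owning both ends; an end with nothing attached is modelled with NULL names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one end of the pipe; either name may be NULL. */
struct pipe_end {
	const char *netdev;
	const char *ibdev;
};

/* Hypothetical stand-in for a devlink_port that owns both ends. */
struct pipe_port {
	unsigned int index;
	struct pipe_end switch_end;	/* eswitch side */
	struct pipe_end host_end;	/* PF/VF/mdev side */
};

/* An end is present if at least one device is attached to it. */
static int pipe_end_present(const struct pipe_end *end)
{
	assert(end);
	return end->netdev != NULL || end->ibdev != NULL;
}

/* Both ends hang off the same port, so one lookup reaches either. */
static const char *pipe_port_host_netdev(const struct pipe_port *port)
{
	assert(port);
	return pipe_end_present(&port->host_end) ? port->host_end.netdev : NULL;
}
```

With this shape the two relations carry the same weight: neither end needs a port object of its own, and a port whose host side is not (yet) visible simply has an empty host_end.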


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-20 20:22                                                                                     ` Jakub Kicinski
  2019-03-20 23:39                                                                                       ` Parav Pandit
@ 2019-03-21  9:08                                                                                       ` Jiri Pirko
  2019-03-21 15:03                                                                                         ` Parav Pandit
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21  9:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
>> Hi Jiri, Jakub, Samudrala Sridhar,
>> > > > > > And physical port in include/uapi/linux/devlink.h also describe
>> > > > > > that.  
>> > > > >
>> > > > > By "that" you must mean that the physical is a user facing port.  
>> > > >
>> > > > Can you please describe the difference between 'PF port' and
>> > > > 'physical port of include/uapi/linux/devlink.h'? I must have missed
>> > > > this crisp definition in discussion between you and Jiri. I am in
>> > > > meantime checking the thread.  
>> > >
>> > > Perhaps start with the cover letter which includes an ASCII drawing?
>> > >
>> > > Using Mellanox nomenclature - PF port is a "representor" for the PF
>> > > which may be on another Host (SmartNIC or multihost).  It's pretty
>> > > much the same thing as a VF port/"representor".
>> > >  
>> > Yes. We are aligned here. :-)
>> > I see your point, where in multi-host scenario, a physical port may be 1, but
>> > PF ports are 4, because of 4 PFs for 4 hosts.
>> > (just an example of 4 hosts with their own mac address sharing 1 physical
>> > port).
>> > 
>> > When there is no multihost and one to one mapping between a PF and
>> > physical links, there is some overlap between PF port and physical port
>> > attributes.
>> > I believe, such overlap is fine as long as we have unique indices for the ports.
>> > 
>> > So I am ok to have flavours as physical/cpu/dsa/pf/vf/mdev/switchport.
>> > (last 4 as new port flavours).
>> >   
>> > > Physical port is the hole on the panel of the adapter where cable goes.  
>> 
>> So my take away from above discussion are:
>> 1. Following new port flavours should be added pci_pf/pci_vf/mdev/switchport.
>> a. Switchport indicates port on the eswitch. Normally this port has rep-netdev attached to it.
>
>I don't understand the "switchport".  Surely physical ports are also
>attached to the eswitch?  And one of the main purpose of adding the
>pci_pf/pci_vf flavours was to generate phys_port_name for the port
>netdevs.
>
>Please don't use the term representor if possible.  Representor for
>most developers describes the way the netdev is implemented in the
>driver, so for Mellanox and Netronome different ports will be
>representors and non-representors.  That's why I prefer port netdev
>(attached to eswitch, has switch_id) and host netdev (PF/VF netdev,
>vNIC, VSI, etc).
>
>> b. host side port flavours are pci_pf/pci_vf/mdev which may be connected to switchport
>
>See above, pci_pf/pci_vf are needed for phys_port_name generation.

Yep, that makes sense.


>
>> 2. host side port flavours are not limited to Ethernet, as it is for devlink's port instance.
>> 
>> 3. Each port is continue to be accessed using unique port index.
>> 
>> 4. host side ports and switchport are control objects.
>> a. switch side ports reside where current eswitch object of devlink instance reside
>> b. for a given VF/PF/mdev such host side ports may be in hypervisor or VM or both 
>> depending on the privilege
>> 
>> 5. eth.mac_address, rdma.port_guid can be programmed at 
>> host port flavours by extending as $ devlink port param set...
>> (similar to devlink dev param set)
>
>You can keep restating that's your position, but I have *not* conceded
>to that.

I'm also not convinced that host dummy ports are a good idea to hold
these.


>
>> 6. more host port params can be added in future when user need arise
>> 
>> 7. rep-netdev continue to be eswitch (switchport) representor at the switch side.
>> a. Hence rep-netdev cannot be used for programming host port's parameters.
>> 
>> 8. eswitch devlink instance knows when VF/PF/mdev's switchport are created/removed.
>> Hence, those will be created/deleted by eswitch.
>> Similarly for host port flavours too.
>> 
>> Does it look fine? Did I miss something?
>> We would like to progress on incremental patches for item-4 and 
>> any prep work needed to reach to item-4.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-18 19:19                                                                   ` Jakub Kicinski
  2019-03-18 19:38                                                                     ` Parav Pandit
@ 2019-03-21  9:09                                                                     ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21  9:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parav Pandit, Samudrala, Sridhar, davem, netdev, oss-drivers

Mon, Mar 18, 2019 at 08:19:00PM CET, jakub.kicinski@netronome.com wrote:
>On Mon, 18 Mar 2019 13:21:05 +0100, Jiri Pirko wrote:
>> >First two entries shows the link between hostport and switchport.
>> >$ devlink port show
>> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>> >
>> >pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002  
>> 
>> Hostport should not have switch_id.
>
>Isn't a concept of a port of a switch without a switch ID a red flag?

If we want to have hostports as devlink ports, they should not have
switch id. They are not switch ports but rather counterparts of the
switch ports.


* Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports
  2019-03-20 20:25 ` Jakub Kicinski
@ 2019-03-21  9:11   ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21  9:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers

Wed, Mar 20, 2019 at 09:25:53PM CET, jakub.kicinski@netronome.com wrote:
>On Fri,  1 Mar 2019 10:04:46 -0800, Jakub Kicinski wrote:
>> Hi!
>> 
>> This series is a long overdue follow up to Jiri's work on providing
>> a common .ndo_phys_port_name implementation based on devlink ports.
>
>Hi Jiri,
>
>unfortunately I need to focus on some urgent work, so I won't have time
>to work on this.

Okay. I'll start pushing the devlink patches I have in my queue and we'll
get back to this later, no worries.


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21  9:08                                                                                       ` Jiri Pirko
@ 2019-03-21 15:03                                                                                         ` Parav Pandit
  2019-03-21 16:16                                                                                           ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-21 15:03 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: Samudrala, Sridhar, davem, netdev, oss-drivers

Hi Jiri,

> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Thursday, March 21, 2019 4:08 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com
> wrote:
> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> >> Hi Jiri, Jakub, Samudrala Sridhar,
> >> > > > > > And physical port in include/uapi/linux/devlink.h also
> >> > > > > > describe that.
> >> > > > >
> >> > > > > By "that" you must mean that the physical is a user facing port.
> >> > > >
> >> > > > Can you please describe the difference between 'PF port' and
> >> > > > 'physical port of include/uapi/linux/devlink.h'? I must have
> >> > > > missed this crisp definition in discussion between you and
> >> > > > Jiri. I am in meantime checking the thread.
> >> > >
> >> > > Perhaps start with the cover letter which includes an ASCII drawing?
> >> > >
> >> > > Using Mellanox nomenclature - PF port is a "representor" for the
> >> > > PF which may be on another Host (SmartNIC or multihost).  It's
> >> > > pretty much the same thing as a VF port/"representor".
> >> > >
> >> > Yes. We are aligned here. :-)
> >> > I see your point, where in multi-host scenario, a physical port may
> >> > be 1, but PF ports are 4, because of 4 PFs for 4 hosts.
> >> > (just an example of 4 hosts with their own mac address sharing 1
> >> > physical port).
> >> >
> >> > When there is no multihost and one to one mapping between a PF and
> >> > physical links, there is some overlap between PF port and physical
> >> > port attributes.
> >> > I believe, such overlap is fine as long as we have unique indices for
> >> > the ports.
> >> >
> >> > So I am ok to have flavours as physical/cpu/dsa/pf/vf/mdev/switchport.
> >> > (last 4 as new port flavours).
> >> >
> >> > > Physical port is the hole on the panel of the adapter where cable
> >> > > goes.
> >>
> >> So my take away from above discussion are:
> >> 1. Following new port flavours should be added
> >> pci_pf/pci_vf/mdev/switchport.
> >> a. Switchport indicates port on the eswitch. Normally this port has
> >> rep-netdev attached to it.
> >
> >I don't understand the "switchport".  Surely physical ports are also
> >attached to the eswitch?  And one of the main purpose of adding the
> >pci_pf/pci_vf flavours was to generate phys_port_name for the port
> >netdevs.
> >
> >Please don't use the term representor if possible.  Representor for
> >most developers describes the way the netdev is implemented in the
> >driver, so for Mellanox and Netronome different ports will be
> >representors and non-representors.  That's why I prefer port netdev
> >(attached to eswitch, has switch_id) and host netdev (PF/VF netdev,
> >vNIC, VSI, etc).
> >
> >> b. host side port flavours are pci_pf/pci_vf/mdev which may be
> >> connected to switchport
> >
> >See above, pci_pf/pci_vf are needed for phys_port_name generation.
> 
> Yep, that makes sense.
> 
> 
> >
> >> 2. host side port flavours are not limited to Ethernet, as it is for
> >> devlink's port instance.
> >>
> >> 3. Each port is continue to be accessed using unique port index.
> >>
> >> 4. host side ports and switchport are control objects.
> >> a. switch side ports reside where current eswitch object of devlink
> >> instance reside
> >> b. for a given VF/PF/mdev such host side ports may be in hypervisor or
> >> VM or both depending on the privilege
> >>
> >> 5. eth.mac_address, rdma.port_guid can be programmed at host port
> >> flavours by extending as $ devlink port param set...
> >> (similar to devlink dev param set)
> >
> >You can keep restating that's your position, but I have *not* conceded
> >to that.
> 
> I'm also not convinced that host dummy ports are a good idea to hold these.
> 
> 
I didn't understand what you mean by dummy port.
Can you explain what is wrong with programming host port params using the host_port object?
A few questions are unanswered in my past 2 or 3 emails.
Can you please go through them?
Can you point to some example switch API where you program host params at the switch?

> >
> >> 6. more host port params can be added in future when user need arise
> >>
> >> 7. rep-netdev continue to be eswitch (switchport) representor at the
> >> switch side.
> >> a. Hence rep-netdev cannot be used for programming host port's
> >> parameters.
> >>
> >> 8. eswitch devlink instance knows when VF/PF/mdev's switchport are
> >> created/removed.
> >> Hence, those will be created/deleted by eswitch.
> >> Similarly for host port flavours too.
> >>
> >> Does it look fine? Did I miss something?
> >> We would like to progress on incremental patches for item-4 and any
> >> prep work needed to reach to item-4.


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21  8:45                                                                     ` Jiri Pirko
@ 2019-03-21 15:14                                                                       ` Parav Pandit
  2019-03-21 16:14                                                                         ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-21 15:14 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Thursday, March 21, 2019 3:45 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com
> wrote:
> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
> >> >> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
> >> >>
> >> >> Not sure. It's good to have this kind of visibility.
> >> >
> >> >Yes, this subthread honestly makes me go from 60% sure to 95% sure
> >> >we shouldn't do the dual object thing :(  Seems like Parav is
> >> >already confused by it and suggests host port can exist without
> >> >switch port :(
> >>
> >> Although I understand your hesitation, the host ports are also
> >> associated with the asic and should be under the devlink instance.
> >> It is just a matter of proper documentation and clear code to avoid
> >> confusions.
> >
> >They are certainly a part and belong to the ASIC, the question in my
> >mind is more along the lines of do we want "one pipe/one port" or is it
> >okay to have multiple software objects of the same kind for those
> >objects.
> >
> >To put it differently - do want a port object for each port of the ASIC
> >or do we want a port object for each netdev..
> 
> Perhaps the "port" name of the object is misleading. From the beginning, I
> meant to have it for both switch ports and host ports. I admit that "host
> port" is a bit misleading, as it is not really a port of the eswitch, but
> its counterpart. But if we introduce another object for that purpose in
> devlink (like "partition"), it would be a lot of duplication I think.
> 
> The question is, do we need the "host port"? Can't we just put a relation
> to the host netdev in the eswitch port?
> 
Can you please explain how it works for rdma for non sriov use case?
Do we have to create a fake eswitch object?

> So as you suggest, we would have
> devlink_port -+-- switch netdev/ibdev
>               |
>               +-- host netdev/ibdev
>

How does this work for rdma, to program a single node_guid for a dual port ibdev?
Did you actually read the recent example I showed in [1]?
[1] https://marc.info/?l=linux-netdev&m=155312521817191&w=2
And why doesn't it address all the use cases of pf/vf/mdev, ibdev, netdev?

> So the "weights" of both switch/host netdev/ibdev to devlink_port relations
> would be equivalent.
> 
> Then, the devlink_port would represent the whole "pipe" with both ends.
> 
> The more I think about it, the more it makes sense to me...


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 15:14                                                                       ` Parav Pandit
@ 2019-03-21 16:14                                                                         ` Jiri Pirko
  2019-03-21 16:52                                                                           ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21 16:14 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers

Thu, Mar 21, 2019 at 04:14:53PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Thursday, March 21, 2019 3:45 AM
>> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
>> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com
>> wrote:
>> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
>> >> >> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
>> >> >>
>> >> >> Not sure. It's good to have this kind of visibility.
>> >> >
>> >> >Yes, this subthread honestly makes me go from 60% sure to 95% sure
>> >> >we shouldn't do the dual object thing :(  Seems like Parav is
>> >> >already confused by it and suggests host port can exist without
>> >> >switch port :(
>> >>
>> >> Although I understand your hesitation, the host ports are also
>> >> associated with the asic and should be under the devlink instance.
>> >> It is just a matter of proper documentation and clear code to avoid
>> >> confusions.
>> >
>> >They are certainly a part and belong to the ASIC, the question in my
>> >mind is more along the lines of do we want "one pipe/one port" or is it
>> >okay to have multiple software objects of the same kind for those
>> >objects.
>> >
>> >To put it differently - do want a port object for each port of the ASIC
>> >or do we want a port object for each netdev..
>> 
>> Perhaps the "port" name of the object is misleading. From the beginning, I
>> meant to have it for both switch ports and host ports. I admit that "host
>> port" is a bit misleading, as it is not really a port of the eswitch, but
>> its counterpart. But if we introduce another object for that purpose in
>> devlink (like "partition"), it would be a lot of duplication I think.
>> 
>> The question is, do we need the "host port"? Can't we just put a relation
>> to the host netdev in the eswitch port?
>> 
>Can you please explain how it works for rdma for non sriov use case?
>Do we have to create a fake eswitch object?

Could you please provide details on "rdma for non sriov use case"?


>
>> So as you suggest, we would have
>> devlink_port -+-- switch netdev/ibdev
>>               |
>>               +-- host netdev/ibdev
>>
>
>How does this work for rdma, to program a single node_guid for a dual port ibdev?
>Did you actually read the recent example I showed in [1]?
>[1] https://marc.info/?l=linux-netdev&m=155312521817191&w=2
>And why doesn't it address all the use cases of pf/vf/mdev, ibdev, netdev?
>
>> So the "weights" of both switch/host netdev/ibdev to devlink_port relations
>> would be equivalent.
>> 
>> Then, the devlink_port would represent the whole "pipe" with both ends.
>> 
>> The more I think about it, the more it makes sense to me...


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 15:03                                                                                         ` Parav Pandit
@ 2019-03-21 16:16                                                                                           ` Jiri Pirko
  2019-03-21 16:50                                                                                             ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21 16:16 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers

Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
>Hi Jiri,
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Thursday, March 21, 2019 4:08 AM
>> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
>> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com
>> wrote:
>> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
>> >> Hi Jiri, Jakub, Samudrala Sridhar,
>> >> > > > > > And physical port in include/uapi/linux/devlink.h also
>> >> > > > > > describe that.
>> >> > > > >
>> >> > > > > By "that" you must mean that the physical is a user facing port.
>> >> > > >
>> >> > > > Can you please describe the difference between 'PF port' and
>> >> > > > 'physical port of include/uapi/linux/devlink.h'? I must have
>> >> > > > missed this crisp definition in discussion between you and
>> >> > > > Jiri. I am in meantime checking the thread.
>> >> > >
>> >> > > Perhaps start with the cover letter which includes an ASCII drawing?
>> >> > >
>> >> > > Using Mellanox nomenclature - PF port is a "representor" for the
>> >> > > PF which may be on another Host (SmartNIC or multihost).  It's
>> >> > > pretty much the same thing as a VF port/"representor".
>> >> > >
>> >> > Yes. We are aligned here. :-)
>> >> > I see your point, where in multi-host scenario, a physical port may
>> >> > be 1, but PF ports are 4, because of 4 PFs for 4 hosts.
>> >> > (just an example of 4 hosts with their own mac address sharing 1
>> >> > physical port).
>> >> >
>> >> > When there is no multihost and one to one mapping between a PF and
>> >> > physical links, there is some overlap between PF port and physical
>> >> > port attributes.
>> >> > I believe, such overlap is fine as long as we have unique indices for
>> >> > the ports.
>> >> >
>> >> > So I am ok to have flavours as physical/cpu/dsa/pf/vf/mdev/switchport.
>> >> > (last 4 as new port flavours).
>> >> >
>> >> > > Physical port is the hole on the panel of the adapter where cable
>> >> > > goes.
>> >>
>> >> So my take away from above discussion are:
>> >> 1. Following new port flavours should be added
>> >> pci_pf/pci_vf/mdev/switchport.
>> >> a. Switchport indicates port on the eswitch. Normally this port has
>> >> rep-netdev attached to it.
>> >
>> >I don't understand the "switchport".  Surely physical ports are also
>> >attached to the eswitch?  And one of the main purpose of adding the
>> >pci_pf/pci_vf flavours was to generate phys_port_name for the port
>> >netdevs.
>> >
>> >Please don't use the term representor if possible.  Representor for
>> >most developers describes the way the netdev is implemented in the
>> >driver, so for Mellanox and Netronome different ports will be
>> >representors and non-representors.  That's why I prefer port netdev
>> >(attached to eswitch, has switch_id) and host netdev (PF/VF netdev,
>> >vNIC, VSI, etc).
>> >
>> >> b. host side port flavours are pci_pf/pci_vf/mdev which may be
>> >> connected to switchport
>> >
>> >See above, pci_pf/pci_vf are needed for phys_port_name generation.
>> 
>> Yep, that makes sense.
>> 
>> 
>> >
>> >> 2. host side port flavours are not limited to Ethernet, as it is for
>> >> devlink's port instance.
>> >>
>> >> 3. Each port is continue to be accessed using unique port index.
>> >>
>> >> 4. host side ports and switchport are control objects.
>> >> a. switch side ports reside where current eswitch object of devlink
>> >> instance reside
>> >> b. for a given VF/PF/mdev such host side ports may be in hypervisor or
>> >> VM or both depending on the privilege
>> >>
>> >> 5. eth.mac_address, rdma.port_guid can be programmed at host port
>> >> flavours by extending as $ devlink port param set...
>> >> (similar to devlink dev param set)
>> >
>> >You can keep restating that's your position, but I have *not* conceded
>> >to that.
>> 
>> I'm also not convinced that host dummy ports are a good idea to hold these.
>> 
>> 
>I didn't understand what you mean by dummy port.

It's a port for a VF host port which is not actually in the host but in
the vm. Very confusing.

>Can you explain what is wrong with programming host port params using the host_port object?
>A few questions are unanswered in my past 2 or 3 emails.
>Can you please go through them?
>Can you point to some example switch API where you program host params at the switch?
>
>> >
>> >> 6. more host port params can be added in future when user need arise
>> >>
>> >> 7. rep-netdev continue to be eswitch (switchport) representor at the
>> >> switch side.
>> >> a. Hence rep-netdev cannot be used for programming host port's
>> >> parameters.
>> >>
>> >> 8. eswitch devlink instance knows when VF/PF/mdev's switchport are
>> >> created/removed.
>> >> Hence, those will be created/deleted by eswitch.
>> >> Similarly for host port flavours too.
>> >>
>> >> Does it look fine? Did I miss something?
>> >> We would like to progress on incremental patches for item-4 and any
>> >> prep work needed to reach to item-4.


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 16:16                                                                                           ` Jiri Pirko
@ 2019-03-21 16:50                                                                                             ` Parav Pandit
  2019-03-21 17:23                                                                                               ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-21 16:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Thursday, March 21, 2019 11:16 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
> >Hi Jiri,
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Thursday, March 21, 2019 4:08 AM
> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com
> >> wrote:
> >> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> >> >> Hi Jiri, Jakub, Samudrala Sridhar,
> >> >> > > > > > And physical port in include/uapi/linux/devlink.h also
> >> >> > > > > > describe that.
> >> >> > > > >
> >> >> > > > > By "that" you must mean that the physical is a user facing port.
> >> >> > > >
> >> >> > > > Can you please describe the difference between 'PF port' and
> >> >> > > > 'physical port of include/uapi/linux/devlink.h'? I must have
> >> >> > > > missed this crisp definition in discussion between you and
> >> >> > > > Jiri. I am in meantime checking the thread.
> >> >> > >
> >> >> > > Perhaps start with the cover letter which includes an ASCII
> drawing?
> >> >> > >
> >> >> > > Using Mellanox nomenclature - PF port is a "representor" for
> >> >> > > the PF which may be on another Host (SmartNIC or multihost).
> >> >> > > It's pretty much the same thing as a VF port/"representor".
> >> >> > >
> >> >> > Yes. We are aligned here. :-)
> >> >> > I see your point, where in multi-host scenario, a physical port
> >> >> > may be 1, but PF ports are 4, because of 4 PFs for 4 hosts.
> >> >> > (just an example of 4 hosts with their own mac address sharing 1
> >> >> > physical port).
> >> >> >
> >> >> > When there is no multihost and one to one mapping between a PF
> >> >> > and physical links, there is some overlap between PF port and
> >> >> > physical port attributes.
> >> >> > I believe, such overlap is fine as long as we have unique
> >> >> > indices for the
> >> ports.
> >> >> >
> >> >> > So I am ok to have flavours as
> physical/cpu/dsa/pf/vf/mdev/switchport.
> >> >> > (last 4 as new port flavours).
> >> >> >
> >> >> > > Physical port is the hole on the panel of the adapter where
> >> >> > > cable
> >> goes.
> >> >>
> >> >> So my take away from above discussion are:
> >> >> 1. Following new port flavours should be added
> >> pci_pf/pci_vf/mdev/switchport.
> >> >> a. Switchport indicates port on the eswitch. Normally this port
> >> >> has rep-
> >> netdev attached to it.
> >> >
> >> >I don't understand the "switchport".  Surely physical ports are also
> >> >attached to the eswitch?  And one of the main purpose of adding the
> >> >pci_pf/pci_vf flavours was to generate phys_port_name for the port
> >> >netdevs.
> >> >
> >> >Please don't use the term representor if possible.  Representor for
> >> >most developers describes the way the netdev is implemented in the
> >> >driver, so for Mellanox and Netronome different ports will be
> >> >representors and non-representors.  That's why I prefer port netdev
> >> >(attached to eswitch, has switch_id) and host netdev (PF/VF netdev,
> >> >vNIC, VSI, etc).
> >> >
> >> >> b. host side port flavours are pci_pf/pci_vf/mdev which may be
> >> >> connected to switchport
> >> >
> >> >See above, pci_pf/pci_vf are needed for phys_port_name generation.
> >>
> >> Yep, that makes sense.
> >>
> >>
> >> >
> >> >> 2. host side port flavours are not limited to Ethernet, as it is
> >> >> for devlink's
> >> port instance.
> >> >>
> >> >> 3. Each port is continue to be accessed using unique port index.
> >> >>
> >> >> 4. host side ports and switchport are control objects.
> >> >> a. switch side ports reside where current eswitch object of
> >> >> devlink instance reside b. for a given VF/PF/mdev such host side
> >> >> ports may be in hypervisor or VM or both depending on the
> >> >> privilege
> >> >>
> >> >> 5. eth.mac_address, rdma.port_guid can be programmed at host port
> >> >> flavours by extending as $ devlink port param set...
> >> >> (similar to devlink dev param set)
> >> >
> >> >You can keep restating that's your position, but I have *not*
> >> >conceded to that.
> >>
> >> I'm also not convinced that host dummy ports are good idea to hold
> these.
> >>
> >>
> >I didn't understand what do you mean my dummy port.
> 
> It's a port for a VF host port which is not actually in the host but in the vm.
> Very confusing.
> 
It is the vf_ctrl flavour. I don't see it as any different from the rep-netdev.
The rep-netdev, which represents the eswitch vport, is not that confusing to us.
So why is a vf_ctrl flavour port, which represents the other side of the pipe
as you have shown in your example, that confusing?


> >Can you explain what is wrong in programming host port params using
> host_port object?
> >Few questions are unanswered in my past 2 or 3 emails.
> >Can you please go through them?
> >Can you point to some example switch API where you program host params
> at switch?
> >
> >> >
> >> >> 6. more host port params can be added in future when user need
> >> >> arise
> >> >>
> >> >> 7. rep-netdev continue to be eswitch (switchport) representor at
> >> >> the
> >> switch side.
> >> >> a. Hence rep-netdev cannot be used for programming host port's
> >> parameters.
> >> >>
> >> >> 8. eswitch devlink instance knows when VF/PF/mdev's switchport are
> >> created/removed.
> >> >> Hence, those will be created/deleted by eswitch.
> >> >> Similarly for host port flavours too.
> >> >>
> >> >> Does it look fine? Did I miss something?
> >> >> We would like to progress on incremental patches for item-4 and
> >> >> any prep work needed to reach to item-4.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 16:14                                                                         ` Jiri Pirko
@ 2019-03-21 16:52                                                                           ` Parav Pandit
  2019-03-21 17:20                                                                             ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-21 16:52 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Thursday, March 21, 2019 11:14 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Thu, Mar 21, 2019 at 04:14:53PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Thursday, March 21, 2019 3:45 AM
> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com
> >> wrote:
> >> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
> >> >> >> >2. flavour should not be vf/pf, flavour should be hostport,
> switchport.
> >> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
> >> >> >>
> >> >> >> Not sure. It's good to have this kind of visibility.
> >> >> >
> >> >> >Yes, this subthread honestly makes me go from 60% sure to 95%
> >> >> >sure we shouldn't do the dual object thing :(  Seems like Parav
> >> >> >is already confused by it and suggests host port can exist
> >> >> >without switch port :(
> >> >>
> >> >> Although I understand your hesitation, the host ports are also
> >> >> associated with the asic and should be under the devlink instance.
> >> >> It is just a matter of proper documentation and clear code to
> >> >> avoid confusions.
> >> >
> >> >They are certainly a part and belong to the ASIC, the question in my
> >> >mind is more along the lines of do we want "one pipe/one port" or is
> >> >it okay to have multiple software objects of the same kind for those
> >> >objects.
> >> >
> >> >To put it differently - do want a port object for each port of the
> >> >ASIC or do we want a port object for each netdev..
> >>
> >> Perhaps "port" name of the object is misleading. From the beginning,
> >> I ment to have it for both switch ports and host ports. I admit that
> >> "host port" is a bit misleading, as it is not really a port of
> >> eswitch, but the counter part. But if we introduce another object for
> >> that purpose in devlink (like "partititon"), it would be a lot of duplication
> I think.
> >>
> >> Question is, do we need the "host port"? Can't we just put a relation
> >> to host netdev in the eswitch port.
> >>
> >Can you please explain how does it work for rdma for non sriov use case?
> >Do we have to create a fake eswitch object?
> 
> Could you please provide details on "rdma for non sriov use case"?
> 
There are multiple mdevs on PFs that happen to have IB as their link layer, and those devlink instances have ports that deserve to be configured the same way as Eth ports.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 16:52                                                                           ` Parav Pandit
@ 2019-03-21 17:20                                                                             ` Jiri Pirko
  2019-03-21 17:34                                                                               ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21 17:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers

Thu, Mar 21, 2019 at 05:52:09PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Thursday, March 21, 2019 11:14 AM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
>> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Thu, Mar 21, 2019 at 04:14:53PM CET, parav@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> Sent: Thursday, March 21, 2019 3:45 AM
>> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
>> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >> Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com
>> >> wrote:
>> >> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
>> >> >> >> >2. flavour should not be vf/pf, flavour should be hostport,
>> switchport.
>> >> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
>> >> >> >>
>> >> >> >> Not sure. It's good to have this kind of visibility.
>> >> >> >
>> >> >> >Yes, this subthread honestly makes me go from 60% sure to 95%
>> >> >> >sure we shouldn't do the dual object thing :(  Seems like Parav
>> >> >> >is already confused by it and suggests host port can exist
>> >> >> >without switch port :(
>> >> >>
>> >> >> Although I understand your hesitation, the host ports are also
>> >> >> associated with the asic and should be under the devlink instance.
>> >> >> It is just a matter of proper documentation and clear code to
>> >> >> avoid confusions.
>> >> >
>> >> >They are certainly a part and belong to the ASIC, the question in my
>> >> >mind is more along the lines of do we want "one pipe/one port" or is
>> >> >it okay to have multiple software objects of the same kind for those
>> >> >objects.
>> >> >
>> >> >To put it differently - do want a port object for each port of the
>> >> >ASIC or do we want a port object for each netdev..
>> >>
>> >> Perhaps "port" name of the object is misleading. From the beginning,
>> >> I ment to have it for both switch ports and host ports. I admit that
>> >> "host port" is a bit misleading, as it is not really a port of
>> >> eswitch, but the counter part. But if we introduce another object for
>> >> that purpose in devlink (like "partititon"), it would be a lot of duplication
>> I think.
>> >>
>> >> Question is, do we need the "host port"? Can't we just put a relation
>> >> to host netdev in the eswitch port.
>> >>
>> >Can you please explain how does it work for rdma for non sriov use case?
>> >Do we have to create a fake eswitch object?
>> 
>> Could you please provide details on "rdma for non sriov use case"?
>> 
>There are multiple mdevs on PFs that happen to have link layer as IB and those devlink instances have port that deserved to be configured same way as that of Eth.

Could you please describe it a bit more? There is still an eswitch
through which the traffic is going, isn't there?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 16:50                                                                                             ` Parav Pandit
@ 2019-03-21 17:23                                                                                               ` Jiri Pirko
  2019-03-21 17:42                                                                                                 ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-21 17:23 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers

Thu, Mar 21, 2019 at 05:50:37PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Thursday, March 21, 2019 11:16 AM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
>> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
>> >Hi Jiri,
>> >
>> >> -----Original Message-----
>> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> Sent: Thursday, March 21, 2019 4:08 AM
>> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
>> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >> Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com
>> >> wrote:
>> >> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
>> >> >> Hi Jiri, Jakub, Samudrala Sridhar,
>> >> >> > > > > > And physical port in include/uapi/linux/devlink.h also
>> >> >> > > > > > describe that.
>> >> >> > > > >
>> >> >> > > > > By "that" you must mean that the physical is a user facing port.
>> >> >> > > >
>> >> >> > > > Can you please describe the difference between 'PF port' and
>> >> >> > > > 'physical port of include/uapi/linux/devlink.h'? I must have
>> >> >> > > > missed this crisp definition in discussion between you and
>> >> >> > > > Jiri. I am in meantime checking the thread.
>> >> >> > >
>> >> >> > > Perhaps start with the cover letter which includes an ASCII
>> drawing?
>> >> >> > >
>> >> >> > > Using Mellanox nomenclature - PF port is a "representor" for
>> >> >> > > the PF which may be on another Host (SmartNIC or multihost).
>> >> >> > > It's pretty much the same thing as a VF port/"representor".
>> >> >> > >
>> >> >> > Yes. We are aligned here. :-)
>> >> >> > I see your point, where in multi-host scenario, a physical port
>> >> >> > may be 1, but PF ports are 4, because of 4 PFs for 4 hosts.
>> >> >> > (just an example of 4 hosts with their own mac address sharing 1
>> >> >> > physical port).
>> >> >> >
>> >> >> > When there is no multihost and one to one mapping between a PF
>> >> >> > and physical links, there is some overlap between PF port and
>> >> >> > physical port attributes.
>> >> >> > I believe, such overlap is fine as long as we have unique
>> >> >> > indices for the
>> >> ports.
>> >> >> >
>> >> >> > So I am ok to have flavours as
>> physical/cpu/dsa/pf/vf/mdev/switchport.
>> >> >> > (last 4 as new port flavours).
>> >> >> >
>> >> >> > > Physical port is the hole on the panel of the adapter where
>> >> >> > > cable
>> >> goes.
>> >> >>
>> >> >> So my take away from above discussion are:
>> >> >> 1. Following new port flavours should be added
>> >> pci_pf/pci_vf/mdev/switchport.
>> >> >> a. Switchport indicates port on the eswitch. Normally this port
>> >> >> has rep-
>> >> netdev attached to it.
>> >> >
>> >> >I don't understand the "switchport".  Surely physical ports are also
>> >> >attached to the eswitch?  And one of the main purpose of adding the
>> >> >pci_pf/pci_vf flavours was to generate phys_port_name for the port
>> >> >netdevs.
>> >> >
>> >> >Please don't use the term representor if possible.  Representor for
>> >> >most developers describes the way the netdev is implemented in the
>> >> >driver, so for Mellanox and Netronome different ports will be
>> >> >representors and non-representors.  That's why I prefer port netdev
>> >> >(attached to eswitch, has switch_id) and host netdev (PF/VF netdev,
>> >> >vNIC, VSI, etc).
>> >> >
>> >> >> b. host side port flavours are pci_pf/pci_vf/mdev which may be
>> >> >> connected to switchport
>> >> >
>> >> >See above, pci_pf/pci_vf are needed for phys_port_name generation.
>> >>
>> >> Yep, that makes sense.
>> >>
>> >>
>> >> >
>> >> >> 2. host side port flavours are not limited to Ethernet, as it is
>> >> >> for devlink's
>> >> port instance.
>> >> >>
>> >> >> 3. Each port is continue to be accessed using unique port index.
>> >> >>
>> >> >> 4. host side ports and switchport are control objects.
>> >> >> a. switch side ports reside where current eswitch object of
>> >> >> devlink instance reside b. for a given VF/PF/mdev such host side
>> >> >> ports may be in hypervisor or VM or both depending on the
>> >> >> privilege
>> >> >>
>> >> >> 5. eth.mac_address, rdma.port_guid can be programmed at host port
>> >> >> flavours by extending as $ devlink port param set...
>> >> >> (similar to devlink dev param set)
>> >> >
>> >> >You can keep restating that's your position, but I have *not*
>> >> >conceded to that.
>> >>
>> >> I'm also not convinced that host dummy ports are good idea to hold
>> these.
>> >>
>> >>
>> >I didn't understand what do you mean my dummy port.
>> 
>> It's a port for a VF host port which is not actually in the host but in the vm.
>> Very confusing.
>> 
>It is the vf_ctrl flavour. I don't see it any different than rep-netdev.
>rep-netdev is not that confusing to us that represent eswitch vport.
>Why vf_ctrl flavour port that represents otherside of the pipe as you have shown in example?
>Why it that confusing?

Because sometimes it is there only once (PF), sometimes twice (VF), and
one of these is a kind of zombie.

>
>
>> >Can you explain what is wrong in programming host port params using
>> host_port object?
>> >Few questions are unanswered in my past 2 or 3 emails.
>> >Can you please go through them?
>> >Can you point to some example switch API where you program host params
>> at switch?
>> >
>> >> >
>> >> >> 6. more host port params can be added in future when user need
>> >> >> arise
>> >> >>
>> >> >> 7. rep-netdev continue to be eswitch (switchport) representor at
>> >> >> the
>> >> switch side.
>> >> >> a. Hence rep-netdev cannot be used for programming host port's
>> >> parameters.
>> >> >>
>> >> >> 8. eswitch devlink instance knows when VF/PF/mdev's switchport are
>> >> created/removed.
>> >> >> Hence, those will be created/deleted by eswitch.
>> >> >> Similarly for host port flavours too.
>> >> >>
>> >> >> Does it look fine? Did I miss something?
>> >> >> We would like to progress on incremental patches for item-4 and
>> >> >> any prep work needed to reach to item-4.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 17:20                                                                             ` Jiri Pirko
@ 2019-03-21 17:34                                                                               ` Parav Pandit
  2019-03-22 16:27                                                                                 ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-21 17:34 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Thursday, March 21, 2019 12:21 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Thu, Mar 21, 2019 at 05:52:09PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Thursday, March 21, 2019 11:14 AM
> >> To: Parav Pandit <parav@mellanox.com>
> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Thu, Mar 21, 2019 at 04:14:53PM CET, parav@mellanox.com wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Jiri Pirko <jiri@resnulli.us>
> >> >> Sent: Thursday, March 21, 2019 3:45 AM
> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> devlink PCI ports
> >> >>
> >> >> Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com
> >> >> wrote:
> >> >> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
> >> >> >> >> >2. flavour should not be vf/pf, flavour should be hostport,
> >> switchport.
> >> >> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
> >> >> >> >>
> >> >> >> >> Not sure. It's good to have this kind of visibility.
> >> >> >> >
> >> >> >> >Yes, this subthread honestly makes me go from 60% sure to 95%
> >> >> >> >sure we shouldn't do the dual object thing :(  Seems like
> >> >> >> >Parav is already confused by it and suggests host port can
> >> >> >> >exist without switch port :(
> >> >> >>
> >> >> >> Although I understand your hesitation, the host ports are also
> >> >> >> associated with the asic and should be under the devlink instance.
> >> >> >> It is just a matter of proper documentation and clear code to
> >> >> >> avoid confusions.
> >> >> >
> >> >> >They are certainly a part and belong to the ASIC, the question in
> >> >> >my mind is more along the lines of do we want "one pipe/one port"
> >> >> >or is it okay to have multiple software objects of the same kind
> >> >> >for those objects.
> >> >> >
> >> >> >To put it differently - do want a port object for each port of
> >> >> >the ASIC or do we want a port object for each netdev..
> >> >>
> >> >> Perhaps "port" name of the object is misleading. From the
> >> >> beginning, I ment to have it for both switch ports and host ports.
> >> >> I admit that "host port" is a bit misleading, as it is not really
> >> >> a port of eswitch, but the counter part. But if we introduce
> >> >> another object for that purpose in devlink (like "partititon"), it
> >> >> would be a lot of duplication
> >> I think.
> >> >>
> >> >> Question is, do we need the "host port"? Can't we just put a
> >> >> relation to host netdev in the eswitch port.
> >> >>
> >> >Can you please explain how does it work for rdma for non sriov use
> case?
> >> >Do we have to create a fake eswitch object?
> >>
> >> Could you please provide details on "rdma for non sriov use case"?
> >>
> >There are multiple mdevs on PFs that happen to have link layer as IB and
> those devlink instances have port that deserved to be configured same way
> as that of Eth.
> 
> Could you please describe it a bit more. There is still an eswitch through
> which the traffic is going, isn't it?
Yes, there is an eswitch, but it doesn't have the switch side of the vports.
It is equivalent to legacy mode.
I hope you are not thinking of creating fake eswitch vports. :-)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 17:23                                                                                               ` Jiri Pirko
@ 2019-03-21 17:42                                                                                                 ` Parav Pandit
  2019-03-22 13:32                                                                                                   ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-21 17:42 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Thursday, March 21, 2019 12:24 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Thu, Mar 21, 2019 at 05:50:37PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Thursday, March 21, 2019 11:16 AM
> >> To: Parav Pandit <parav@mellanox.com>
> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
> >> >Hi Jiri,
> >> >
> >> >> -----Original Message-----
> >> >> From: Jiri Pirko <jiri@resnulli.us>
> >> >> Sent: Thursday, March 21, 2019 4:08 AM
> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> devlink PCI ports
> >> >>
> >> >> Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com
> >> >> wrote:
> >> >> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> >> >> >> Hi Jiri, Jakub, Samudrala Sridhar,
> >> >> >> > > > > > And physical port in include/uapi/linux/devlink.h
> >> >> >> > > > > > also describe that.
> >> >> >> > > > >
> >> >> >> > > > > By "that" you must mean that the physical is a user facing
> port.
> >> >> >> > > >
> >> >> >> > > > Can you please describe the difference between 'PF port'
> >> >> >> > > > and 'physical port of include/uapi/linux/devlink.h'? I
> >> >> >> > > > must have missed this crisp definition in discussion
> >> >> >> > > > between you and Jiri. I am in meantime checking the thread.
> >> >> >> > >
> >> >> >> > > Perhaps start with the cover letter which includes an ASCII
> >> drawing?
> >> >> >> > >
> >> >> >> > > Using Mellanox nomenclature - PF port is a "representor"
> >> >> >> > > for the PF which may be on another Host (SmartNIC or
> multihost).
> >> >> >> > > It's pretty much the same thing as a VF port/"representor".
> >> >> >> > >
> >> >> >> > Yes. We are aligned here. :-) I see your point, where in
> >> >> >> > multi-host scenario, a physical port may be 1, but PF ports
> >> >> >> > are 4, because of 4 PFs for 4 hosts.
> >> >> >> > (just an example of 4 hosts with their own mac address
> >> >> >> > sharing 1 physical port).
> >> >> >> >
> >> >> >> > When there is no multihost and one to one mapping between a
> >> >> >> > PF and physical links, there is some overlap between PF port
> >> >> >> > and physical port attributes.
> >> >> >> > I believe, such overlap is fine as long as we have unique
> >> >> >> > indices for the
> >> >> ports.
> >> >> >> >
> >> >> >> > So I am ok to have flavours as
> >> physical/cpu/dsa/pf/vf/mdev/switchport.
> >> >> >> > (last 4 as new port flavours).
> >> >> >> >
> >> >> >> > > Physical port is the hole on the panel of the adapter where
> >> >> >> > > cable
> >> >> goes.
> >> >> >>
> >> >> >> So my take away from above discussion are:
> >> >> >> 1. Following new port flavours should be added
> >> >> pci_pf/pci_vf/mdev/switchport.
> >> >> >> a. Switchport indicates port on the eswitch. Normally this port
> >> >> >> has rep-
> >> >> netdev attached to it.
> >> >> >
> >> >> >I don't understand the "switchport".  Surely physical ports are
> >> >> >also attached to the eswitch?  And one of the main purpose of
> >> >> >adding the pci_pf/pci_vf flavours was to generate phys_port_name
> >> >> >for the port netdevs.
> >> >> >
> >> >> >Please don't use the term representor if possible.  Representor
> >> >> >for most developers describes the way the netdev is implemented
> >> >> >in the driver, so for Mellanox and Netronome different ports will
> >> >> >be representors and non-representors.  That's why I prefer port
> >> >> >netdev (attached to eswitch, has switch_id) and host netdev
> >> >> >(PF/VF netdev, vNIC, VSI, etc).
> >> >> >
> >> >> >> b. host side port flavours are pci_pf/pci_vf/mdev which may be
> >> >> >> connected to switchport
> >> >> >
> >> >> >See above, pci_pf/pci_vf are needed for phys_port_name generation.
> >> >>
> >> >> Yep, that makes sense.
> >> >>
> >> >>
> >> >> >
> >> >> >> 2. host side port flavours are not limited to Ethernet, as it
> >> >> >> is for devlink's
> >> >> port instance.
> >> >> >>
> >> >> >> 3. Each port is continue to be accessed using unique port index.
> >> >> >>
> >> >> >> 4. host side ports and switchport are control objects.
> >> >> >> a. switch side ports reside where current eswitch object of
> >> >> >> devlink instance reside b. for a given VF/PF/mdev such host
> >> >> >> side ports may be in hypervisor or VM or both depending on the
> >> >> >> privilege
> >> >> >>
> >> >> >> 5. eth.mac_address, rdma.port_guid can be programmed at host
> >> >> >> port flavours by extending as $ devlink port param set...
> >> >> >> (similar to devlink dev param set)
> >> >> >
> >> >> >You can keep restating that's your position, but I have *not*
> >> >> >conceded to that.
> >> >>
> >> >> I'm also not convinced that host dummy ports are good idea to hold
> >> these.
> >> >>
> >> >>
> >> >I didn't understand what do you mean my dummy port.
> >>
> >> It's a port for a VF host port which is not actually in the host but in the
> vm.
> >> Very confusing.
> >>
> >It is the vf_ctrl flavour. I don't see it any different than rep-netdev.
> >rep-netdev is not that confusing to us that represent eswitch vport.
> >Why vf_ctrl flavour port that represents otherside of the pipe as you have
> shown in example?
> >Why it that confusing?
> 
> Because sometimes it is there only once (PF), sometimes twice (VF) - and one
> of these is kind-of zombie.
> 
I gave this example, with a description, in yesterday's email.
You didn't respond to it, so I am repeating it here.
Can you please point out what looks like a zombie below?

$ devlink port show
pci/0000:05:00.0/0 eth netdev repndev_pf0_p0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1 eth netdev repndev_pf0_p1 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/10001 eth netdev repndev_pf0_vf_1 flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/10002 eth netdev repndev_pf0_p0_mdev_8000 flavour switchport switch_id 00154d130d2f peer mdev/uuidX/0

pci/0000:05:00.0/1 eth netdev flavour vf_ctrl vf 1
mdev/uuidX/0 eth netdev flavour mdev_ctrl
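
A minimal sketch, in Python, of how the proposed listing above could be checked mechanically. It assumes the line layout used in this example (handle, port type, then keyword/value pairs). The flavours switchport, vf_ctrl and mdev_ctrl and the peer attribute are proposals from this thread, and parse_port is a hypothetical helper, not part of any devlink tool. The sketch resolves each switchport to its peer handle and counts how often each handle is listed:

```python
from collections import Counter

# Listing copied from the example above. The flavour names and the "peer"
# attribute are proposals from this thread, not released devlink output.
LISTING = """\
pci/0000:05:00.0/0 eth netdev repndev_pf0_p0 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/1 eth netdev repndev_pf0_p1 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/10001 eth netdev repndev_pf0_vf_1 flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/10002 eth netdev repndev_pf0_p0_mdev_8000 flavour switchport switch_id 00154d130d2f peer mdev/uuidX/0
pci/0000:05:00.0/1 eth netdev flavour vf_ctrl vf 1
mdev/uuidX/0 eth netdev flavour mdev_ctrl
"""


def parse_port(line):
    """Split one listing line into (handle, attribute dict). Hypothetical helper."""
    tokens = line.split()
    handle, rest = tokens[0], tokens[2:]  # tokens[1] is the port type, e.g. "eth"
    attrs = {}
    i = 0
    while i < len(rest):
        key = rest[i]
        # The host-side lines carry "netdev" without a name: the next token
        # is already the "flavour" keyword, so record the name as missing.
        if key == "netdev" and (i + 1 == len(rest) or rest[i + 1] == "flavour"):
            attrs[key] = None
            i += 1
        else:
            attrs[key] = rest[i + 1]
            i += 2
    return handle, attrs


ports = [parse_port(line) for line in LISTING.splitlines() if line.strip()]

# Map each port that names a peer to the peer's handle.
peers = {handle: attrs["peer"] for handle, attrs in ports if "peer" in attrs}

# Count how many listing lines use each handle.
handle_counts = Counter(handle for handle, _ in ports)
reused = sorted(h for h, n in handle_counts.items() if n > 1)

for handle, peer in peers.items():
    print(f"{handle} -> peer {peer} (listed {handle_counts[peer]} time(s))")
print("handles listed more than once:", reused)
```

In the listing as written, the handle pci/0000:05:00.0/1 is used by both the second physical port and the vf_ctrl port, so the peer of port 10001 cannot be resolved by handle alone. This may only be a slip in the example, but the host-side port would need its own index for ports to remain addressable by a unique port index.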

> >
> >
> >> >Can you explain what is wrong in programming host port params using
> >> host_port object?
> >> >Few questions are unanswered in my past 2 or 3 emails.
> >> >Can you please go through them?
> >> >Can you point to some example switch API where you program host
> >> >params
> >> at switch?
> >> >
> >> >> >
> >> >> >> 6. more host port params can be added in future when user need
> >> >> >> arise
> >> >> >>
> >> >> >> 7. rep-netdev continue to be eswitch (switchport) representor
> >> >> >> at the
> >> >> switch side.
> >> >> >> a. Hence rep-netdev cannot be used for programming host port's
> >> >> parameters.
> >> >> >>
> >> >> >> 8. eswitch devlink instance knows when VF/PF/mdev's switchport
> >> >> >> are
> >> >> created/removed.
> >> >> >> Hence, those will be created/deleted by eswitch.
> >> >> >> Similarly for host port flavours too.
> >> >> >>
> >> >> >> Does it look fine? Did I miss something?
> >> >> >> We would like to progress on incremental patches for item-4 and
> >> >> >> any prep work needed to reach to item-4.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 17:42                                                                                                 ` Parav Pandit
@ 2019-03-22 13:32                                                                                                   ` Jiri Pirko
  2019-03-23  0:40                                                                                                     ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-22 13:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers

Thu, Mar 21, 2019 at 06:42:55PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Thursday, March 21, 2019 12:24 PM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
>> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Thu, Mar 21, 2019 at 05:50:37PM CET, parav@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> Sent: Thursday, March 21, 2019 11:16 AM
>> >> To: Parav Pandit <parav@mellanox.com>
>> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
>> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >> Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
>> >> >Hi Jiri,
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> >> Sent: Thursday, March 21, 2019 4:08 AM
>> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
>> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >> devlink PCI ports
>> >> >>
>> >> >> Wed, Mar 20, 2019 at 09:22:57PM CET, jakub.kicinski@netronome.com
>> >> >> wrote:
>> >> >> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
>> >> >> >> Hi Jiri, Jakub, Samudrala Sridhar,
>> >> >> >> > > > > > And physical port in include/uapi/linux/devlink.h
>> >> >> >> > > > > > also describe that.
>> >> >> >> > > > >
>> >> >> >> > > > > By "that" you must mean that the physical is a user facing
>> port.
>> >> >> >> > > >
>> >> >> >> > > > Can you please describe the difference between 'PF port'
>> >> >> >> > > > and 'physical port of include/uapi/linux/devlink.h'? I
>> >> >> >> > > > must have missed this crisp definition in discussion
>> >> >> >> > > > between you and Jiri. I am in meantime checking the thread.
>> >> >> >> > >
>> >> >> >> > > Perhaps start with the cover letter which includes an ASCII
>> >> drawing?
>> >> >> >> > >
>> >> >> >> > > Using Mellanox nomenclature - PF port is a "representor"
>> >> >> >> > > for the PF which may be on another Host (SmartNIC or
>> multihost).
>> >> >> >> > > It's pretty much the same thing as a VF port/"representor".
>> >> >> >> > >
>> >> >> >> > Yes. We are aligned here. :-) I see your point, where in
>> >> >> >> > multi-host scenario, a physical port may be 1, but PF ports
>> >> >> >> > are 4, because of 4 PFs for 4 hosts.
>> >> >> >> > (just an example of 4 hosts with their own mac address
>> >> >> >> > sharing 1 physical port).
>> >> >> >> >
>> >> >> >> > When there is no multihost and one to one mapping between a
>> >> >> >> > PF and physical links, there is some overlap between PF port
>> >> >> >> > and physical port attributes.
>> >> >> >> > I believe, such overlap is fine as long as we have unique
>> >> >> >> > indices for the
>> >> >> ports.
>> >> >> >> >
>> >> >> >> > So I am ok to have flavours as
>> >> physical/cpu/dsa/pf/vf/mdev/switchport.
>> >> >> >> > (last 4 as new port flavours).
>> >> >> >> >
>> >> >> >> > > Physical port is the hole on the panel of the adapter where
>> >> >> >> > > cable
>> >> >> goes.
>> >> >> >>
>> >> >> >> So my take away from above discussion are:
>> >> >> >> 1. Following new port flavours should be added
>> >> >> pci_pf/pci_vf/mdev/switchport.
>> >> >> >> a. Switchport indicates port on the eswitch. Normally this port
>> >> >> >> has rep-
>> >> >> netdev attached to it.
>> >> >> >
>> >> >> >I don't understand the "switchport".  Surely physical ports are
>> >> >> >also attached to the eswitch?  And one of the main purpose of
>> >> >> >adding the pci_pf/pci_vf flavours was to generate phys_port_name
>> >> >> >for the port netdevs.
>> >> >> >
>> >> >> >Please don't use the term representor if possible.  Representor
>> >> >> >for most developers describes the way the netdev is implemented
>> >> >> >in the driver, so for Mellanox and Netronome different ports will
>> >> >> >be representors and non-representors.  That's why I prefer port
>> >> >> >netdev (attached to eswitch, has switch_id) and host netdev
>> >> >> >(PF/VF netdev, vNIC, VSI, etc).
>> >> >> >
>> >> >> >> b. host side port flavours are pci_pf/pci_vf/mdev which may be
>> >> >> >> connected to switchport
>> >> >> >
>> >> >> >See above, pci_pf/pci_vf are needed for phys_port_name generation.
>> >> >>
>> >> >> Yep, that makes sense.
>> >> >>
>> >> >>
>> >> >> >
>> >> >> >> 2. host side port flavours are not limited to Ethernet, as it
>> >> >> >> is for devlink's
>> >> >> port instance.
>> >> >> >>
>> >> >> >> 3. Each port is continue to be accessed using unique port index.
>> >> >> >>
>> >> >> >> 4. host side ports and switchport are control objects.
>> >> >> >> a. switch side ports reside where current eswitch object of
>> >> >> >> devlink instance reside b. for a given VF/PF/mdev such host
>> >> >> >> side ports may be in hypervisor or VM or both depending on the
>> >> >> >> privilege
>> >> >> >>
>> >> >> >> 5. eth.mac_address, rdma.port_guid can be programmed at host
>> >> >> >> port flavours by extending as $ devlink port param set...
>> >> >> >> (similar to devlink dev param set)
>> >> >> >
>> >> >> >You can keep restating that's your position, but I have *not*
>> >> >> >conceded to that.
>> >> >>
>> >> >> I'm also not convinced that host dummy ports are good idea to hold
>> >> these.
>> >> >>
>> >> >>
>> >> >I didn't understand what do you mean my dummy port.
>> >>
>> >> It's a port for a VF host port which is not actually in the host but in the
>> vm.
>> >> Very confusing.
>> >>
>> >It is the vf_ctrl flavour. I don't see it any different than rep-netdev.
>> >rep-netdev is not that confusing to us that represent eswitch vport.
>> >Why vf_ctrl flavour port that represents otherside of the pipe as you have
>> shown in example?
>> >Why it that confusing?
>> 
>> Because sometimes it is there only once (PF), sometimes twice (VF) - and one
>> of these is kind-of zombie.
>> 
>I gave the example in email that contains description yesterday.
>You didn't respond to it.
>So repeating here.
>Can you please point what looks like zombie below?
>
>$ devlink port show
>pci/0000:05:00.0/0 eth netdev repndev_pf0_p0 flavour physical switch_id 00154d130d2f
>pci/0000:05:00.0/1 eth netdev repndev_pf0_p1 flavour physical switch_id 00154d130d2f
>pci/0000:05:00.0/10001 eth netdev repndev_pf0_vf_1 flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>pci/0000:05:00.0/10002 eth netdev repndev_pf0_p0_mdev_8000 flavour switchport switch_id 00154d130d2f peer mdev/uuidX/0
>
>pci/0000:05:00.0/1 eth netdev flavour vf_ctrl vf 1

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this one.
You are missing an actual VF instance.

>mdev/uuidX/0 eth netdev flavour mdev_ctrl
Why "ctrl"?

>
>> >
>> >
>> >> >Can you explain what is wrong in programming host port params using
>> >> host_port object?
>> >> >Few questions are unanswered in my past 2 or 3 emails.
>> >> >Can you please go through them?
>> >> >Can you point to some example switch API where you program host
>> >> >params
>> >> at switch?
>> >> >
>> >> >> >
>> >> >> >> 6. more host port params can be added in future when user need
>> >> >> >> arise
>> >> >> >>
>> >> >> >> 7. rep-netdev continue to be eswitch (switchport) representor
>> >> >> >> at the
>> >> >> switch side.
>> >> >> >> a. Hence rep-netdev cannot be used for programming host port's
>> >> >> parameters.
>> >> >> >>
>> >> >> >> 8. eswitch devlink instance knows when VF/PF/mdev's switchport
>> >> >> >> are
>> >> >> created/removed.
>> >> >> >> Hence, those will be created/deleted by eswitch.
>> >> >> >> Similarly for host port flavours too.
>> >> >> >>
>> >> >> >> Does it look fine? Did I miss something?
>> >> >> >> We would like to progress on incremental patches for item-4 and
>> >> >> >> any prep work needed to reach to item-4.


* Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-21 17:34                                                                               ` Parav Pandit
@ 2019-03-22 16:27                                                                                 ` Jiri Pirko
  2019-03-23  0:37                                                                                   ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2019-03-22 16:27 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers

Thu, Mar 21, 2019 at 06:34:22PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Thursday, March 21, 2019 12:21 PM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
>> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>> 
>> Thu, Mar 21, 2019 at 05:52:09PM CET, parav@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> Sent: Thursday, March 21, 2019 11:14 AM
>> >> To: Parav Pandit <parav@mellanox.com>
>> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
>> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >> Thu, Mar 21, 2019 at 04:14:53PM CET, parav@mellanox.com wrote:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Jiri Pirko <jiri@resnulli.us>
>> >> >> Sent: Thursday, March 21, 2019 3:45 AM
>> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
>> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
>> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >> devlink PCI ports
>> >> >>
>> >> >> Mon, Mar 18, 2019 at 08:16:42PM CET, jakub.kicinski@netronome.com
>> >> >> wrote:
>> >> >> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
>> >> >> >> >> >2. flavour should not be vf/pf, flavour should be hostport,
>> >> switchport.
>> >> >> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
>> >> >> >> >>
>> >> >> >> >> Not sure. It's good to have this kind of visibility.
>> >> >> >> >
>> >> >> >> >Yes, this subthread honestly makes me go from 60% sure to 95%
>> >> >> >> >sure we shouldn't do the dual object thing :(  Seems like
>> >> >> >> >Parav is already confused by it and suggests host port can
>> >> >> >> >exist without switch port :(
>> >> >> >>
>> >> >> >> Although I understand your hesitation, the host ports are also
>> >> >> >> associated with the asic and should be under the devlink instance.
>> >> >> >> It is just a matter of proper documentation and clear code to
>> >> >> >> avoid confusions.
>> >> >> >
>> >> >> >They are certainly a part and belong to the ASIC, the question in
>> >> >> >my mind is more along the lines of do we want "one pipe/one port"
>> >> >> >or is it okay to have multiple software objects of the same kind
>> >> >> >for those objects.
>> >> >> >
>> >> >> >To put it differently - do want a port object for each port of
>> >> >> >the ASIC or do we want a port object for each netdev..
>> >> >>
>> >> >> Perhaps "port" name of the object is misleading. From the
>> >> >> beginning, I ment to have it for both switch ports and host ports.
>> >> >> I admit that "host port" is a bit misleading, as it is not really
>> >> >> a port of eswitch, but the counter part. But if we introduce
>> >> >> another object for that purpose in devlink (like "partititon"), it
>> >> >> would be a lot of duplication
>> >> I think.
>> >> >>
>> >> >> Question is, do we need the "host port"? Can't we just put a
>> >> >> relation to host netdev in the eswitch port.
>> >> >>
>> >> >Can you please explain how does it work for rdma for non sriov use
>> case?
>> >> >Do we have to create a fake eswitch object?
>> >>
>> >> Could you please provide details on "rdma for non sriov use case"?
>> >>
>> >There are multiple mdevs on PFs that happen to have link layer as IB and
>> those devlink instances have port that deserved to be configured same way
>> as that of Eth.
>> 
>> Could you please describe it a bit more. There is still an eswitch through
>> which the traffic is going, isn't it?
>Yes, there is an eswitch but it doesn't have switch side of vports.

Why? They should have them.


>It is equivalent to legacy mode.
>I hope you are not thinking to create fake eswitch vports. :-)


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-22 16:27                                                                                 ` Jiri Pirko
@ 2019-03-23  0:37                                                                                   ` Parav Pandit
  0 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-23  0:37 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Friday, March 22, 2019 11:28 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Thu, Mar 21, 2019 at 06:34:22PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Thursday, March 21, 2019 12:21 PM
> >> To: Parav Pandit <parav@mellanox.com>
> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Thu, Mar 21, 2019 at 05:52:09PM CET, parav@mellanox.com wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Jiri Pirko <jiri@resnulli.us>
> >> >> Sent: Thursday, March 21, 2019 11:14 AM
> >> >> To: Parav Pandit <parav@mellanox.com>
> >> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala,
> >> >> Sridhar <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> devlink PCI ports
> >> >>
> >> >> Thu, Mar 21, 2019 at 04:14:53PM CET, parav@mellanox.com wrote:
> >> >> >
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> >> >> >> Sent: Thursday, March 21, 2019 3:45 AM
> >> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> >> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> >> devlink PCI ports
> >> >> >>
> >> >> >> Mon, Mar 18, 2019 at 08:16:42PM CET,
> >> >> >> jakub.kicinski@netronome.com
> >> >> >> wrote:
> >> >> >> >On Mon, 18 Mar 2019 13:11:54 +0100, Jiri Pirko wrote:
> >> >> >> >> >> >2. flavour should not be vf/pf, flavour should be
> >> >> >> >> >> >hostport,
> >> >> switchport.
> >> >> >> >> >> >Because switch is flat and agnostic of pf/vf/mdev.
> >> >> >> >> >>
> >> >> >> >> >> Not sure. It's good to have this kind of visibility.
> >> >> >> >> >
> >> >> >> >> >Yes, this subthread honestly makes me go from 60% sure to
> >> >> >> >> >95% sure we shouldn't do the dual object thing :(  Seems
> >> >> >> >> >like Parav is already confused by it and suggests host port
> >> >> >> >> >can exist without switch port :(
> >> >> >> >>
> >> >> >> >> Although I understand your hesitation, the host ports are
> >> >> >> >> also associated with the asic and should be under the devlink
> instance.
> >> >> >> >> It is just a matter of proper documentation and clear code
> >> >> >> >> to avoid confusions.
> >> >> >> >
> >> >> >> >They are certainly a part and belong to the ASIC, the question
> >> >> >> >in my mind is more along the lines of do we want "one pipe/one
> port"
> >> >> >> >or is it okay to have multiple software objects of the same
> >> >> >> >kind for those objects.
> >> >> >> >
> >> >> >> >To put it differently - do want a port object for each port of
> >> >> >> >the ASIC or do we want a port object for each netdev..
> >> >> >>
> >> >> >> Perhaps "port" name of the object is misleading. From the
> >> >> >> beginning, I ment to have it for both switch ports and host ports.
> >> >> >> I admit that "host port" is a bit misleading, as it is not
> >> >> >> really a port of eswitch, but the counter part. But if we
> >> >> >> introduce another object for that purpose in devlink (like
> >> >> >> "partititon"), it would be a lot of duplication
> >> >> I think.
> >> >> >>
> >> >> >> Question is, do we need the "host port"? Can't we just put a
> >> >> >> relation to host netdev in the eswitch port.
> >> >> >>
> >> >> >Can you please explain how does it work for rdma for non sriov
> >> >> >use
> >> case?
> >> >> >Do we have to create a fake eswitch object?
> >> >>
> >> >> Could you please provide details on "rdma for non sriov use case"?
> >> >>
> >> >There are multiple mdevs on PFs that happen to have link layer as IB
> >> >and
> >> those devlink instances have port that deserved to be configured same
> >> way as that of Eth.
> >>
> >> Could you please describe it a bit more. There is still an eswitch
> >> through which the traffic is going, isn't it?
> >Yes, there is an eswitch but it doesn't have switch side of vports.
> 
> Why? They should have.
> 
It doesn't have them.

> 
> >It is equivalent to legacy mode.
> >I hope you are not thinking to create fake eswitch vports. :-)


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-22 13:32                                                                                                   ` Jiri Pirko
@ 2019-03-23  0:40                                                                                                     ` Parav Pandit
  2019-03-25 20:34                                                                                                       ` Parav Pandit
  0 siblings, 1 reply; 100+ messages in thread
From: Parav Pandit @ 2019-03-23  0:40 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Friday, March 22, 2019 8:33 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> Thu, Mar 21, 2019 at 06:42:55PM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: Thursday, March 21, 2019 12:24 PM
> >> To: Parav Pandit <parav@mellanox.com>
> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> devlink PCI ports
> >>
> >> Thu, Mar 21, 2019 at 05:50:37PM CET, parav@mellanox.com wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Jiri Pirko <jiri@resnulli.us>
> >> >> Sent: Thursday, March 21, 2019 11:16 AM
> >> >> To: Parav Pandit <parav@mellanox.com>
> >> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala,
> >> >> Sridhar <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> devlink PCI ports
> >> >>
> >> >> Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
> >> >> >Hi Jiri,
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> >> >> >> Sent: Thursday, March 21, 2019 4:08 AM
> >> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> >> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> >> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> >> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> >> >> >> devlink PCI ports
> >> >> >>
> >> >> >> Wed, Mar 20, 2019 at 09:22:57PM CET,
> >> >> >> jakub.kicinski@netronome.com
> >> >> >> wrote:
> >> >> >> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> >> >> >> >> Hi Jiri, Jakub, Samudrala Sridhar,
> >> >> >> >> > > > > > And physical port in include/uapi/linux/devlink.h
> >> >> >> >> > > > > > also describe that.
> >> >> >> >> > > > >
> >> >> >> >> > > > > By "that" you must mean that the physical is a user
> >> >> >> >> > > > > facing
> >> port.
> >> >> >> >> > > >
> >> >> >> >> > > > Can you please describe the difference between 'PF port'
> >> >> >> >> > > > and 'physical port of include/uapi/linux/devlink.h'? I
> >> >> >> >> > > > must have missed this crisp definition in discussion
> >> >> >> >> > > > between you and Jiri. I am in meantime checking the
> thread.
> >> >> >> >> > >
> >> >> >> >> > > Perhaps start with the cover letter which includes an
> >> >> >> >> > > ASCII
> >> >> drawing?
> >> >> >> >> > >
> >> >> >> >> > > Using Mellanox nomenclature - PF port is a "representor"
> >> >> >> >> > > for the PF which may be on another Host (SmartNIC or
> >> multihost).
> >> >> >> >> > > It's pretty much the same thing as a VF port/"representor".
> >> >> >> >> > >
> >> >> >> >> > Yes. We are aligned here. :-) I see your point, where in
> >> >> >> >> > multi-host scenario, a physical port may be 1, but PF
> >> >> >> >> > ports are 4, because of 4 PFs for 4 hosts.
> >> >> >> >> > (just an example of 4 hosts with their own mac address
> >> >> >> >> > sharing 1 physical port).
> >> >> >> >> >
> >> >> >> >> > When there is no multihost and one to one mapping between
> >> >> >> >> > a PF and physical links, there is some overlap between PF
> >> >> >> >> > port and physical port attributes.
> >> >> >> >> > I believe, such overlap is fine as long as we have unique
> >> >> >> >> > indices for the
> >> >> >> ports.
> >> >> >> >> >
> >> >> >> >> > So I am ok to have flavours as
> >> >> physical/cpu/dsa/pf/vf/mdev/switchport.
> >> >> >> >> > (last 4 as new port flavours).
> >> >> >> >> >
> >> >> >> >> > > Physical port is the hole on the panel of the adapter
> >> >> >> >> > > where cable
> >> >> >> goes.
> >> >> >> >>
> >> >> >> >> So my take away from above discussion are:
> >> >> >> >> 1. Following new port flavours should be added
> >> >> >> pci_pf/pci_vf/mdev/switchport.
> >> >> >> >> a. Switchport indicates port on the eswitch. Normally this
> >> >> >> >> port has rep-
> >> >> >> netdev attached to it.
> >> >> >> >
> >> >> >> >I don't understand the "switchport".  Surely physical ports
> >> >> >> >are also attached to the eswitch?  And one of the main purpose
> >> >> >> >of adding the pci_pf/pci_vf flavours was to generate
> >> >> >> >phys_port_name for the port netdevs.
> >> >> >> >
> >> >> >> >Please don't use the term representor if possible.
> >> >> >> >Representor for most developers describes the way the netdev
> >> >> >> >is implemented in the driver, so for Mellanox and Netronome
> >> >> >> >different ports will be representors and non-representors.
> >> >> >> >That's why I prefer port netdev (attached to eswitch, has
> >> >> >> >switch_id) and host netdev (PF/VF netdev, vNIC, VSI, etc).
> >> >> >> >
> >> >> >> >> b. host side port flavours are pci_pf/pci_vf/mdev which may
> >> >> >> >> be connected to switchport
> >> >> >> >
> >> >> >> >See above, pci_pf/pci_vf are needed for phys_port_name
> generation.
> >> >> >>
> >> >> >> Yep, that makes sense.
> >> >> >>
> >> >> >>
> >> >> >> >
> >> >> >> >> 2. host side port flavours are not limited to Ethernet, as
> >> >> >> >> it is for devlink's
> >> >> >> port instance.
> >> >> >> >>
> >> >> >> >> 3. Each port is continue to be accessed using unique port index.
> >> >> >> >>
> >> >> >> >> 4. host side ports and switchport are control objects.
> >> >> >> >> a. switch side ports reside where current eswitch object of
> >> >> >> >> devlink instance reside b. for a given VF/PF/mdev such host
> >> >> >> >> side ports may be in hypervisor or VM or both depending on
> >> >> >> >> the privilege
> >> >> >> >>
> >> >> >> >> 5. eth.mac_address, rdma.port_guid can be programmed at host
> >> >> >> >> port flavours by extending as $ devlink port param set...
> >> >> >> >> (similar to devlink dev param set)
> >> >> >> >
> >> >> >> >You can keep restating that's your position, but I have *not*
> >> >> >> >conceded to that.
> >> >> >>
> >> >> >> I'm also not convinced that host dummy ports are good idea to
> >> >> >> hold
> >> >> these.
> >> >> >>
> >> >> >>
> >> >> >I didn't understand what do you mean my dummy port.
> >> >>
> >> >> It's a port for a VF host port which is not actually in the host
> >> >> but in the
> >> vm.
> >> >> Very confusing.
> >> >>
> >> >It is the vf_ctrl flavour. I don't see it any different than rep-netdev.
> >> >rep-netdev is not that confusing to us that represent eswitch vport.
> >> >Why vf_ctrl flavour port that represents otherside of the pipe as
> >> >you have
> >> shown in example?
> >> >Why it that confusing?
> >>
> >> Because sometimes it is there only once (PF), sometimes twice (VF) -
> >> and one of these is kind-of zombie.
> >>
> >I gave the example in email that contains description yesterday.
> >You didn't respond to it.
> >So repeating here.
> >Can you please point what looks like zombie below?
> >
> >$ devlink port show
> >pci/0000:05:00.0/0 eth netdev repndev_pf0_p0 flavour physical switch_id
> >00154d130d2f
> >pci/0000:05:00.0/1 eth netdev repndev_pf0_p1 flavour physical switch_id
> >00154d130d2f
> >pci/0000:05:00.0/10001 eth netdev repndev_pf0_vf_1 flavour switchport
> >switch_id 00154d130d2f peer pci/0000:05:00.0/1
> >pci/0000:05:00.0/10002 eth netdev repndev_pf0_p0_mdev_8000 flavour
> >switchport switch_id 00154d130d2f peer mdev/uuidX/0
> >
> >pci/0000:05:00.0/1 eth netdev flavour vf_ctrl vf 1
> 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this one.
> You are missing an actual VF instance.
>
The VF instance is in the VM. It is not visible here in the hypervisor. But if you prefer to see it,
it looks like this:
pci/0000:05:01.0/0 eth netdev eth0 flavour vf
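
To make the hypervisor/VM split concrete, here is a small Python sketch that splits a devlink port handle of the form bus/device/index into its device part and port index. The regular expression and the split_handle name are illustrative assumptions, not devlink code. It shows that the VF port above sits on a different PCI function, and so on a different devlink instance, than the PF ports:

```python
import re

# Matches handles such as "pci/0000:05:00.0/0":
# PCI domain:bus:device.function, then the port index.
HANDLE_RE = re.compile(
    r"^pci/(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])/(?P<index>\d+)$"
)


def split_handle(handle):
    """Return (pci_address, port_index) for a PCI devlink port handle."""
    match = HANDLE_RE.match(handle)
    if match is None:
        raise ValueError(f"not a PCI devlink port handle: {handle!r}")
    return match.group("bdf"), int(match.group("index"))


pf_dev, pf_index = split_handle("pci/0000:05:00.0/0")  # PF port, hypervisor side
vf_dev, vf_index = split_handle("pci/0000:05:01.0/0")  # VF port, seen inside the VM

print(pf_dev, pf_index)
print(vf_dev, vf_index)
print("same devlink instance:", pf_dev == vf_dev)
```

Because the port index is scoped to the devlink instance, index 0 on pci/0000:05:01.0 does not collide with index 0 on pci/0000:05:00.0.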

 
> >mdev/uuidX/0 eth netdev flavour mdev_ctrl
> Why "ctrl"?
>
I suffixed it with ctrl to indicate that it is used for control functionality.
Again, I described this in a lot of detail in my previous email responding to Jakub.
 
> >
> >> >
> >> >
> >> >> >Can you explain what is wrong in programming host port params
> >> >> >using
> >> >> host_port object?
> >> >> >Few questions are unanswered in my past 2 or 3 emails.
> >> >> >Can you please go through them?
> >> >> >Can you point to some example switch API where you program host
> >> >> >params
> >> >> at switch?
> >> >> >
> >> >> >> >
> >> >> >> >> 6. more host port params can be added in future when user
> >> >> >> >> need arise
> >> >> >> >>
> >> >> >> >> 7. rep-netdev continue to be eswitch (switchport)
> >> >> >> >> representor at the
> >> >> >> switch side.
> >> >> >> >> a. Hence rep-netdev cannot be used for programming host
> >> >> >> >> port's
> >> >> >> parameters.
> >> >> >> >>
> >> >> >> >> 8. eswitch devlink instance knows when VF/PF/mdev's
> >> >> >> >> switchport are
> >> >> >> created/removed.
> >> >> >> >> Hence, those will be created/deleted by eswitch.
> >> >> >> >> Similarly for host port flavours too.
> >> >> >> >>
> >> >> >> >> Does it look fine? Did I miss something?
> >> >> >> >> We would like to progress on incremental patches for item-4
> >> >> >> >> and any prep work needed to reach to item-4.


* RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
  2019-03-23  0:40                                                                                                     ` Parav Pandit
@ 2019-03-25 20:34                                                                                                       ` Parav Pandit
  0 siblings, 0 replies; 100+ messages in thread
From: Parav Pandit @ 2019-03-25 20:34 UTC (permalink / raw)
  To: Parav Pandit, Jiri Pirko
  Cc: Jakub Kicinski, Samudrala, Sridhar, davem, netdev, oss-drivers



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> Sent: Friday, March 22, 2019 7:40 PM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> netdev@vger.kernel.org; oss-drivers@netronome.com
> Subject: RE: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
> ports
> 
> 
> 
> > -----Original Message-----
> > From: Jiri Pirko <jiri@resnulli.us>
> > Sent: Friday, March 22, 2019 8:33 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > netdev@vger.kernel.org; oss-drivers@netronome.com
> > Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > devlink PCI ports
> >
> > Thu, Mar 21, 2019 at 06:42:55PM CET, parav@mellanox.com wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Jiri Pirko <jiri@resnulli.us>
> > >> Sent: Thursday, March 21, 2019 12:24 PM
> > >> To: Parav Pandit <parav@mellanox.com>
> > >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala,
> > >> Sridhar <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > >> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > >> devlink PCI ports
> > >>
> > >> Thu, Mar 21, 2019 at 05:50:37PM CET, parav@mellanox.com wrote:
> > >> >
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > >> >> Sent: Thursday, March 21, 2019 11:16 AM
> > >> >> To: Parav Pandit <parav@mellanox.com>
> > >> >> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; Samudrala,
> > >> >> Sridhar <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
> > >> >> devlink PCI ports
> > >> >>
> > >> >> Thu, Mar 21, 2019 at 04:03:58PM CET, parav@mellanox.com wrote:
> > >> >> >Hi Jiri,
> > >> >> >
> > >> >> >> -----Original Message-----
> > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > >> >> >> Sent: Thursday, March 21, 2019 4:08 AM
> > >> >> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> > >> >> >> Cc: Parav Pandit <parav@mellanox.com>; Samudrala, Sridhar
> > >> >> >> <sridhar.samudrala@intel.com>; davem@davemloft.net;
> > >> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
> > >> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports
> > >> >> >> on devlink PCI ports
> > >> >> >>
> > >> >> >> Wed, Mar 20, 2019 at 09:22:57PM CET,
> > >> >> >> jakub.kicinski@netronome.com
> > >> >> >> wrote:
> > >> >> >> >On Wed, 20 Mar 2019 18:24:15 +0000, Parav Pandit wrote:
> > >> >> >> >> Hi Jiri, Jakub, Samudrala Sridhar,
> > >> >> >> >> > > > > > And physical port in
> > >> >> >> >> > > > > > include/uapi/linux/devlink.h also describe that.
> > >> >> >> >> > > > >
> > >> >> >> >> > > > > By "that" you must mean that the physical is a
> > >> >> >> >> > > > > user facing
> > >> port.
> > >> >> >> >> > > >
> > >> >> >> >> > > > Can you please describe the difference between 'PF port'
> > >> >> >> >> > > > and 'physical port of include/uapi/linux/devlink.h'?
> > >> >> >> >> > > > I must have missed this crisp definition in
> > >> >> >> >> > > > discussion between you and Jiri. I am in meantime
> > >> >> >> >> > > > checking the
> > thread.
> > >> >> >> >> > >
> > >> >> >> >> > > Perhaps start with the cover letter which includes an
> > >> >> >> >> > > ASCII
> > >> >> drawing?
> > >> >> >> >> > >
> > >> >> >> >> > > Using Mellanox nomenclature - PF port is a "representor"
> > >> >> >> >> > > for the PF which may be on another Host (SmartNIC or
> > >> multihost).
> > >> >> >> >> > > It's pretty much the same thing as a VF port/"representor".
> > >> >> >> >> > >
> > >> >> >> >> > Yes. We are aligned here. :-) I see your point, where in
> > >> >> >> >> > multi-host scenario, a physical port may be 1, but PF
> > >> >> >> >> > ports are 4, because of 4 PFs for 4 hosts.
> > >> >> >> >> > (just an example of 4 hosts with their own mac address
> > >> >> >> >> > sharing 1 physical port).
> > >> >> >> >> >
> > >> >> >> >> > When there is no multihost and one to one mapping
> > >> >> >> >> > between a PF and physical links, there is some overlap
> > >> >> >> >> > between PF port and physical port attributes.
> > >> >> >> >> > I believe, such overlap is fine as long as we have
> > >> >> >> >> > unique indices for the
> > >> >> >> ports.
> > >> >> >> >> >
> > >> >> >> >> > So I am ok to have flavours as
> > >> >> physical/cpu/dsa/pf/vf/mdev/switchport.
> > >> >> >> >> > (last 4 as new port flavours).
> > >> >> >> >> >
> > >> >> >> >> > > Physical port is the hole on the panel of the adapter
> > >> >> >> >> > > where cable
> > >> >> >> goes.
> > >> >> >> >>
> > >> >> >> >> So my take away from above discussion are:
> > >> >> >> >> 1. Following new port flavours should be added
> > >> >> >> pci_pf/pci_vf/mdev/switchport.
> > >> >> >> >> a. Switchport indicates port on the eswitch. Normally this
> > >> >> >> >> port has rep-
> > >> >> >> netdev attached to it.
> > >> >> >> >
> > >> >> >> >I don't understand the "switchport".  Surely physical ports
> > >> >> >> >are also attached to the eswitch?  And one of the main
> > >> >> >> >purpose of adding the pci_pf/pci_vf flavours was to generate
> > >> >> >> >phys_port_name for the port netdevs.
> > >> >> >> >
> > >> >> >> >Please don't use the term representor if possible.
> > >> >> >> >Representor for most developers describes the way the netdev
> > >> >> >> >is implemented in the driver, so for Mellanox and Netronome
> > >> >> >> >different ports will be representors and non-representors.
> > >> >> >> >That's why I prefer port netdev (attached to eswitch, has
> > >> >> >> >switch_id) and host netdev (PF/VF netdev, vNIC, VSI, etc).
> > >> >> >> >
> > >> >> >> >> b. host side port flavours are pci_pf/pci_vf/mdev which
> > >> >> >> >> may be connected to switchport
> > >> >> >> >
> > >> >> >> >See above, pci_pf/pci_vf are needed for phys_port_name
> > generation.
> > >> >> >>
> > >> >> >> Yep, that makes sense.
> > >> >> >>
> > >> >> >>
> > >> >> >> >
> > >> >> >> >> 2. host side port flavours are not limited to Ethernet, as
> > >> >> >> >> it is for devlink's
> > >> >> >> port instance.
> > >> >> >> >>
> > >> >> >> >> 3. Each port is continue to be accessed using unique port
> index.
> > >> >> >> >>
> > >> >> >> >> 4. host side ports and switchport are control objects.
> > >> >> >> >> a. switch side ports reside where current eswitch object
> > >> >> >> >> of devlink instance reside b. for a given VF/PF/mdev such
> > >> >> >> >> host side ports may be in hypervisor or VM or both
> > >> >> >> >> depending on the privilege
> > >> >> >> >>
> > >> >> >> >> 5. eth.mac_address, rdma.port_guid can be programmed at
> > >> >> >> >> host port flavours by extending as $ devlink port param set...
> > >> >> >> >> (similar to devlink dev param set)
> > >> >> >> >
> > >> >> >> >You can keep restating that's your position, but I have
> > >> >> >> >*not* conceded to that.
> > >> >> >>
> > >> >> >> I'm also not convinced that host dummy ports are good idea to
> > >> >> >> hold
> > >> >> these.
> > >> >> >>
> > >> >> >>
> > >> >> >I didn't understand what do you mean my dummy port.
> > >> >>
> > >> >> It's a port for a VF host port which is not actually in the host
> > >> >> but in the
> > >> vm.
> > >> >> Very confusing.
> > >> >>
> > >> >It is the vf_ctrl flavour. I don't see it any different than rep-netdev.
> > >> >rep-netdev is not that confusing to us that represent eswitch vport.
> > >> >Why vf_ctrl flavour port that represents otherside of the pipe as
> > >> >you have
> > >> shown in example?
> > >> >Why it that confusing?
> > >>
> > >> Because sometimes it is there only once (PF), sometimes twice (VF)
> > >> - and one of these is kind-of zombie.
> > >>
> > >I gave the example in email that contains description yesterday.
> > >You didn't respond to it.
> > >So repeating here.
> > >Can you please point what looks like zombie below?
> > >
> > >$ devlink port show
> > >pci/0000:05:00.0/0 eth netdev repndev_pf0_p0 flavour physical
> > >switch_id 00154d130d2f
> > >pci/0000:05:00.0/1 eth netdev repndev_pf0_p1 flavour physical
> > >switch_id 00154d130d2f
> > >pci/0000:05:00.0/10001 eth netdev repndev_pf0_vf_1 flavour switchport
> > >switch_id 00154d130d2f peer pci/0000:05:00.0/1
> > >pci/0000:05:00.0/10002 eth netdev repndev_pf0_p0_mdev_8000 flavour
> > >switchport switch_id 00154d130d2f peer mdev/uuidX/0
> > >
> > >pci/0000:05:00.0/1 eth netdev flavour vf_ctrl vf 1
> >
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this one.
> > You are missing an actual VF instance.
> >
> The VF instance is in the VM. It is not visible here in the hypervisor. But if you prefer to
> see it, it looks like below.
> pci/0000:05:01.0/0 eth netdev eth0 flavour vf
> 
> 
> > >mdev/uuidX/0 eth netdev flavour mdev_ctrl
> > Why "ctrl"?
> >
> I suffixed it with ctrl to indicate to you that it is used for control functionality.
> Again, I described this in a lot of detail in my previous email responding to Jakub.
> 

So we had an offline discussion. Jiri and Jakub prefer to program the host port's parameters via the 'peer' way.
This would require creating an unmanaged switch port for rdma.

We concluded to expose the host side properties indirectly on the eswitch side port, as shown below.

pci/0000:05:00.0/1 type eth netdev repndev_pf0_p1 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/2 type eth netdev repndev_pf0_vf_1 flavour eswitch switch_id 00154d130d2f vf 1 pf 0
pci/0000:05:00.0/4 type eth netdev repndev_pf0_sp_3 flavour eswitch switch_id 00154d130d2f mdev/uuidA/0

                                +---+      +---+
                              vf|   |      |   | mdev
                                +-+-+      +-+-+
physical link <---------+         |          |
                        |         |          |
                        |         |          |
                      +-+-+     +-+-+      +-+-+
                      | 1 |     | 2 |      | 3 |
                   +--+---+-----+---+------+---+--+
                   |  physical   vf         pfsub |
                   |  port       port       port  |
                   |                              |
                   |             eswitch          |
                   |                              |
                   +------------------------------+

Host port parameters such as ether.mac_addr, rdma.node_port_guid and more port internal parameters are to be programmed via peer mode.
These ports are created by the driver code and not by a user.
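
For illustration only (this is not part of any patch in this thread), each line of the listing above can be read as a port handle followed by key/value pairs. Below is a minimal Python sketch, assuming exactly the line layout shown in this email. The function name parse_port_line and the "peer" key used for a trailing unpaired token are hypothetical, and this is not the output format of any released devlink tool.

```python
# Sketch only: parses the port lines proposed in this email.
# parse_port_line and the "peer" key are hypothetical names.

LISTING = """\
pci/0000:05:00.0/1 type eth netdev repndev_pf0_p1 flavour physical switch_id 00154d130d2f
pci/0000:05:00.0/2 type eth netdev repndev_pf0_vf_1 flavour eswitch switch_id 00154d130d2f vf 1 pf 0
pci/0000:05:00.0/4 type eth netdev repndev_pf0_sp_3 flavour eswitch switch_id 00154d130d2f mdev/uuidA/0
"""


def parse_port_line(line):
    """Split one port line into a dict of its handle parts and attributes."""
    tokens = line.split()
    # The handle is bus/device/index, e.g. pci/0000:05:00.0/2
    bus, device, index = tokens[0].split("/")
    port = {"bus": bus, "device": device, "index": int(index)}
    # The remaining tokens are key/value pairs; a leftover unpaired token
    # (the mdev handle on the last line) is stored under "peer".
    rest = iter(tokens[1:])
    for key in rest:
        value = next(rest, None)
        if value is None:
            port["peer"] = key
        else:
            port[key] = value
    return port


ports = [parse_port_line(line) for line in LISTING.splitlines()]

# eswitch-flavour ports are the ones that would carry host side parameters.
eswitch_ports = [p for p in ports if p["flavour"] == "eswitch"]
for p in eswitch_ports:
    print(p["netdev"], p.get("vf"), p.get("peer"))
```

Under this reading, the vf/pf numbers or the mdev handle on an eswitch-flavour line identify the host side function whose parameters are set through that port.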

This is a very unusual way to program host params via the switch.
No solid example was provided to support this devlink model...
Anyway, I dislike it, but I agree to Jiri's and Jakub's suggestion. :-)
Let's move forward this way. I will let the future speak for the design choices made...

> > >
> > >> >
> > >> >
> > >> >> >Can you explain what is wrong in programming host port params
> > >> >> >using
> > >> >> host_port object?
> > >> >> >Few questions are unanswered in my past 2 or 3 emails.
> > >> >> >Can you please go through them?
> > >> >> >Can you point to some example switch API where you program host
> > >> >> >params
> > >> >> at switch?
> > >> >> >
> > >> >> >> >
> > >> >> >> >> 6. more host port params can be added in future when user
> > >> >> >> >> need arise
> > >> >> >> >>
> > >> >> >> >> 7. rep-netdev continue to be eswitch (switchport)
> > >> >> >> >> representor at the
> > >> >> >> switch side.
> > >> >> >> >> a. Hence rep-netdev cannot be used for programming host
> > >> >> >> >> port's
> > >> >> >> parameters.
> > >> >> >> >>
> > >> >> >> >> 8. eswitch devlink instance knows when VF/PF/mdev's
> > >> >> >> >> switchport are
> > >> >> >> created/removed.
> > >> >> >> >> Hence, those will be created/deleted by eswitch.
> > >> >> >> >> Similarly for host port flavours too.
> > >> >> >> >>
> > >> >> >> >> Does it look fine? Did I miss something?
> > >> >> >> >> We would like to progress on incremental patches for
> > >> >> >> >> item-4 and any prep work needed to reach to item-4.


end of thread, other threads:[~2019-03-25 20:34 UTC | newest]

Thread overview: 100+ messages
2019-03-01 18:04 [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jakub Kicinski
2019-03-01 18:04 ` [PATCH net-next v2 1/7] nfp: split devlink port init from registration Jakub Kicinski
2019-03-01 18:04 ` [PATCH net-next v2 2/7] devlink: add PF and VF port flavours Jakub Kicinski
2019-03-01 18:04 ` [PATCH net-next v2 3/7] nfp: register devlink ports of all reprs Jakub Kicinski
2019-03-02  8:43   ` Jiri Pirko
2019-03-02 19:07     ` Jakub Kicinski
2019-03-04  7:36       ` Jiri Pirko
2019-03-04 23:32         ` Jakub Kicinski
2019-03-01 18:04 ` [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports Jakub Kicinski
2019-03-02  9:41   ` Jiri Pirko
2019-03-02 19:48     ` Jakub Kicinski
2019-03-04  7:56       ` Jiri Pirko
2019-03-05  0:33         ` Jakub Kicinski
2019-03-05 11:06           ` Jiri Pirko
2019-03-05 17:15             ` Jakub Kicinski
2019-03-05 19:59               ` Parav Pandit
2019-03-06 12:20               ` Jiri Pirko
2019-03-06 17:56                 ` Jakub Kicinski
2019-03-07  3:56                   ` Parav Pandit
2019-03-07  9:48                   ` Jiri Pirko
2019-03-08  2:52                     ` Jakub Kicinski
2019-03-08 14:54                       ` Jiri Pirko
2019-03-08 19:09                         ` Jakub Kicinski
2019-03-11  8:52                           ` Jiri Pirko
2019-03-12  2:10                             ` Jakub Kicinski
2019-03-12 14:02                               ` Jiri Pirko
2019-03-12 20:56                                 ` Jakub Kicinski
2019-03-13  6:07                                   ` Jiri Pirko
2019-03-13 16:17                                     ` Jakub Kicinski
2019-03-13 16:22                                       ` Jiri Pirko
2019-03-13 16:55                                         ` Jakub Kicinski
2019-03-14  7:38                                           ` Jiri Pirko
2019-03-14 22:09                                             ` Jakub Kicinski
2019-03-14 22:35                                               ` Parav Pandit
2019-03-14 23:39                                                 ` Jakub Kicinski
2019-03-15  1:28                                                   ` Parav Pandit
2019-03-15  1:31                                                     ` Parav Pandit
2019-03-15  2:15                                                     ` Samudrala, Sridhar
2019-03-15  2:40                                                       ` Parav Pandit
     [not found]                                                         ` <ae938b4f-5fa9-3c33-8ae6-eab2d3d9f1ec@intel.com>
2019-03-15 15:32                                                           ` Parav Pandit
2019-03-15 20:08                                                             ` Jiri Pirko
2019-03-15 20:44                                                               ` Jakub Kicinski
2019-03-15 22:12                                                                 ` Parav Pandit
2019-03-16  1:16                                                                   ` Jakub Kicinski
2019-03-18 15:43                                                                     ` Parav Pandit
2019-03-18 19:29                                                                       ` Jakub Kicinski
2019-03-18 12:11                                                                 ` Jiri Pirko
2019-03-18 19:16                                                                   ` Jakub Kicinski
2019-03-21  8:45                                                                     ` Jiri Pirko
2019-03-21 15:14                                                                       ` Parav Pandit
2019-03-21 16:14                                                                         ` Jiri Pirko
2019-03-21 16:52                                                                           ` Parav Pandit
2019-03-21 17:20                                                                             ` Jiri Pirko
2019-03-21 17:34                                                                               ` Parav Pandit
2019-03-22 16:27                                                                                 ` Jiri Pirko
2019-03-23  0:37                                                                                   ` Parav Pandit
2019-03-15 21:59                                                               ` Parav Pandit
2019-03-18 12:21                                                                 ` Jiri Pirko
2019-03-18 15:56                                                                   ` Parav Pandit
2019-03-18 16:22                                                                     ` Parav Pandit
2019-03-18 19:36                                                                       ` Jakub Kicinski
2019-03-18 19:44                                                                         ` Parav Pandit
2019-03-18 19:59                                                                           ` Jakub Kicinski
2019-03-18 20:35                                                                             ` Parav Pandit
2019-03-18 21:29                                                                               ` Jakub Kicinski
2019-03-18 22:11                                                                                 ` Parav Pandit
2019-03-20 18:24                                                                                   ` Parav Pandit
2019-03-20 20:22                                                                                     ` Jakub Kicinski
2019-03-20 23:39                                                                                       ` Parav Pandit
2019-03-21  9:08                                                                                       ` Jiri Pirko
2019-03-21 15:03                                                                                         ` Parav Pandit
2019-03-21 16:16                                                                                           ` Jiri Pirko
2019-03-21 16:50                                                                                             ` Parav Pandit
2019-03-21 17:23                                                                                               ` Jiri Pirko
2019-03-21 17:42                                                                                                 ` Parav Pandit
2019-03-22 13:32                                                                                                   ` Jiri Pirko
2019-03-23  0:40                                                                                                     ` Parav Pandit
2019-03-25 20:34                                                                                                       ` Parav Pandit
2019-03-18 19:19                                                                   ` Jakub Kicinski
2019-03-18 19:38                                                                     ` Parav Pandit
2019-03-21  9:09                                                                     ` Jiri Pirko
2019-03-15  7:00                                               ` Jiri Pirko
     [not found]                                 ` <7227d58e-ac58-d549-b921-ca0a0dd3f4b0@intel.com>
2019-03-13  7:37                                   ` Jiri Pirko
2019-03-13 16:03                                     ` Samudrala, Sridhar
2019-03-13 16:24                                       ` Jiri Pirko
2019-03-04 11:19       ` Jiri Pirko
2019-03-05  0:40         ` Jakub Kicinski
2019-03-05 11:07           ` Jiri Pirko
2019-03-04 11:08   ` Jiri Pirko
2019-03-05  0:51     ` Jakub Kicinski
2019-03-05 11:09       ` Jiri Pirko
2019-03-01 18:04 ` [PATCH net-next v2 5/7] nfp: switch to devlink_port_get_phys_port_name() Jakub Kicinski
2019-03-01 18:04 ` [PATCH net-next v2 6/7] devlink: introduce port's peer netdevs Jakub Kicinski
2019-03-01 18:04 ` [PATCH net-next v2 7/7] nfp: expose PF " Jakub Kicinski
2019-03-02 10:13 ` [PATCH net-next v2 0/7] devlink: expose PF and VF representors as ports Jiri Pirko
2019-03-02 19:49   ` [oss-drivers] " Jakub Kicinski
2019-03-04  5:12   ` Parav Pandit
2019-03-04 18:22 ` David Miller
2019-03-20 20:25 ` Jakub Kicinski
2019-03-21  9:11   ` Jiri Pirko
