netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] devlink: add PF and VF port flavours
@ 2019-02-26 18:24 Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 1/8] nfp: split devlink port init from registration Jakub Kicinski
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This series brings port flavours for SR-IOV NICs.

First patch reshuffles nfp port registration to make the
port info available at netdev registration time. Next
the VF and PF port flavours are added to the core and
the nfp driver. The existing port information (split)
is now physical port-specific.

The currently deployed NFPs have a single PF per PCIe
link which means multiport cards have multiple logical
ports (netdevs) per function. This forces us to add
a notion of a PCIe subport (implemented in patches
4 and 5).

With that in place nfp driver can now switch to using
the devlink_port_get_phys_port_name() helper (patch 6).

Next two patches add a notion of a peer netdev, this
is the netdev "on the other side of the link". It can
be used to inform users which PF and VF corresponds
to which port/"representor" without the need for bus
info parsing.

Last but not least - I fix some kdoc warnings :)

Jakub Kicinski (8):
  nfp: split devlink port init from registration
  devlink: add PF and VF port flavours
  nfp: register devlink ports of all reprs
  devlink: allow subports on devlink PCI ports
  nfp: switch to devlink_port_get_phys_port_name()
  devlink: introduce port's peer netdevs
  nfp: expose PF peer netdevs
  devlink: fix kdoc

 drivers/net/ethernet/netronome/nfp/abm/main.c |   1 +
 .../net/ethernet/netronome/nfp/flower/main.c  |   1 +
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  60 ++++++-
 .../net/ethernet/netronome/nfp/nfp_net_main.c |  17 +-
 .../net/ethernet/netronome/nfp/nfp_net_repr.c |  16 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c |  33 +---
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   4 +
 include/net/devlink.h                         |  41 ++++-
 include/uapi/linux/devlink.h                  |   9 ++
 net/core/devlink.c                            | 153 +++++++++++++++---
 10 files changed, 268 insertions(+), 67 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH net-next 1/8] nfp: split devlink port init from registration
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 2/8] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

In the future we want to switch to using the
devlink_port_get_phys_port_name() helper for .ndo_phys_port_name.
This requires that we initialize the devlink ports before the
netdevs are registered. Split the registration from init.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c    | 13 ++++++++++---
 .../net/ethernet/netronome/nfp/nfp_net_main.c   | 17 ++++++++++++++---
 drivers/net/ethernet/netronome/nfp/nfp_port.h   |  2 ++
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index e9eca99cf493..9af3cb1f2f17 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -350,10 +350,9 @@ const struct devlink_ops nfp_devlink_ops = {
 	.flash_update		= nfp_devlink_flash_update,
 };
 
-int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
 {
 	struct nfp_eth_table_port eth_port;
-	struct devlink *devlink;
 	int ret;
 
 	rtnl_lock();
@@ -366,8 +365,16 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
 	devlink_port_attrs_set(&port->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
 			       eth_port.label_port, eth_port.is_split,
 			       eth_port.label_subport);
+	return 0;
+}
+
+void nfp_devlink_port_clean(struct nfp_port *port)
+{
+}
 
-	devlink = priv_to_devlink(app->pf);
+int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
+{
+	struct devlink *devlink = priv_to_devlink(app->pf);
 
 	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 08f5fdbd8e41..39e87bb73ebf 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -150,9 +150,15 @@ nfp_net_pf_init_vnic(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id)
 
 	nn->id = id;
 
+	if (nn->port) {
+		err = nfp_devlink_port_init(pf->app, nn->port);
+		if (err)
+			return err;
+	}
+
 	err = nfp_net_init(nn);
 	if (err)
-		return err;
+		goto err_port_clean;
 
 	nfp_net_debugfs_vnic_add(nn, pf->ddir);
 
@@ -167,17 +173,20 @@ nfp_net_pf_init_vnic(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id)
 	if (nfp_net_is_data_vnic(nn)) {
 		err = nfp_app_vnic_init(pf->app, nn);
 		if (err)
-			goto err_devlink_port_clean;
+			goto err_port_unreg;
 	}
 
 	return 0;
 
-err_devlink_port_clean:
+err_port_unreg:
 	if (nn->port)
 		nfp_devlink_port_unregister(nn->port);
 err_dfs_clean:
 	nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
 	nfp_net_clean(nn);
+err_port_clean:
+	if (nn->port)
+		nfp_devlink_port_clean(nn->port);
 	return err;
 }
 
@@ -224,6 +233,8 @@ static void nfp_net_pf_clean_vnic(struct nfp_pf *pf, struct nfp_net *nn)
 		nfp_devlink_port_unregister(nn->port);
 	nfp_net_debugfs_dir_clean(&nn->debugfs_dir);
 	nfp_net_clean(nn);
+	if (nn->port)
+		nfp_devlink_port_clean(nn->port);
 }
 
 static int nfp_net_pf_alloc_irqs(struct nfp_pf *pf)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 90ae053f5c07..09c55ca2371a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -129,6 +129,8 @@ int nfp_net_refresh_eth_port(struct nfp_port *port);
 void nfp_net_refresh_port_table(struct nfp_port *port);
 int nfp_net_refresh_port_table_sync(struct nfp_pf *pf);
 
+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port);
+void nfp_devlink_port_clean(struct nfp_port *port);
 int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port);
 void nfp_devlink_port_unregister(struct nfp_port *port);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 1/8] nfp: split devlink port init from registration Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-27 12:16   ` Jiri Pirko
  2019-02-27 12:23   ` Jiri Pirko
  2019-02-26 18:24 ` [PATCH net-next 3/8] nfp: register devlink ports of all reprs Jakub Kicinski
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

Current port flavours cover simple switches and DSA.  Add PF
and VF flavours to cover "switchdev" SR-IOV NICs.

Example devlink user space output:

$ devlink port
pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1

$ devlink -jp port
{
    "port": {
        "pci/0000:82:00.0/0": {
            "type": "eth",
            "netdev": "p4p1",
            "flavour": "physical"
        },
        "pci/0000:82:00.0/10000": {
            "type": "eth",
            "netdev": "eth0",
            "flavour": "pci_pf",
            "pf": 0,
        },
        "pci/0000:82:00.0/10001": {
            "type": "eth",
            "netdev": "eth1",
            "flavour": "pci_vf",
            "pf": 0,
            "vf": 0
        },
        "pci/0000:82:00.0/10002": {
            "type": "eth",
            "netdev": "eth2",
            "flavour": "pci_vf",
            "pf": 0,
            "vf": 1
        }
    }
}

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        | 25 ++++++++++--
 include/uapi/linux/devlink.h |  5 +++
 net/core/devlink.c           | 73 +++++++++++++++++++++++++++++++-----
 3 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7f5a0bdca228..b5376ef492f1 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -42,9 +42,19 @@ struct devlink {
 struct devlink_port_attrs {
 	bool set;
 	enum devlink_port_flavour flavour;
-	u32 port_number; /* same value as "split group" */
-	bool split;
-	u32 split_subport_number;
+	union { /* port identifiers differ per-flavour */
+		/* PHYSICAL, CPU, DSA */
+		struct {
+			bool split;
+			u32 split_subport_number;
+			u32 port_number; /* same value as "split group" */
+		};
+		 /* PCI_PF, PCI_VF */
+		struct {
+			u32 pf_number;
+			u32 vf_number;
+		} pci;
+	};
 };
 
 struct devlink_port {
@@ -568,6 +578,9 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    enum devlink_port_flavour flavour,
 			    u32 port_number, bool split,
 			    u32 split_subport_number);
+void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
+				enum devlink_port_flavour flavour,
+				u32 pf_number, u32 vf_number);
 int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				    char *name, size_t len);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
@@ -782,6 +795,12 @@ static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
 {
 }
 
+static inline void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
+					      enum devlink_port_flavour flavour,
+					      u32 pf_number, u32 vf_number)
+{
+}
+
 static inline int
 devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				char *name, size_t len)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5bb4ea67d84f..9ce76d4f640d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -167,6 +167,8 @@ enum devlink_port_flavour {
 	DEVLINK_PORT_FLAVOUR_DSA, /* Distributed switch architecture
 				   * interconnect port.
 				   */
+	DEVLINK_PORT_FLAVOUR_PCI_PF, /* PCI Physical function port */
+	DEVLINK_PORT_FLAVOUR_PCI_VF, /* PCI Physical function port */
 };
 
 enum devlink_param_cmode {
@@ -332,6 +334,9 @@ enum devlink_attr {
 	DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,	/* string */
 	DEVLINK_ATTR_FLASH_UPDATE_COMPONENT,	/* string */
 
+	DEVLINK_ATTR_PORT_PCI_PF_NUMBER,	/* u32 */
+	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index a49dee67e66f..af177284830b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -516,16 +516,35 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		return 0;
 	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
 		return -EMSGSIZE;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->port_number))
-		return -EMSGSIZE;
-	if (!attrs->split)
+
+	switch (attrs->flavour) {
+	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
+	case DEVLINK_PORT_FLAVOUR_CPU:
+	case DEVLINK_PORT_FLAVOUR_DSA:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER,
+				attrs->port_number))
+			return -EMSGSIZE;
+
+		if (attrs->split &&
+		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP,
+				 attrs->port_number) ||
+		     nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
+				 attrs->split_subport_number)))
+			return -EMSGSIZE;
 		return 0;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, attrs->port_number))
-		return -EMSGSIZE;
-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
-			attrs->split_subport_number))
-		return -EMSGSIZE;
-	return 0;
+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
+				attrs->pci.vf_number))
+			return -EMSGSIZE;
+		/* fall through */
+	case DEVLINK_PORT_FLAVOUR_PCI_PF:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
+				attrs->pci.pf_number))
+			return -EMSGSIZE;
+		return 0;
+	default:
+		return -EINVAL;
+	}
 }
 
 static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
@@ -5410,6 +5429,9 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 
+	WARN_ON(flavour == DEVLINK_PORT_FLAVOUR_PCI_PF ||
+		flavour == DEVLINK_PORT_FLAVOUR_PCI_VF);
+
 	attrs->set = true;
 	attrs->flavour = flavour;
 	attrs->port_number = port_number;
@@ -5419,6 +5441,32 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
 
+/**
+ *	devlink_port_attrs_pci_set - Set port attributes for a PCI port
+ *
+ *	@devlink_port: devlink port
+ *	@flavour: flavour of the port (PF or VF only)
+ *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
+ *	            on the platform
+ *	@vf_number: PCI VF number within given PF (ignored for PF itself)
+ */
+void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
+				enum devlink_port_flavour flavour,
+				u32 pf_number, u32 vf_number)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+
+	WARN_ON(flavour != DEVLINK_PORT_FLAVOUR_PCI_PF &&
+		flavour != DEVLINK_PORT_FLAVOUR_PCI_VF);
+
+	attrs->set = true;
+	attrs->flavour = flavour;
+	attrs->pci.pf_number = pf_number;
+	attrs->pci.vf_number = vf_number;
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_set);
+
 int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				    char *name, size_t len)
 {
@@ -5443,6 +5491,13 @@ int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 		 */
 		WARN_ON(1);
 		return -EINVAL;
+	case DEVLINK_PORT_FLAVOUR_PCI_PF:
+		n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
+		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		n = snprintf(name, len, "pf%uvf%u",
+			     attrs->pf_number, attrs->pci.vf_number);
+		break;
 	}
 
 	if (n >= len)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 3/8] nfp: register devlink ports of all reprs
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 1/8] nfp: split devlink port init from registration Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 2/8] devlink: add PF and VF port flavours Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports Jakub Kicinski
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

Register all representors as devlink ports.

The port_index is slightly tricky to figure out, we use a bit of
arbitrary math to create unique IDs for PCI ports.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  | 47 ++++++++++++++++++-
 .../net/ethernet/netronome/nfp/nfp_net_repr.c | 16 ++++++-
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 9af3cb1f2f17..36976e37d162 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -350,7 +350,8 @@ const struct devlink_ops nfp_devlink_ops = {
 	.flash_update		= nfp_devlink_flash_update,
 };
 
-int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
+static int
+nfp_devlink_port_init_phys(struct devlink *devlink, struct nfp_port *port)
 {
 	struct nfp_eth_table_port eth_port;
 	int ret;
@@ -368,6 +369,34 @@ int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
 	return 0;
 }
 
+static int
+nfp_devlink_port_init_pci(struct devlink *devlink, struct nfp_port *port,
+			  u32 flavour)
+{
+	devlink_port_type_eth_set(&port->dl_port, port->netdev);
+	devlink_port_attrs_pci_set(&port->dl_port, flavour,
+				   port->pf_id, port->vf_id);
+	return 0;
+}
+
+int nfp_devlink_port_init(struct nfp_app *app, struct nfp_port *port)
+{
+	struct devlink *devlink = priv_to_devlink(app->pf);
+
+	switch (port->type) {
+	case NFP_PORT_PHYS_PORT:
+		return nfp_devlink_port_init_phys(devlink, port);
+	case NFP_PORT_PF_PORT:
+		return nfp_devlink_port_init_pci(devlink, port,
+						 DEVLINK_PORT_FLAVOUR_PCI_PF);
+	case NFP_PORT_VF_PORT:
+		return nfp_devlink_port_init_pci(devlink, port,
+						 DEVLINK_PORT_FLAVOUR_PCI_VF);
+	default:
+		return -EINVAL;
+	}
+}
+
 void nfp_devlink_port_clean(struct nfp_port *port)
 {
 }
@@ -376,7 +405,21 @@ int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
 {
 	struct devlink *devlink = priv_to_devlink(app->pf);
 
-	return devlink_port_register(devlink, &port->dl_port, port->eth_id);
+	switch (port->type) {
+	case NFP_PORT_PHYS_PORT:
+		return devlink_port_register(devlink, &port->dl_port,
+					     port->eth_id);
+	case NFP_PORT_PF_PORT:
+		return devlink_port_register(devlink, &port->dl_port,
+					     (port->pf_id + 1) * 10000 +
+					     port->pf_split_id * 1000);
+	case NFP_PORT_VF_PORT:
+		return devlink_port_register(devlink, &port->dl_port,
+					     (port->pf_id + 1) * 10000 +
+					     port->vf_id + 1);
+	default:
+		return -EINVAL;
+	}
 }
 
 void nfp_devlink_port_unregister(struct nfp_port *port)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index d2c803bb4e56..869d22760a6e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -292,7 +292,9 @@ nfp_repr_transfer_features(struct net_device *netdev, struct net_device *lower)
 
 static void nfp_repr_clean(struct nfp_repr *repr)
 {
+	nfp_devlink_port_unregister(repr->port);
 	unregister_netdev(repr->netdev);
+	nfp_devlink_port_clean(repr->port);
 	nfp_app_repr_clean(repr->app, repr->netdev);
 	dst_release((struct dst_entry *)repr->dst);
 	nfp_port_free(repr->port);
@@ -395,12 +397,24 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
 	if (err)
 		goto err_clean;
 
-	err = register_netdev(netdev);
+	err = nfp_devlink_port_init(app, repr->port);
 	if (err)
 		goto err_repr_clean;
 
+	err = register_netdev(netdev);
+	if (err)
+		goto err_port_clean;
+
+	err = nfp_devlink_port_register(app, repr->port);
+	if (err)
+		goto err_unreg_netdev;
+
 	return 0;
 
+err_unreg_netdev:
+	unregister_netdev(repr->netdev);
+err_port_clean:
+	nfp_devlink_port_clean(repr->port);
 err_repr_clean:
 	nfp_app_repr_clean(app, netdev);
 err_clean:
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (2 preceding siblings ...)
  2019-02-26 18:24 ` [PATCH net-next 3/8] nfp: register devlink ports of all reprs Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-27 12:37   ` Jiri Pirko
  2019-02-26 18:24 ` [PATCH net-next 5/8] nfp: switch to devlink_port_get_phys_port_name() Jakub Kicinski
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

PCI endpoint corresponds to a PCI device, but such device
can have one more more logical device ports associated with it.
We need a way to distinguish those. Add a PCI subport in the
dumps and print the info in phys_port_name appropriately.

This is not equivalent to port splitting, there is no split
group. It's just a way of representing multiple netdevs on
a single PCI function.

Note that the quality of being multiport pertains only to
the PCI function itself. A PF having multiple netdevs does
not mean that its VFs will also have multiple, or that VFs
are associated with any particular port of a multiport VF.

Example (bus 05 device has subports, bus 82 has only one port per
function):

$ devlink port
pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1
pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
pci/0000:82:00.0/10000: type eth netdev eth0 flavour pci_pf pf 0

$ devlink -jp port
{
    "port": {
        "pci/0000:05:00.0/0": {
            "type": "eth",
            "netdev": "enp5s0np0",
            "flavour": "physical"
        },
        "pci/0000:05:00.0/10000": {
            "type": "eth",
            "netdev": "enp5s0npf0s0",
            "flavour": "pci_pf",
            "pf": 0,
            "subport": 0
        },
        "pci/0000:05:00.0/4": {
            "type": "eth",
            "netdev": "enp5s0np1",
            "flavour": "physical"
        },
        "pci/0000:05:00.0/11000": {
            "type": "eth",
            "netdev": "enp5s0npf0s1",
            "flavour": "pci_pf",
            "pf": 0,
            "subport": 1
        },
        "pci/0000:82:00.0/0": {
            "type": "eth",
            "netdev": "p4p1",
            "flavour": "physical"
        },
        "pci/0000:82:00.0/10000": {
            "type": "eth",
            "netdev": "eth0",
            "flavour": "pci_pf",
            "pf": 0
        }
    }
}

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  3 +-
 include/net/devlink.h                         |  9 ++++--
 include/uapi/linux/devlink.h                  |  1 +
 net/core/devlink.c                            | 29 ++++++++++++++++---
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 36976e37d162..bb07be4117a3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -375,7 +375,8 @@ nfp_devlink_port_init_pci(struct devlink *devlink, struct nfp_port *port,
 {
 	devlink_port_type_eth_set(&port->dl_port, port->netdev);
 	devlink_port_attrs_pci_set(&port->dl_port, flavour,
-				   port->pf_id, port->vf_id);
+				   port->pf_id, port->vf_id,
+				   port->pf_split, port->pf_split_id);
 	return 0;
 }
 
diff --git a/include/net/devlink.h b/include/net/devlink.h
index b5376ef492f1..13e0a479c546 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -53,6 +53,8 @@ struct devlink_port_attrs {
 		struct {
 			u32 pf_number;
 			u32 vf_number;
+			bool multiport;
+			u32 subport_number;
 		} pci;
 	};
 };
@@ -580,7 +582,8 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    u32 split_subport_number);
 void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
 				enum devlink_port_flavour flavour,
-				u32 pf_number, u32 vf_number);
+				u32 pf_number, u32 vf_number,
+				bool multiport, u32 subport_number);
 int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 				    char *name, size_t len);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
@@ -797,7 +800,9 @@ static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
 
 static inline void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
 					      enum devlink_port_flavour flavour,
-					      u32 pf_number, u32 vf_number)
+					      u32 pf_number, u32 vf_number,
+					      bool multiport,
+					      u32 subport_number)
 {
 }
 
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9ce76d4f640d..417ae8233cce 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -336,6 +336,7 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_PORT_PCI_PF_NUMBER,	/* u32 */
 	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
+	DEVLINK_ATTR_PORT_PCI_SUBPORT,		/* u32 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index af177284830b..6cdf7e87d7fc 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -541,6 +541,11 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
 				attrs->pci.pf_number))
 			return -EMSGSIZE;
+
+		if (attrs->pci.multiport &&
+		    nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SUBPORT,
+				attrs->pci.subport_number))
+			return -EMSGSIZE;
 		return 0;
 	default:
 		return -EINVAL;
@@ -5449,10 +5454,13 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
  *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
  *	            on the platform
  *	@vf_number: PCI VF number within given PF (ignored for PF itself)
+ *	@multiport: PCI function has more than one logical port
+ *	@subport_number: PCI function has more than one logical port
  */
 void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
 				enum devlink_port_flavour flavour,
-				u32 pf_number, u32 vf_number)
+				u32 pf_number, u32 vf_number,
+				bool multiport, u32 subport_number)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 
@@ -5463,6 +5471,8 @@ void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
 	attrs->flavour = flavour;
 	attrs->pci.pf_number = pf_number;
 	attrs->pci.vf_number = vf_number;
+	attrs->pci.multiport = multiport;
+	attrs->pci.subport_number = subport_number;
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_set);
@@ -5492,11 +5502,22 @@ int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
 		WARN_ON(1);
 		return -EINVAL;
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
-		n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
+		if (!attrs->pci.multiport)
+			n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
+		else
+			n = snprintf(name, len, "pf%us%u", attrs->pci.pf_number,
+				     attrs->pci.subport_number);
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
-		n = snprintf(name, len, "pf%uvf%u",
-			     attrs->pf_number, attrs->pci.vf_number);
+		if (!attrs->pci.multiport)
+			n = snprintf(name, len, "pf%uvf%u",
+				     attrs->pci.pf_number,
+				     attrs->pci.vf_number);
+		else
+			n = snprintf(name, len, "pf%uvf%us%u",
+				     attrs->pci.pf_number,
+				     attrs->pci.vf_number,
+				     attrs->pci.subport_number);
 		break;
 	}
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 5/8] nfp: switch to devlink_port_get_phys_port_name()
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (3 preceding siblings ...)
  2019-02-26 18:24 ` [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 6/8] devlink: introduce port's peer netdevs Jakub Kicinski
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

Now that devlink understands all port flavours - switch
to the devlink_port_get_phys_port_name() helper.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_port.c | 33 +------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index 93c5bfc0510b..3e2ff8d35e8d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -117,44 +117,13 @@ struct nfp_eth_table_port *nfp_port_get_eth_port(struct nfp_port *port)
 int
 nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
 {
-	struct nfp_eth_table_port *eth_port;
 	struct nfp_port *port;
-	int n;
 
 	port = nfp_port_from_netdev(netdev);
 	if (!port)
 		return -EOPNOTSUPP;
 
-	switch (port->type) {
-	case NFP_PORT_PHYS_PORT:
-		eth_port = __nfp_port_get_eth_port(port);
-		if (!eth_port)
-			return -EOPNOTSUPP;
-
-		if (!eth_port->is_split)
-			n = snprintf(name, len, "p%d", eth_port->label_port);
-		else
-			n = snprintf(name, len, "p%ds%d", eth_port->label_port,
-				     eth_port->label_subport);
-		break;
-	case NFP_PORT_PF_PORT:
-		if (!port->pf_split)
-			n = snprintf(name, len, "pf%d", port->pf_id);
-		else
-			n = snprintf(name, len, "pf%ds%d", port->pf_id,
-				     port->pf_split_id);
-		break;
-	case NFP_PORT_VF_PORT:
-		n = snprintf(name, len, "pf%dvf%d", port->pf_id, port->vf_id);
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
-
-	if (n >= len)
-		return -EINVAL;
-
-	return 0;
+	return devlink_port_get_phys_port_name(&port->dl_port, name, len);
 }
 
 /**
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (4 preceding siblings ...)
  2019-02-26 18:24 ` [PATCH net-next 5/8] nfp: switch to devlink_port_get_phys_port_name() Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-27 13:08   ` Jiri Pirko
  2019-02-26 18:24 ` [PATCH net-next 7/8] nfp: expose PF " Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 8/8] devlink: fix kdoc Jakub Kicinski
  7 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

Devlink ports represent ports of a switch device (or SR-IOV
NIC which has an embedded switch). In case of SR-IOV when
PCIe PFs are exposed the PFs which are directly connected
to the local machine may also spawn PF netdev (much like
VFs have a port/"repr" and an actual VF netdev).

Allow devlink to expose such linking. There is currently no
way to find out which netdev corresponds to which PF.

Example:

$ devlink port
pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/devlink.h        | 11 ++++++++
 include/uapi/linux/devlink.h |  3 +++
 net/core/devlink.c           | 51 +++++++++++++++++++++++++++++-------
 3 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 13e0a479c546..15912cca5d16 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -68,6 +68,7 @@ struct devlink_port {
 	enum devlink_port_type type;
 	enum devlink_port_type desired_type;
 	void *type_dev;
+	void *type_peer;
 	struct devlink_port_attrs attrs;
 };
 
@@ -573,6 +574,9 @@ int devlink_port_register(struct devlink *devlink,
 void devlink_port_unregister(struct devlink_port *devlink_port);
 void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 			       struct net_device *netdev);
+void devlink_port_type_eth_set_peer(struct devlink_port *devlink_port,
+				    struct net_device *netdev,
+				    struct net_device *peer);
 void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 			      struct ib_device *ibdev);
 void devlink_port_type_clear(struct devlink_port *devlink_port);
@@ -782,6 +786,13 @@ static inline void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 {
 }
 
+static inline void
+devlink_port_type_eth_set_peer(struct devlink_port *devlink_port,
+			       struct net_device *netdev,
+			       struct net_device *peer)
+{
+}
+
 static inline void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 					    struct ib_device *ibdev)
 {
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 417ae8233cce..9ce4718f8b8b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -338,6 +338,9 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
 	DEVLINK_ATTR_PORT_PCI_SUBPORT,		/* u32 */
 
+	DEVLINK_ATTR_PORT_PEER_NETDEV_IFINDEX,	/* u32 */
+	DEVLINK_ATTR_PORT_PEER_NETDEV_NAME,	/* string */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6cdf7e87d7fc..b514be300839 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -552,6 +552,17 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 	}
 }
 
+static int devlink_nl_port_put_netdev(struct sk_buff *msg,
+				      int attr_ifindex, int attr_name,
+				      struct net_device *netdev)
+{
+	if (netdev &&
+	    (nla_put_u32(msg, attr_ifindex, netdev->ifindex) ||
+	     nla_put_string(msg, attr_name, netdev->name)))
+		return -EMSGSIZE;
+	return 0;
+}
+
 static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 				struct devlink_port *devlink_port,
 				enum devlink_command cmd, u32 portid,
@@ -574,13 +585,16 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 			devlink_port->desired_type))
 		goto nla_put_failure;
 	if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
-		struct net_device *netdev = devlink_port->type_dev;
+		if (devlink_nl_port_put_netdev(msg,
+					       DEVLINK_ATTR_PORT_NETDEV_IFINDEX,
+					       DEVLINK_ATTR_PORT_NETDEV_NAME,
+					       devlink_port->type_dev))
+			goto nla_put_failure;
 
-		if (netdev &&
-		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX,
-				 netdev->ifindex) ||
-		     nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME,
-				    netdev->name)))
+		if (devlink_nl_port_put_netdev(msg,
+					       DEVLINK_ATTR_PORT_PEER_NETDEV_IFINDEX,
+					       DEVLINK_ATTR_PORT_PEER_NETDEV_NAME,
+					       devlink_port->type_peer))
 			goto nla_put_failure;
 	}
 	if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
@@ -5369,10 +5383,11 @@ EXPORT_SYMBOL_GPL(devlink_port_unregister);
 
 static void __devlink_port_type_set(struct devlink_port *devlink_port,
 				    enum devlink_port_type type,
-				    void *type_dev)
+				    void *type_dev, void *type_peer)
 {
 	devlink_port->type = type;
 	devlink_port->type_dev = type_dev;
+	devlink_port->type_peer = type_peer;
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 }
 
@@ -5386,10 +5401,26 @@ void devlink_port_type_eth_set(struct devlink_port *devlink_port,
 			       struct net_device *netdev)
 {
 	return __devlink_port_type_set(devlink_port,
-				       DEVLINK_PORT_TYPE_ETH, netdev);
+				       DEVLINK_PORT_TYPE_ETH, netdev, NULL);
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_eth_set);
 
+/**
+ *	devlink_port_type_eth_set_peer - Set port type to Ethernet with peer
+ *
+ *	@devlink_port: devlink port
+ *	@netdev: related netdevice
+ *	@peer: for PCIe ports the non-port netdev (actual VF or PF)
+ */
+void devlink_port_type_eth_set_peer(struct devlink_port *devlink_port,
+				    struct net_device *netdev,
+				    struct net_device *peer)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_ETH, netdev, peer);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_eth_set_peer);
+
 /**
  *	devlink_port_type_ib_set - Set port type to InfiniBand
  *
@@ -5400,7 +5431,7 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 			      struct ib_device *ibdev)
 {
 	return __devlink_port_type_set(devlink_port,
-				       DEVLINK_PORT_TYPE_IB, ibdev);
+				       DEVLINK_PORT_TYPE_IB, ibdev, NULL);
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
 
@@ -5412,7 +5443,7 @@ EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
 void devlink_port_type_clear(struct devlink_port *devlink_port)
 {
 	return __devlink_port_type_set(devlink_port,
-				       DEVLINK_PORT_TYPE_NOTSET, NULL);
+				       DEVLINK_PORT_TYPE_NOTSET, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(devlink_port_type_clear);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 7/8] nfp: expose PF peer netdevs
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (5 preceding siblings ...)
  2019-02-26 18:24 ` [PATCH net-next 6/8] devlink: introduce port's peer netdevs Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-26 18:24 ` [PATCH net-next 8/8] devlink: fix kdoc Jakub Kicinski
  7 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

Expose PF netdevs as devlink port's peers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/abm/main.c    | 1 +
 drivers/net/ethernet/netronome/nfp/flower/main.c | 1 +
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c | 3 ++-
 drivers/net/ethernet/netronome/nfp/nfp_port.h    | 2 ++
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c
index 4d4ff5844c47..8d7ff1200fd4 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -113,6 +113,7 @@ nfp_abm_spawn_repr(struct nfp_app *app, struct nfp_abm_link *alink,
 		port->pf_id = alink->abm->pf_id;
 		port->pf_split = app->pf->max_data_vnics > 1;
 		port->pf_split_id = alink->id;
+		port->peer = alink->vnic->dp.netdev;
 		port->vnic = alink->vnic->dp.ctrl_bar;
 	}
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 408089133599..13aa21923bf7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -300,6 +300,7 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
 		if (repr_type == NFP_REPR_TYPE_PF) {
 			port->pf_id = i;
 			port->vnic = priv->nn->dp.ctrl_bar;
+			port->peer = priv->nn->dp.netdev;
 		} else {
 			port->pf_id = 0;
 			port->vf_id = i;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index bb07be4117a3..3d59a57eda0c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -373,7 +373,8 @@ static int
 nfp_devlink_port_init_pci(struct devlink *devlink, struct nfp_port *port,
 			  u32 flavour)
 {
-	devlink_port_type_eth_set(&port->dl_port, port->netdev);
+	devlink_port_type_eth_set_peer(&port->dl_port, port->netdev,
+				       port->peer);
 	devlink_port_attrs_pci_set(&port->dl_port, flavour,
 				   port->pf_id, port->vf_id,
 				   port->pf_split, port->pf_split_id);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 09c55ca2371a..c75a25cc5cea 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -51,6 +51,7 @@ enum nfp_port_flags {
  * @eth_forced:	for %NFP_PORT_PHYS_PORT port is forced UP or DOWN, don't change
  * @eth_port:	for %NFP_PORT_PHYS_PORT translated ETH Table port entry
  * @eth_stats:	for %NFP_PORT_PHYS_PORT MAC stats if available
+ * @peer:	for %NFP_PORT_PF_PORT netdev of the actual vNIC, if reachable
  * @pf_id:	for %NFP_PORT_PF_PORT, %NFP_PORT_VF_PORT ID of the PCI PF (0-3)
  * @vf_id:	for %NFP_PORT_VF_PORT ID of the PCI VF within @pf_id
  * @pf_split:	for %NFP_PORT_PF_PORT %true if PCI PF has more than one vNIC
@@ -79,6 +80,7 @@ struct nfp_port {
 		};
 		/* NFP_PORT_PF_PORT, NFP_PORT_VF_PORT */
 		struct {
+			struct net_device *peer;
 			unsigned int pf_id;
 			unsigned int vf_id;
 			bool pf_split;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 8/8] devlink: fix kdoc
  2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
                   ` (6 preceding siblings ...)
  2019-02-26 18:24 ` [PATCH net-next 7/8] nfp: expose PF " Jakub Kicinski
@ 2019-02-26 18:24 ` Jakub Kicinski
  2019-02-27 13:13   ` Jiri Pirko
  7 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-26 18:24 UTC (permalink / raw)
  To: davem, jiri; +Cc: oss-drivers, netdev, Jakub Kicinski

devlink suffers from a few kdoc warnings:

net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register'
net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register'
net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register'
net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create'
net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create'

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 net/core/devlink.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index b514be300839..cd3f466496d3 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -5287,6 +5287,7 @@ EXPORT_SYMBOL_GPL(devlink_alloc);
  *	devlink_register - Register devlink instance
  *
  *	@devlink: devlink
+ *	@dev: parent device
  */
 int devlink_register(struct devlink *devlink, struct device *dev)
 {
@@ -5337,7 +5338,7 @@ EXPORT_SYMBOL_GPL(devlink_free);
  *
  *	@devlink: devlink
  *	@devlink_port: devlink port
- *	@port_index
+ *	@port_index: driver-specific numerical identifier of the port
  *
  *	Register devlink port with provided port index. User can use
  *	any indexing, even hw-related one. devlink_port structure
@@ -5736,13 +5737,10 @@ EXPORT_SYMBOL_GPL(devlink_dpipe_table_unregister);
  *
  *	@devlink: devlink
  *	@resource_name: resource's name
- *	@top_hierarchy: top hierarchy
- *	@reload_required: reload is required for new configuration to
- *			  apply
  *	@resource_size: resource's size
  *	@resource_id: resource's id
- *	@parent_reosurce_id: resource's parent id
- *	@size params: size parameters
+ *	@parent_resource_id: resource's parent id
+ *	@size_params: size parameters
  */
 int devlink_resource_register(struct devlink *devlink,
 			      const char *resource_name,
@@ -6439,7 +6437,7 @@ EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
  *	Multiple snapshots can be created on a region.
  *	The @snapshot_id should be obtained using the getter function.
  *
- *	@devlink_region: devlink region of the snapshot
+ *	@region: devlink region of the snapshot
  *	@data_len: size of snapshot data
  *	@data: snapshot data
  *	@snapshot_id: snapshot id to be created
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-26 18:24 ` [PATCH net-next 2/8] devlink: add PF and VF port flavours Jakub Kicinski
@ 2019-02-27 12:16   ` Jiri Pirko
  2019-03-04  4:59     ` Parav Pandit
  2019-02-27 12:23   ` Jiri Pirko
  1 sibling, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 12:16 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
>Current port flavours cover simple switches and DSA.  Add PF
>and VF flavours to cover "switchdev" SR-IOV NICs.
>
>Example devlink user space output:
>
>$ devlink port
>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1

I believe that that this output should be in sync with attr names. So
the names should be:
pci_vf
pci_pf
pf_number
vf_number

Like:
pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf_number 0 vf_number 1

But that is comment to the userspace part.

>
>$ devlink -jp port
>{
>    "port": {
>        "pci/0000:82:00.0/0": {
>            "type": "eth",
>            "netdev": "p4p1",
>            "flavour": "physical"
>        },
>        "pci/0000:82:00.0/10000": {
>            "type": "eth",
>            "netdev": "eth0",
>            "flavour": "pci_pf",
>            "pf": 0,
>        },
>        "pci/0000:82:00.0/10001": {
>            "type": "eth",
>            "netdev": "eth1",
>            "flavour": "pci_vf",
>            "pf": 0,
>            "vf": 0
>        },
>        "pci/0000:82:00.0/10002": {
>            "type": "eth",
>            "netdev": "eth2",
>            "flavour": "pci_vf",
>            "pf": 0,
>            "vf": 1
>        }
>    }
>}
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/devlink.h        | 25 ++++++++++--
> include/uapi/linux/devlink.h |  5 +++
> net/core/devlink.c           | 73 +++++++++++++++++++++++++++++++-----
> 3 files changed, 91 insertions(+), 12 deletions(-)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 7f5a0bdca228..b5376ef492f1 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -42,9 +42,19 @@ struct devlink {
> struct devlink_port_attrs {
> 	bool set;
> 	enum devlink_port_flavour flavour;
>-	u32 port_number; /* same value as "split group" */
>-	bool split;
>-	u32 split_subport_number;
>+	union { /* port identifiers differ per-flavour */
>+		/* PHYSICAL, CPU, DSA */
>+		struct {
>+			bool split;
>+			u32 split_subport_number;
>+			u32 port_number; /* same value as "split group" */
>+		};
>+		 /* PCI_PF, PCI_VF */
>+		struct {
>+			u32 pf_number;
>+			u32 vf_number;
>+		} pci;
>+	};
> };
> 
> struct devlink_port {
>@@ -568,6 +578,9 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
> 			    enum devlink_port_flavour flavour,
> 			    u32 port_number, bool split,
> 			    u32 split_subport_number);
>+void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
>+				enum devlink_port_flavour flavour,
>+				u32 pf_number, u32 vf_number);
> int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
> 				    char *name, size_t len);
> int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>@@ -782,6 +795,12 @@ static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
> {
> }
> 
>+static inline void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
>+					      enum devlink_port_flavour flavour,
>+					      u32 pf_number, u32 vf_number)
>+{
>+}
>+
> static inline int
> devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
> 				char *name, size_t len)
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 5bb4ea67d84f..9ce76d4f640d 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -167,6 +167,8 @@ enum devlink_port_flavour {
> 	DEVLINK_PORT_FLAVOUR_DSA, /* Distributed switch architecture
> 				   * interconnect port.
> 				   */
>+	DEVLINK_PORT_FLAVOUR_PCI_PF, /* PCI Physical function port */
>+	DEVLINK_PORT_FLAVOUR_PCI_VF, /* PCI Physical function port */
> };
> 
> enum devlink_param_cmode {
>@@ -332,6 +334,9 @@ enum devlink_attr {
> 	DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,	/* string */
> 	DEVLINK_ATTR_FLASH_UPDATE_COMPONENT,	/* string */
> 
>+	DEVLINK_ATTR_PORT_PCI_PF_NUMBER,	/* u32 */
>+	DEVLINK_ATTR_PORT_PCI_VF_NUMBER,	/* u32 */
>+
> 	/* add new attributes above here, update the policy in devlink.c */
> 
> 	__DEVLINK_ATTR_MAX,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index a49dee67e66f..af177284830b 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -516,16 +516,35 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
> 		return 0;
> 	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
> 		return -EMSGSIZE;
>-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->port_number))
>-		return -EMSGSIZE;
>-	if (!attrs->split)
>+
>+	switch (attrs->flavour) {
>+	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
>+	case DEVLINK_PORT_FLAVOUR_CPU:
>+	case DEVLINK_PORT_FLAVOUR_DSA:
>+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER,
>+				attrs->port_number))
>+			return -EMSGSIZE;
>+
>+		if (attrs->split &&
>+		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP,
>+				 attrs->port_number) ||
>+		     nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
>+				 attrs->split_subport_number)))
>+			return -EMSGSIZE;
> 		return 0;
>-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, attrs->port_number))
>-		return -EMSGSIZE;
>-	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
>-			attrs->split_subport_number))
>-		return -EMSGSIZE;
>-	return 0;
>+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
>+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
>+				attrs->pci.vf_number))
>+			return -EMSGSIZE;
>+		/* fall through */
>+	case DEVLINK_PORT_FLAVOUR_PCI_PF:
>+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
>+				attrs->pci.pf_number))
>+			return -EMSGSIZE;
>+		return 0;
>+	default:
>+		return -EINVAL;
>+	}
> }
> 
> static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
>@@ -5410,6 +5429,9 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
> {
> 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
> 
>+	WARN_ON(flavour == DEVLINK_PORT_FLAVOUR_PCI_PF ||
>+		flavour == DEVLINK_PORT_FLAVOUR_PCI_VF);
>+
> 	attrs->set = true;
> 	attrs->flavour = flavour;
> 	attrs->port_number = port_number;
>@@ -5419,6 +5441,32 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
> }
> EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
> 
>+/**
>+ *	devlink_port_attrs_pci_set - Set port attributes for a PCI port
>+ *
>+ *	@devlink_port: devlink port
>+ *	@flavour: flavour of the port (PF or VF only)
>+ *	@pf_number: PCI PF number, in multi-host mapping to hosts depends
>+ *	            on the platform
>+ *	@vf_number: PCI VF number within given PF (ignored for PF itself)
>+ */
>+void devlink_port_attrs_pci_set(struct devlink_port *devlink_port,
>+				enum devlink_port_flavour flavour,
>+				u32 pf_number, u32 vf_number)

Maybe nicer would be to have this static and have 2 helpers:

void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port,
				   u32 pf_number)
void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
				   u32 pf_number, u32 vf_number)

Then you need no warnon. Also you won't have dead arg in driver api
case of pf


Other than this the patch looks good to me.


>+{
>+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
>+
>+	WARN_ON(flavour != DEVLINK_PORT_FLAVOUR_PCI_PF &&
>+		flavour != DEVLINK_PORT_FLAVOUR_PCI_VF);
>+
>+	attrs->set = true;
>+	attrs->flavour = flavour;
>+	attrs->pci.pf_number = pf_number;
>+	attrs->pci.vf_number = vf_number;
>+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
>+}
>+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_set);
>+
> int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
> 				    char *name, size_t len)
> {
>@@ -5443,6 +5491,13 @@ int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
> 		 */
> 		WARN_ON(1);
> 		return -EINVAL;
>+	case DEVLINK_PORT_FLAVOUR_PCI_PF:
>+		n = snprintf(name, len, "pf%u", attrs->pci.pf_number);
>+		break;
>+	case DEVLINK_PORT_FLAVOUR_PCI_VF:
>+		n = snprintf(name, len, "pf%uvf%u",
>+			     attrs->pf_number, attrs->pci.vf_number);
>+		break;
> 	}
> 
> 	if (n >= len)
>-- 
>2.19.2
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-26 18:24 ` [PATCH net-next 2/8] devlink: add PF and VF port flavours Jakub Kicinski
  2019-02-27 12:16   ` Jiri Pirko
@ 2019-02-27 12:23   ` Jiri Pirko
  2019-02-27 12:41     ` Jiri Pirko
  1 sibling, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 12:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
>Current port flavours cover simple switches and DSA.  Add PF
>and VF flavours to cover "switchdev" SR-IOV NICs.
>
>Example devlink user space output:
>
>$ devlink port
>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1

Wait a second, howcome pf and vfs have the same PCI address?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-26 18:24 ` [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports Jakub Kicinski
@ 2019-02-27 12:37   ` Jiri Pirko
  2019-02-27 18:30     ` Jakub Kicinski
  2019-03-04  5:00     ` Parav Pandit
  0 siblings, 2 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 12:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, parav, jgg

Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
>PCI endpoint corresponds to a PCI device, but such device
>can have one more more logical device ports associated with it.
>We need a way to distinguish those. Add a PCI subport in the
>dumps and print the info in phys_port_name appropriately.
>
>This is not equivalent to port splitting, there is no split
>group. It's just a way of representing multiple netdevs on
>a single PCI function.
>
>Note that the quality of being multiport pertains only to
>the PCI function itself. A PF having multiple netdevs does
>not mean that its VFs will also have multiple, or that VFs
>are associated with any particular port of a multiport VF.
>

We've been discussing the problem of subport (we call it "subfunction"
or "SF") for some time internally. Turned out, this is probably harder
task to model. Please prove me wrong.

The nature of VF makes it a logically separate entity. It has a separate
PCI address, it should therefore have a separate devlink instance.
You can pass it through to VM, then the same devlink instance should be
created inside the VM and disappear from the host.

SF (or subport) feels similar to that. Basically it is exactly the same
thing as VF, only does reside under PF PCI function.

That is why I think, for sake of consistency, it should have a separate
devlink entity as well. The problem is correct sysfs modelling and
devlink handle derived from that. Parav is working on a simple soft
bus for this purpose called "subbus". There is a RFC floating around on
Mellanox internal mailing list, looks like it is time to send it
upstream.

Then each PF driver which have SFs would register subbus devices
according to SFs/subports and they would be properly handled by bus
probe, devlink and devlink port and netdev instances created.

Ccing Parav and Jason.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 12:23   ` Jiri Pirko
@ 2019-02-27 12:41     ` Jiri Pirko
  2019-02-27 17:23       ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 12:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Wed, Feb 27, 2019 at 01:23:27PM CET, jiri@resnulli.us wrote:
>Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
>>Current port flavours cover simple switches and DSA.  Add PF
>>and VF flavours to cover "switchdev" SR-IOV NICs.
>>
>>Example devlink user space output:
>>
>>$ devlink port
>>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1
>
>Wait a second, howcome pf and vfs have the same PCI address?

Oh, I think you have these as eswitch port representors. Confusing...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-26 18:24 ` [PATCH net-next 6/8] devlink: introduce port's peer netdevs Jakub Kicinski
@ 2019-02-27 13:08   ` Jiri Pirko
  2019-02-27 18:47     ` Jakub Kicinski
  2019-03-04  5:07     ` Parav Pandit
  0 siblings, 2 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 13:08 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Tue, Feb 26, 2019 at 07:24:34PM CET, jakub.kicinski@netronome.com wrote:
>Devlink ports represent ports of a switch device (or SR-IOV
>NIC which has an embedded switch). In case of SR-IOV when
>PCIe PFs are exposed the PFs which are directly connected
>to the local machine may also spawn PF netdev (much like
>VFs have a port/"repr" and an actual VF netdev).
>
>Allow devlink to expose such linking. There is currently no
>way to find out which netdev corresponds to which PF.
>
>Example:
>
>$ devlink port
>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
>pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1

Peer as the other side of a "virtual cable". For PF, that is probably
sufficient. But I think what a "peer of devlink port" should be "a
devlink port".

Not sure about VF.

Consider a simple problem of setting up a VF mac address. In legacy, you
do it like this:
$ ip link set eth2 vf 1 mac 00:52:44:11:22:33
However, in new model, you so far cannot do that.

What I was thinking about was some "dummy peer" which would be on the
host. Not sure if only as a "dummy peer devlink port" or even as some
sort of "dummy netdev".

One way or another, it would provide the user some info about which VF
representor is connected to which VF in VM (mac mapping).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 8/8] devlink: fix kdoc
  2019-02-26 18:24 ` [PATCH net-next 8/8] devlink: fix kdoc Jakub Kicinski
@ 2019-02-27 13:13   ` Jiri Pirko
  0 siblings, 0 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 13:13 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Tue, Feb 26, 2019 at 07:24:36PM CET, jakub.kicinski@netronome.com wrote:
>devlink suffers from a few kdoc warnings:
>
>net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register'
>net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register'
>net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register'
>net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register'
>net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register'
>net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register'
>net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register'
>net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create'
>net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create'
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 12:41     ` Jiri Pirko
@ 2019-02-27 17:23       ` Jakub Kicinski
  2019-02-27 20:17         ` Jiri Pirko
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-27 17:23 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev

On Wed, 27 Feb 2019 13:41:35 +0100, Jiri Pirko wrote:
> Wed, Feb 27, 2019 at 01:23:27PM CET, jiri@resnulli.us wrote:
> >Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:  
> >>Current port flavours cover simple switches and DSA.  Add PF
> >>and VF flavours to cover "switchdev" SR-IOV NICs.
> >>
> >>Example devlink user space output:
> >>
> >>$ devlink port
> >>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
> >>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
> >>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1  
> >
> >Wait a second, howcome pf and vfs have the same PCI address?  
> 
> Oh, I think you have these as eswitch port representors. Confusing...

FWIW I don't like the word representor, its a port. We don't call
physical ports "representors" even though from ASIC's point of view
they are exactly the same.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-27 12:37   ` Jiri Pirko
@ 2019-02-27 18:30     ` Jakub Kicinski
  2019-02-28  8:56       ` Jiri Pirko
  2019-03-04 16:15       ` Jason Gunthorpe
  2019-03-04  5:00     ` Parav Pandit
  1 sibling, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-27 18:30 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev, parav, jgg

On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:
> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
> >PCI endpoint corresponds to a PCI device, but such device
> >can have one more more logical device ports associated with it.
> >We need a way to distinguish those. Add a PCI subport in the
> >dumps and print the info in phys_port_name appropriately.
> >
> >This is not equivalent to port splitting, there is no split
> >group. It's just a way of representing multiple netdevs on
> >a single PCI function.
> >
> >Note that the quality of being multiport pertains only to
> >the PCI function itself. A PF having multiple netdevs does
> >not mean that its VFs will also have multiple, or that VFs
> >are associated with any particular port of a multiport VF.
> 
> We've been discussing the problem of subport (we call it "subfunction"
> or "SF") for some time internally. Turned out, this is probably harder
> task to model. Please prove me wrong.
> 
> The nature of VF makes it a logically separate entity. It has a separate
> PCI address, it should therefore have a separate devlink instance.
> You can pass it through to VM, then the same devlink instance should be
> created inside the VM and disappear from the host.

Depends what a devlink instance represents :/  On one hand you may want
to create an instance for a VF to allow it to spawn soft ports, on the
other you may want to group multiple functions together.

IOW if devlink instance is for an ASIC, there should be one per device
per host.  So if we start connecting multiple functions (PFs and/or VFs)
to one host we should probably introduce the notion of devlink aliases
or some such (so that multiple bus addresses can target the same
devlink instance).  Those less pipelined NICs can forward between
ports, but still want a function per port (otherwise user space
sometimes gets confused).  If we have multiple functions which are on
the same "switchid" they should have a single devlink instance if you
ask me.  That instance will have all the ports of the device.

You say disappear from the host - what do you mean.  Are you referring
to the VF port disappearing?  But on the switch the port is still
there, and you should show the subports on the PF side IMHO.  Devlink
ports should allow users to understand the topology of the switch.

Is spawning VMDq sub-instances the only thing we can think of that VMs
may want to do?  Are there any other uses?

> SF (or subport) feels similar to that. Basically it is exactly the same
> thing as VF, only does reside under PF PCI function.
> 
> That is why I think, for sake of consistency, it should have a separate
> devlink entity as well. The problem is correct sysfs modelling and
> devlink handle derived from that. Parav is working on a simple soft
> bus for this purpose called "subbus". There is a RFC floating around on
> Mellanox internal mailing list, looks like it is time to send it
> upstream.
> 
> Then each PF driver which have SFs would register subbus devices
> according to SFs/subports and they would be properly handled by bus
> probe, devlink and devlink port and netdev instances created.
> 
> Ccing Parav and Jason.

You guys come from the RDMA side of the world, with which I'm less
familiar, and the soft bus + spawning devices seems to be a popular
design there.  Could you describe the advantages of that model for 
the sake of the netdev-only folks? :)

Another term that gets thrown into the mix here is mediated devices,
right?  If you wanna pass the sub-spawn-soft-port to a VM.  Or run 
DPDK on some queues.

To state the obvious AF_XDP and macvlan offload were are previous
answers to some of those use cases.  What is the forwarding model
for those subports?  Are we going to allow flower rules from VMs?
Is it going to be dst MAC only?  Or is the hypervisor going to forward
as it sees appropriate (OvS + "repr"/port netdev)?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-27 13:08   ` Jiri Pirko
@ 2019-02-27 18:47     ` Jakub Kicinski
  2019-02-28  9:00       ` Jiri Pirko
  2019-03-04  5:07     ` Parav Pandit
  1 sibling, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-27 18:47 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev

On Wed, 27 Feb 2019 14:08:29 +0100, Jiri Pirko wrote:
> Tue, Feb 26, 2019 at 07:24:34PM CET, jakub.kicinski@netronome.com wrote:
> >Devlink ports represent ports of a switch device (or SR-IOV
> >NIC which has an embedded switch). In case of SR-IOV when
> >PCIe PFs are exposed the PFs which are directly connected
> >to the local machine may also spawn PF netdev (much like
> >VFs have a port/"repr" and an actual VF netdev).
> >
> >Allow devlink to expose such linking. There is currently no
> >way to find out which netdev corresponds to which PF.
> >
> >Example:
> >
> >$ devlink port
> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
> >pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1  
> 
> Peer as the other side of a "virtual cable". For PF, that is probably
> sufficient. But I think what a "peer of devlink port" should be "a
> devlink port".

Maybe I'm not clear on what devlink port is - to me its a port of the
ASIC.  The notion of devlink port connected to devlink port seems
to counter such definition :S  

I do not think that every netdev should have a devlink port associated.

> Not sure about VF.
> 
> Consider a simple problem of setting up a VF mac address. In legacy, you
> do it like this:
> $ ip link set eth2 vf 1 mac 00:52:44:11:22:33
> However, in new model, you so far cannot do that.

Why?

$ devlink port set pci/0000:82:00.0/10001 peer_eth_addr 00:52:44:11:22:33

It's more of a neighbour info situation than a local port situation.

> What I was thinking about was some "dummy peer" which would be on the
> host. Not sure if only as a "dummy peer devlink port" or even as some
> sort of "dummy netdev".
> 
> One way or another, it would provide the user some info about which VF
> representor is connected to which VF in VM (mac mapping).

Ack, but isn't the MAC setting is the only thing we're missing from
"switchdev SR-IOV"?  Would the "dummy netdev" be used for anything
else?  I would rather not introduce new netdev just to do that 
(that'd be a third for that port.)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 17:23       ` Jakub Kicinski
@ 2019-02-27 20:17         ` Jiri Pirko
  2019-02-27 22:42           ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-02-27 20:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Wed, Feb 27, 2019 at 06:23:26PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 27 Feb 2019 13:41:35 +0100, Jiri Pirko wrote:
>> Wed, Feb 27, 2019 at 01:23:27PM CET, jiri@resnulli.us wrote:
>> >Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:  
>> >>Current port flavours cover simple switches and DSA.  Add PF
>> >>and VF flavours to cover "switchdev" SR-IOV NICs.
>> >>
>> >>Example devlink user space output:
>> >>
>> >>$ devlink port
>> >>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>> >>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>> >>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>> >>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1  
>> >
>> >Wait a second, howcome pf and vfs have the same PCI address?  
>> 
>> Oh, I think you have these as eswitch port representors. Confusing...
>
>FWIW I don't like the word representor, its a port. We don't call
>physical ports "representors" even though from ASIC's point of view
>they are exactly the same.

My point is, they are not PFs and VFs. We have to find a way to clearly
see what's what.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 20:17         ` Jiri Pirko
@ 2019-02-27 22:42           ` Jakub Kicinski
  2019-02-28  8:44             ` Jiri Pirko
  2019-02-28 16:24             ` David Ahern
  0 siblings, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-27 22:42 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev

On Wed, 27 Feb 2019 21:17:27 +0100, Jiri Pirko wrote:
> Wed, Feb 27, 2019 at 06:23:26PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 27 Feb 2019 13:41:35 +0100, Jiri Pirko wrote:  
> >> Wed, Feb 27, 2019 at 01:23:27PM CET, jiri@resnulli.us wrote:  
> >> >Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:    
> >> >>Current port flavours cover simple switches and DSA.  Add PF
> >> >>and VF flavours to cover "switchdev" SR-IOV NICs.
> >> >>
> >> >>Example devlink user space output:
> >> >>
> >> >>$ devlink port
> >> >>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >> >>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
> >> >>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
> >> >>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1    
> >> >
> >> >Wait a second, howcome pf and vfs have the same PCI address?    
> >> 
> >> Oh, I think you have these as eswitch port representors. Confusing...  
> >
> >FWIW I don't like the word representor, its a port. We don't call
> >physical ports "representors" even though from ASIC's point of view
> >they are exactly the same.  
> 
> My point is, they are not PFs and VFs. We have to find a way to clearly
> see what's what.

Okay, so let me explain the way I see it, and you can explain your way
or tell me where you disagree.  Those devlink ports and netdevs are pf
ports and vf ports, which most refer to as "representor".  If one sends
packets to the netdev indicated in DEVLINK_ATTR_PORT_NETDEV_*
attributes they will _egress_ the switch from that port.  For physical
port that means going onto the Ethernet or IB wire.  For PCIe it means
getting DMAed over the PCIe link to host memory.

There is a netdev construct on the host which is in charge of that 
host memory.  Maybe we shall call that host netdev?

(I said I don't like "representor" for the reason that people don't
refer to the physical port as "representor" even though it has exactly
the semantics we are following.  This distinction between behaviour of
physical and PCI ports is what leads to confusion, I think.)

Let me bring out the moose :)

                    HOST A             ||          HOST B                  
                                       ||                                  
        PF A       | V | V | V | V     ||       PF B        | V | V | V               
                   |*F |*F |*F |*F ... ||                   |*F |*F |*F ...           
*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2            
                                       ||                                  
             PCI Express link          ||        PCI Express link          
        \      \      \  |   |   |          |       |      /   /   /         
         \      \      \ |   |   |          |       |     /   /   /          
      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||  
  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
d n H ||  |               <<==========                              |   || d n H
e s O ||  |                            ==========>>                 |   || e s O
v t S ||  |                    SR-IOV e-switch                      |   || v t S
l a T ||  |               <<==========                              |   || l a T
i n   ||  |                            ==========>>                 |   || i n
n c A ||  |               ________ _________ ________               |   || n c B
k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
      ||  \---------------------------------------------------------/   ||                 
      \/                      |        |         |                      \/    
                              |        |         |                          
                                 ||         ||                               
                          MAC 0  ||  MAC 1  || MAC 2                   
                                 ||         ||                              

Things marked with + are devlink ports and have port (-repr-) netdevs
(including physical ports).
Things marked with * are host netdevs, don't have devlink ports.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 22:42           ` Jakub Kicinski
@ 2019-02-28  8:44             ` Jiri Pirko
  2019-02-28 16:08               ` Jakub Kicinski
  2019-02-28 16:24             ` David Ahern
  1 sibling, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-02-28  8:44 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Wed, Feb 27, 2019 at 11:42:39PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 27 Feb 2019 21:17:27 +0100, Jiri Pirko wrote:
>> Wed, Feb 27, 2019 at 06:23:26PM CET, jakub.kicinski@netronome.com wrote:
>> >On Wed, 27 Feb 2019 13:41:35 +0100, Jiri Pirko wrote:  
>> >> Wed, Feb 27, 2019 at 01:23:27PM CET, jiri@resnulli.us wrote:  
>> >> >Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:    
>> >> >>Current port flavours cover simple switches and DSA.  Add PF
>> >> >>and VF flavours to cover "switchdev" SR-IOV NICs.
>> >> >>
>> >> >>Example devlink user space output:
>> >> >>
>> >> >>$ devlink port
>> >> >>pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>> >> >>pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>> >> >>pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>> >> >>pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1    
>> >> >
>> >> >Wait a second, howcome pf and vfs have the same PCI address?    
>> >> 
>> >> Oh, I think you have these as eswitch port representors. Confusing...  
>> >
>> >FWIW I don't like the word representor, its a port. We don't call
>> >physical ports "representors" even though from ASIC's point of view
>> >they are exactly the same.  
>> 
>> My point is, they are not PFs and VFs. We have to find a way to clearly
>> see what's what.
>
>Okay, so let me explain the way I see it, and you can explain your way
>or tell me where you disagree.  Those devlink ports and netdevs are pf
>ports and vf ports, which most refer to as "representor".  If one sends
>packets to the netdev indicated in DEVLINK_ATTR_PORT_NETDEV_*
>attributes they will _egress_ the switch from that port.  For physical
>port that means going onto the Ethernet or IB wire.  For PCIe it means
>getting DMAed over the PCIe link to host memory.
>
>There is a netdev construct on the host which is in charge of that 
>host memory.  Maybe we shall call that host netdev?
>
>(I said I don't like "representor" for the reason that people don't
>refer to the physical port as "representor" even though it has exactly
>the semantics we are following.  This distinction between behaviour of
>physical and PCI ports is what leads to confusion, I think.)
>
>Let me bring out the moose :)
>
>                    HOST A             ||          HOST B                  
>                                       ||                                  
>        PF A       | V | V | V | V     ||       PF B        | V | V | V               
>                   |*F |*F |*F |*F ... ||                   |*F |*F |*F ...           
>*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2            
>                                       ||                                  
>             PCI Express link          ||        PCI Express link          
>        \      \      \  |   |   |          |       |      /   /   /         
>         \      \      \ |   |   |          |       |     /   /   /          
>      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
>      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||  
>  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
>d n H ||  |               <<==========                              |   || d n H
>e s O ||  |                            ==========>>                 |   || e s O
>v t S ||  |                    SR-IOV e-switch                      |   || v t S
>l a T ||  |               <<==========                              |   || l a T
>i n   ||  |                            ==========>>                 |   || i n
>n c A ||  |               ________ _________ ________               |   || n c B
>k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
>      ||  \---------------------------------------------------------/   ||                 
>      \/                      |        |         |                      \/    
>                              |        |         |                          
>                                 ||         ||                               
>                          MAC 0  ||  MAC 1  || MAC 2                   
>                                 ||         ||                              
>
>Things marked with + are devlink ports and have port (-repr-) netdevs
>(including physical ports).
>Things marked with * are host netdevs, don't have devlink ports.

Okay, I got it. So you say that devlink ports should always be only
ports of eswitch.

PF host netdev should have "devlink port" instance, correct?
But it still "belongs" under the ASIC represented by the devlink
instance...



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-27 18:30     ` Jakub Kicinski
@ 2019-02-28  8:56       ` Jiri Pirko
  2019-02-28 13:32         ` Jiri Pirko
  2019-02-28 16:24         ` Jakub Kicinski
  2019-03-04 16:15       ` Jason Gunthorpe
  1 sibling, 2 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-02-28  8:56 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, parav, jgg

Wed, Feb 27, 2019 at 07:30:00PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:
>> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
>> >PCI endpoint corresponds to a PCI device, but such device
>> >can have one more more logical device ports associated with it.
>> >We need a way to distinguish those. Add a PCI subport in the
>> >dumps and print the info in phys_port_name appropriately.
>> >
>> >This is not equivalent to port splitting, there is no split
>> >group. It's just a way of representing multiple netdevs on
>> >a single PCI function.
>> >
>> >Note that the quality of being multiport pertains only to
>> >the PCI function itself. A PF having multiple netdevs does
>> >not mean that its VFs will also have multiple, or that VFs
>> >are associated with any particular port of a multiport VF.
>> 
>> We've been discussing the problem of subport (we call it "subfunction"
>> or "SF") for some time internally. Turned out, this is probably harder
>> task to model. Please prove me wrong.
>> 
>> The nature of VF makes it a logically separate entity. It has a separate
>> PCI address, it should therefore have a separate devlink instance.
>> You can pass it through to VM, then the same devlink instance should be
>> created inside the VM and disappear from the host.
>
>Depends what a devlink instance represents :/  On one hand you may want
>to create an instance for a VF to allow it to spawn soft ports, on the
>other you may want to group multiple functions together.
>
>IOW if devlink instance is for an ASIC, there should be one per device
>per host.  So if we start connecting multiple functions (PFs and/or VFs)
>to one host we should probably introduce the notion of devlink aliases
>or some such (so that multiple bus addresses can target the same

Hmm. Like VF address -> PF address alias? That would be confusing to see
eswitch ports under VF devlink instance... I probably did not get you
right.


>devlink instance).  Those less pipelined NICs can forward between
>ports, but still want a function per port (otherwise user space
>sometimes gets confused).  If we have multiple functions which are on
>the same "switchid" they should have a single devlink instance if you
>ask me.  That instance will have all the ports of the device.

Okay, that makes sense. But the question it, can the same devlink
instance contain ports that does not have "Switchid"?

I think it would be beneficial to have the switchid shown for devlink
ports too. Then it is clean that the devlink ports with the same
switchid belong to the same switch, and other ports under the same
devlink instance (like PF itself) is separate, but still under the same
ASIC.


>
>You say disappear from the host - what do you mean.  Are you referring
>to the VF port disappearing?  But on the switch the port is still

No, VF itself. eswitch port will be still there on the host.


>there, and you should show the subports on the PF side IMHO.  Devlink
>ports should allow users to understand the topology of the switch.

What do you mean by "topology"?


>
>Is spawning VMDq sub-instances the only thing we can think of that VMs
>may want to do?  Are there any other uses?
>
>> SF (or subport) feels similar to that. Basically it is exactly the same
>> thing as VF, only does reside under PF PCI function.
>> 
>> That is why I think, for sake of consistency, it should have a separate
>> devlink entity as well. The problem is correct sysfs modelling and
>> devlink handle derived from that. Parav is working on a simple soft
>> bus for this purpose called "subbus". There is a RFC floating around on
>> Mellanox internal mailing list, looks like it is time to send it
>> upstream.
>> 
>> Then each PF driver which have SFs would register subbus devices
>> according to SFs/subports and they would be properly handled by bus
>> probe, devlink and devlink port and netdev instances created.
>> 
>> Ccing Parav and Jason.
>
>You guys come from the RDMA side of the world, with which I'm less
>familiar, and the soft bus + spawning devices seems to be a popular
>design there.  Could you describe the advantages of that model for 
>the sake of the netdev-only folks? :)

I'll try to draw some ascii art :)

>
>Another term that gets thrown into the mix here is mediated devices,
>right?  If you wanna pass the sub-spawn-soft-port to a VM.  Or run 
>DPDK on some queues.
>
>To state the obvious AF_XDP and macvlan offload were are previous
>answers to some of those use cases.  What is the forwarding model
>for those subports?  Are we going to allow flower rules from VMs?
>Is it going to be dst MAC only?  Or is the hypervisor going to forward
>as it sees appropriate (OvS + "repr"/port netdev)?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-27 18:47     ` Jakub Kicinski
@ 2019-02-28  9:00       ` Jiri Pirko
  2019-02-28 16:36         ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-02-28  9:00 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Wed, Feb 27, 2019 at 07:47:42PM CET, jakub.kicinski@netronome.com wrote:
>On Wed, 27 Feb 2019 14:08:29 +0100, Jiri Pirko wrote:
>> Tue, Feb 26, 2019 at 07:24:34PM CET, jakub.kicinski@netronome.com wrote:
>> >Devlink ports represent ports of a switch device (or SR-IOV
>> >NIC which has an embedded switch). In case of SR-IOV when
>> >PCIe PFs are exposed the PFs which are directly connected
>> >to the local machine may also spawn PF netdev (much like
>> >VFs have a port/"repr" and an actual VF netdev).
>> >
>> >Allow devlink to expose such linking. There is currently no
>> >way to find out which netdev corresponds to which PF.
>> >
>> >Example:
>> >
>> >$ devlink port
>> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>> >pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
>> >pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
>> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1  
>> 
>> Peer as the other side of a "virtual cable". For PF, that is probably
>> sufficient. But I think what a "peer of devlink port" should be "a
>> devlink port".
>
>Maybe I'm not clear on what devlink port is - to me its a port of the
>ASIC.  The notion of devlink port connected to devlink port seems
>to counter such definition :S  

"port of the ASIC" in a sence of "eswitch ports"?


>
>I do not think that every netdev should have a devlink port associated.
>
>> Not sure about VF.
>> 
>> Consider a simple problem of setting up a VF mac address. In legacy, you
>> do it like this:
>> $ ip link set eth2 vf 1 mac 00:52:44:11:22:33
>> However, in new model, you so far cannot do that.
>
>Why?
>
>$ devlink port set pci/0000:82:00.0/10001 peer_eth_addr 00:52:44:11:22:33

Yeah. That is not yet implemented. I agree it is most straightforward.
The question is, is it fine to have set of:
peer_eth_addr
peer_mtu
peer_something_else
Or rather to have some object to pin this on. Something like:

$ devlink port peer set pci/0000:82:00.0/10001 eth_addr 00:52:44:11:22:33


>
>It's more of a neighbour info situation than a local port situation.
>
>> What I was thinking about was some "dummy peer" which would be on the
>> host. Not sure if only as a "dummy peer devlink port" or even as some
>> sort of "dummy netdev".
>> 
>> One way or another, it would provide the user some info about which VF
>> representor is connected to which VF in VM (mac mapping).
>
>Ack, but isn't the MAC setting is the only thing we're missing from
>"switchdev SR-IOV"?  Would the "dummy netdev" be used for anything
>else?  I would rather not introduce new netdev just to do that 

Agreed. It was just a wild idea :)


>(that'd be a third for that port.)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-28  8:56       ` Jiri Pirko
@ 2019-02-28 13:32         ` Jiri Pirko
  2019-02-28 16:24         ` Jakub Kicinski
  1 sibling, 0 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-02-28 13:32 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, parav, jgg

Thu, Feb 28, 2019 at 09:56:24AM CET, jiri@resnulli.us wrote:

[...]


>>devlink instance).  Those less pipelined NICs can forward between
>>ports, but still want a function per port (otherwise user space
>>sometimes gets confused).  If we have multiple functions which are on
>>the same "switchid" they should have a single devlink instance if you
>>ask me.  That instance will have all the ports of the device.
>
>Okay, that makes sense. But the question it, can the same devlink
>instance contain ports that does not have "Switchid"?
>
>I think it would be beneficial to have the switchid shown for devlink
>ports too. Then it is clean that the devlink ports with the same
>switchid belong to the same switch, and other ports under the same
>devlink instance (like PF itself) is separate, but still under the same
>ASIC.

Working on this. Will sent patchset later today/tmrw.

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-28  8:44             ` Jiri Pirko
@ 2019-02-28 16:08               ` Jakub Kicinski
  0 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-28 16:08 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev

On Thu, 28 Feb 2019 09:44:29 +0100, Jiri Pirko wrote:
> >Okay, so let me explain the way I see it, and you can explain your way
> >or tell me where you disagree.  Those devlink ports and netdevs are pf
> >ports and vf ports, which most refer to as "representor".  If one sends
> >packets to the netdev indicated in DEVLINK_ATTR_PORT_NETDEV_*
> >attributes they will _egress_ the switch from that port.  For physical
> >port that means going onto the Ethernet or IB wire.  For PCIe it means
> >getting DMAed over the PCIe link to host memory.
> >
> >There is a netdev construct on the host which is in charge of that 
> >host memory.  Maybe we shall call that host netdev?
> >
> >(I said I don't like "representor" for the reason that people don't
> >refer to the physical port as "representor" even though it has exactly
> >the semantics we are following.  This distinction between behaviour of
> >physical and PCI ports is what leads to confusion, I think.)
> >
> >Let me bring out the moose :)
> >
> >                    HOST A             ||          HOST B                  
> >                                       ||                                  
> >        PF A       | V | V | V | V     ||       PF B        | V | V | V               
> >                   |*F |*F |*F |*F ... ||                   |*F |*F |*F ...           
> >*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2            
> >                                       ||                                  
> >             PCI Express link          ||        PCI Express link          
> >        \      \      \  |   |   |          |       |      /   /   /         
> >         \      \      \ |   |   |          |       |     /   /   /          
> >      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
> >      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||  
> >  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
> >d n H ||  |               <<==========                              |   || d n H
> >e s O ||  |                            ==========>>                 |   || e s O
> >v t S ||  |                    SR-IOV e-switch                      |   || v t S
> >l a T ||  |               <<==========                              |   || l a T
> >i n   ||  |                            ==========>>                 |   || i n
> >n c A ||  |               ________ _________ ________               |   || n c B
> >k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
> >      ||  \---------------------------------------------------------/   ||
> >      \/                      |        |         |                      \/    
> >                              |        |         |                          
> >                                 ||         ||                               
> >                          MAC 0  ||  MAC 1  || MAC 2                   
> >                                 ||         ||                              
> >
> >Things marked with + are devlink ports and have port (-repr-) netdevs
> >(including physical ports).
> >Things marked with * are host netdevs, don't have devlink ports.  
> 
> Okay, I got it. So you say that devlink ports should always be only
> ports of eswitch.
> 
> PF host netdev should have "devlink port" instance, correct?
> But it still "belongs" under the ASIC represented by the devlink
> instance...

Yes, I think so.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-28  8:56       ` Jiri Pirko
  2019-02-28 13:32         ` Jiri Pirko
@ 2019-02-28 16:24         ` Jakub Kicinski
  2019-03-01  7:25           ` Jiri Pirko
  1 sibling, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-28 16:24 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev, parav, jgg

On Thu, 28 Feb 2019 09:56:24 +0100, Jiri Pirko wrote:
> Wed, Feb 27, 2019 at 07:30:00PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:  
> >> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:  
> >> >PCI endpoint corresponds to a PCI device, but such device
> >> >can have one more more logical device ports associated with it.
> >> >We need a way to distinguish those. Add a PCI subport in the
> >> >dumps and print the info in phys_port_name appropriately.
> >> >
> >> >This is not equivalent to port splitting, there is no split
> >> >group. It's just a way of representing multiple netdevs on
> >> >a single PCI function.
> >> >
> >> >Note that the quality of being multiport pertains only to
> >> >the PCI function itself. A PF having multiple netdevs does
> >> >not mean that its VFs will also have multiple, or that VFs
> >> >are associated with any particular port of a multiport VF.  
> >> 
> >> We've been discussing the problem of subport (we call it "subfunction"
> >> or "SF") for some time internally. Turned out, this is probably harder
> >> task to model. Please prove me wrong.
> >> 
> >> The nature of VF makes it a logically separate entity. It has a separate
> >> PCI address, it should therefore have a separate devlink instance.
> >> You can pass it through to VM, then the same devlink instance should be
> >> created inside the VM and disappear from the host.  
> >
> >Depends what a devlink instance represents :/  On one hand you may want
> >to create an instance for a VF to allow it to spawn soft ports, on the
> >other you may want to group multiple functions together.
> >
> >IOW if devlink instance is for an ASIC, there should be one per device
> >per host.  So if we start connecting multiple functions (PFs and/or VFs)
> >to one host we should probably introduce the notion of devlink aliases
> >or some such (so that multiple bus addresses can target the same  
> 
> Hmm. Like VF address -> PF address alias? That would be confusing to see
> eswitch ports under VF devlink instance... I probably did not get you
> right.

No eswitch ports under VF, more in case of mutli-PF.  Bus addresses of
all PFs aliasing to the same devlink instance.

> >devlink instance).  Those less pipelined NICs can forward between
> >ports, but still want a function per port (otherwise user space
> >sometimes gets confused).  If we have multiple functions which are on
> >the same "switchid" they should have a single devlink instance if you
> >ask me.  That instance will have all the ports of the device.  
> 
> Okay, that makes sense. But the question it, can the same devlink
> instance contain ports that does not have "Switchid"?

No strong preference if switchid is different.  To me devlink is an ASIC
instance, if the multiport card is constructed by copy-pasting the same
IP twice onto a die, and the ports really are completely separate, there
is no reason to require single devlink instance.

> I think it would be beneficial to have the switchid shown for devlink
> ports too. Then it is clean that the devlink ports with the same
> switchid belong to the same switch, and other ports under the same
> devlink instance (like PF itself) is separate, but still under the same
> ASIC.

Sure, you mean in terms of UI - user space can do a link dump or get
that from sysfs, right?

> >You say disappear from the host - what do you mean.  Are you referring
> >to the VF port disappearing?  But on the switch the port is still  
> 
> No, VF itself. eswitch port will be still there on the host.
> 
> 
> >there, and you should show the subports on the PF side IMHO.  Devlink
> >ports should allow users to understand the topology of the switch.  
> 
> What do you mean by "topology"?

Mostly which ports are part of the switch and what's their "flavour".
Also (less importantly) which host netdevs are "peers" of eswitch ports.

> >Is spawning VMDq sub-instances the only thing we can think of that VMs
> >may want to do?  Are there any other uses?
> >  
> >> SF (or subport) feels similar to that. Basically it is exactly the same
> >> thing as VF, only does reside under PF PCI function.
> >> 
> >> That is why I think, for sake of consistency, it should have a separate
> >> devlink entity as well. The problem is correct sysfs modelling and
> >> devlink handle derived from that. Parav is working on a simple soft
> >> bus for this purpose called "subbus". There is a RFC floating around on
> >> Mellanox internal mailing list, looks like it is time to send it
> >> upstream.
> >> 
> >> Then each PF driver which have SFs would register subbus devices
> >> according to SFs/subports and they would be properly handled by bus
> >> probe, devlink and devlink port and netdev instances created.
> >> 
> >> Ccing Parav and Jason.  
> >
> >You guys come from the RDMA side of the world, with which I'm less
> >familiar, and the soft bus + spawning devices seems to be a popular
> >design there.  Could you describe the advantages of that model for 
> >the sake of the netdev-only folks? :)  
> 
> I'll try to draw some ascii art :)

Yess :)

> >Another term that gets thrown into the mix here is mediated devices,
> >right?  If you wanna pass the sub-spawn-soft-port to a VM.  Or run 
> >DPDK on some queues.
> >
> >To state the obvious AF_XDP and macvlan offload were are previous
> >answers to some of those use cases.  What is the forwarding model
> >for those subports?  Are we going to allow flower rules from VMs?
> >Is it going to be dst MAC only?  Or is the hypervisor going to forward
> >as it sees appropriate (OvS + "repr"/port netdev)?  

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 22:42           ` Jakub Kicinski
  2019-02-28  8:44             ` Jiri Pirko
@ 2019-02-28 16:24             ` David Ahern
  1 sibling, 0 replies; 44+ messages in thread
From: David Ahern @ 2019-02-28 16:24 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: davem, oss-drivers, netdev

On 2/27/19 3:42 PM, Jakub Kicinski wrote:
> On Wed, 27 Feb 2019 21:17:27 +0100, Jiri Pirko wrote:
>> Wed, Feb 27, 2019 at 06:23:26PM CET, jakub.kicinski@netronome.com wrote:
>>> On Wed, 27 Feb 2019 13:41:35 +0100, Jiri Pirko wrote:  
>>>> Wed, Feb 27, 2019 at 01:23:27PM CET, jiri@resnulli.us wrote:  
>>>>> Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:    
>>>>>> Current port flavours cover simple switches and DSA.  Add PF
>>>>>> and VF flavours to cover "switchdev" SR-IOV NICs.
>>>>>>
>>>>>> Example devlink user space output:
>>>>>>
>>>>>> $ devlink port
>>>>>> pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>>>>>> pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>>>>>> pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>>>>>> pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1    
>>>>>
>>>>> Wait a second, howcome pf and vfs have the same PCI address?    
>>>>
>>>> Oh, I think you have these as eswitch port representors. Confusing...  
>>>
>>> FWIW I don't like the word representor, its a port. We don't call
>>> physical ports "representors" even though from ASIC's point of view
>>> they are exactly the same.  
>>
>> My point is, they are not PFs and VFs. We have to find a way to clearly
>> see what's what.
> 
> Okay, so let me explain the way I see it, and you can explain your way
> or tell me where you disagree.  Those devlink ports and netdevs are pf
> ports and vf ports, which most refer to as "representor".  If one sends
> packets to the netdev indicated in DEVLINK_ATTR_PORT_NETDEV_*
> attributes they will _egress_ the switch from that port.  For physical
> port that means going onto the Ethernet or IB wire.  For PCIe it means
> getting DMAed over the PCIe link to host memory.
> 
> There is a netdev construct on the host which is in charge of that 
> host memory.  Maybe we shall call that host netdev?
> 
> (I said I don't like "representor" for the reason that people don't
> refer to the physical port as "representor" even though it has exactly
> the semantics we are following.  This distinction between behaviour of
> physical and PCI ports is what leads to confusion, I think.)
> 
> Let me bring out the moose :)
> 
>                     HOST A             ||          HOST B                  
>                                        ||                                  
>         PF A       | V | V | V | V     ||       PF B        | V | V | V               
>                    |*F |*F |*F |*F ... ||                   |*F |*F |*F ...           
> *port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2            
>                                        ||                                  
>              PCI Express link          ||        PCI Express link          
>         \      \      \  |   |   |          |       |      /   /   /         
>          \      \      \ |   |   |          |       |     /   /   /          
>       /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
>       ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||  
>   i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
> d n H ||  |               <<==========                              |   || d n H
> e s O ||  |                            ==========>>                 |   || e s O
> v t S ||  |                    SR-IOV e-switch                      |   || v t S
> l a T ||  |               <<==========                              |   || l a T
> i n   ||  |                            ==========>>                 |   || i n
> n c A ||  |               ________ _________ ________               |   || n c B
> k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
>       ||  \---------------------------------------------------------/   ||                 
>       \/                      |        |         |                      \/    
>                               |        |         |                          
>                                  ||         ||                               
>                           MAC 0  ||  MAC 1  || MAC 2                   
>                                  ||         ||                              
> 
> Things marked with + are devlink ports and have port (-repr-) netdevs
> (including physical ports).
> Things marked with * are host netdevs, don't have devlink ports.
> 

That would a good update to the commit message or cover letter.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-28  9:00       ` Jiri Pirko
@ 2019-02-28 16:36         ` Jakub Kicinski
  2019-03-01  7:37           ` Jiri Pirko
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-02-28 16:36 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev

On Thu, 28 Feb 2019 10:00:54 +0100, Jiri Pirko wrote:
> Wed, Feb 27, 2019 at 07:47:42PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 27 Feb 2019 14:08:29 +0100, Jiri Pirko wrote:  
> >> Tue, Feb 26, 2019 at 07:24:34PM CET, jakub.kicinski@netronome.com wrote:  
> >> >Devlink ports represent ports of a switch device (or SR-IOV
> >> >NIC which has an embedded switch). In case of SR-IOV when
> >> >PCIe PFs are exposed the PFs which are directly connected
> >> >to the local machine may also spawn PF netdev (much like
> >> >VFs have a port/"repr" and an actual VF netdev).
> >> >
> >> >Allow devlink to expose such linking. There is currently no
> >> >way to find out which netdev corresponds to which PF.
> >> >
> >> >Example:
> >> >
> >> >$ devlink port
> >> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >> >pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
> >> >pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
> >> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1    
> >> 
> >> Peer as the other side of a "virtual cable". For PF, that is probably
> >> sufficient. But I think what a "peer of devlink port" should be "a
> >> devlink port".  
> >
> >Maybe I'm not clear on what devlink port is - to me its a port of the
> >ASIC.  The notion of devlink port connected to devlink port seems
> >to counter such definition :S    
> 
> "port of the ASIC" in a sence of "eswitch ports"?

Yes.

> >I do not think that every netdev should have a devlink port associated.
> >  
> >> Not sure about VF.
> >> 
> >> Consider a simple problem of setting up a VF mac address. In legacy, you
> >> do it like this:
> >> $ ip link set eth2 vf 1 mac 00:52:44:11:22:33
> >> However, in new model, you so far cannot do that.  
> >
> >Why?
> >
> >$ devlink port set pci/0000:82:00.0/10001 peer_eth_addr 00:52:44:11:22:33  
> 
> Yeah. That is not yet implemented. I agree it is most straightforward.
> The question is, is it fine to have set of:
> peer_eth_addr
> peer_mtu
> peer_something_else
> Or rather to have some object to pin this on. Something like:
> 
> $ devlink port peer set pci/0000:82:00.0/10001 eth_addr 00:52:44:11:22:33

I do like the object one better, would this mean I should restructure
the peer stuff somehow (netlink attribute structure)?

The MTU stuff is tricky, perhaps best left for its own series ;)

> >It's more of a neighbour info situation than a local port situation.
> >  
> >> What I was thinking about was some "dummy peer" which would be on the
> >> host. Not sure if only as a "dummy peer devlink port" or even as some
> >> sort of "dummy netdev".
> >> 
> >> One way or another, it would provide the user some info about which VF
> >> representor is connected to which VF in VM (mac mapping).  
> >
> >Ack, but isn't the MAC setting is the only thing we're missing from
> >"switchdev SR-IOV"?  Would the "dummy netdev" be used for anything
> >else?  I would rather not introduce new netdev just to do that   
> 
> Agreed. It was just a wild idea :)

:)

> >(that'd be a third for that port.)  

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-28 16:24         ` Jakub Kicinski
@ 2019-03-01  7:25           ` Jiri Pirko
  2019-03-01 16:04             ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-03-01  7:25 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, parav, jgg

Thu, Feb 28, 2019 at 05:24:04PM CET, jakub.kicinski@netronome.com wrote:
>On Thu, 28 Feb 2019 09:56:24 +0100, Jiri Pirko wrote:
>> Wed, Feb 27, 2019 at 07:30:00PM CET, jakub.kicinski@netronome.com wrote:
>> >On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:  
>> >> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >PCI endpoint corresponds to a PCI device, but such device
>> >> >can have one more more logical device ports associated with it.
>> >> >We need a way to distinguish those. Add a PCI subport in the
>> >> >dumps and print the info in phys_port_name appropriately.
>> >> >
>> >> >This is not equivalent to port splitting, there is no split
>> >> >group. It's just a way of representing multiple netdevs on
>> >> >a single PCI function.
>> >> >
>> >> >Note that the quality of being multiport pertains only to
>> >> >the PCI function itself. A PF having multiple netdevs does
>> >> >not mean that its VFs will also have multiple, or that VFs
>> >> >are associated with any particular port of a multiport VF.  
>> >> 
>> >> We've been discussing the problem of subport (we call it "subfunction"
>> >> or "SF") for some time internally. Turned out, this is probably harder
>> >> task to model. Please prove me wrong.
>> >> 
>> >> The nature of VF makes it a logically separate entity. It has a separate
>> >> PCI address, it should therefore have a separate devlink instance.
>> >> You can pass it through to VM, then the same devlink instance should be
>> >> created inside the VM and disappear from the host.  
>> >
>> >Depends what a devlink instance represents :/  On one hand you may want
>> >to create an instance for a VF to allow it to spawn soft ports, on the
>> >other you may want to group multiple functions together.
>> >
>> >IOW if devlink instance is for an ASIC, there should be one per device
>> >per host.  So if we start connecting multiple functions (PFs and/or VFs)
>> >to one host we should probably introduce the notion of devlink aliases
>> >or some such (so that multiple bus addresses can target the same  
>> 
>> Hmm. Like VF address -> PF address alias? That would be confusing to see
>> eswitch ports under VF devlink instance... I probably did not get you
>> right.
>
>No eswitch ports under VF, more in case of mutli-PF.  Bus addresses of
>all PFs aliasing to the same devlink instance.

The multi-PF aliasing makes sense to me.


>
>> >devlink instance).  Those less pipelined NICs can forward between
>> >ports, but still want a function per port (otherwise user space
>> >sometimes gets confused).  If we have multiple functions which are on
>> >the same "switchid" they should have a single devlink instance if you
>> >ask me.  That instance will have all the ports of the device.  
>> 
>> Okay, that makes sense. But the question it, can the same devlink
>> instance contain ports that does not have "Switchid"?
>
>No strong preference if switchid is different.  To me devlink is an ASIC
>instance, if the multiport card is constructed by copy-pasting the same
>IP twice onto a die, and the ports really are completely separate, there
>is no reason to require single devlink instance.

Okay.


>
>> I think it would be beneficial to have the switchid shown for devlink
>> ports too. Then it is clean that the devlink ports with the same
>> switchid belong to the same switch, and other ports under the same
>> devlink instance (like PF itself) is separate, but still under the same
>> ASIC.
>
>Sure, you mean in terms of UI - user space can do a link dump or get
>that from sysfs, right?

I thinking about moving it to devlink. I'll work on it more today.


>
>> >You say disappear from the host - what do you mean.  Are you referring
>> >to the VF port disappearing?  But on the switch the port is still  
>> 
>> No, VF itself. eswitch port will be still there on the host.
>> 
>> 
>> >there, and you should show the subports on the PF side IMHO.  Devlink
>> >ports should allow users to understand the topology of the switch.  
>> 
>> What do you mean by "topology"?
>
>Mostly which ports are part of the switch and what's their "flavour".
>Also (less importantly) which host netdevs are "peers" of eswitch ports.

Makes sense.


>
>> >Is spawning VMDq sub-instances the only thing we can think of that VMs
>> >may want to do?  Are there any other uses?
>> >  
>> >> SF (or subport) feels similar to that. Basically it is exactly the same
>> >> thing as VF, only does reside under PF PCI function.
>> >> 
>> >> That is why I think, for sake of consistency, it should have a separate
>> >> devlink entity as well. The problem is correct sysfs modelling and
>> >> devlink handle derived from that. Parav is working on a simple soft
>> >> bus for this purpose called "subbus". There is a RFC floating around on
>> >> Mellanox internal mailing list, looks like it is time to send it
>> >> upstream.
>> >> 
>> >> Then each PF driver which have SFs would register subbus devices
>> >> according to SFs/subports and they would be properly handled by bus
>> >> probe, devlink and devlink port and netdev instances created.
>> >> 
>> >> Ccing Parav and Jason.  
>> >
>> >You guys come from the RDMA side of the world, with which I'm less
>> >familiar, and the soft bus + spawning devices seems to be a popular
>> >design there.  Could you describe the advantages of that model for 
>> >the sake of the netdev-only folks? :)  
>> 
>> I'll try to draw some ascii art :)
>
>Yess :)
>
>> >Another term that gets thrown into the mix here is mediated devices,
>> >right?  If you wanna pass the sub-spawn-soft-port to a VM.  Or run 
>> >DPDK on some queues.
>> >
>> >To state the obvious AF_XDP and macvlan offload were are previous
>> >answers to some of those use cases.  What is the forwarding model
>> >for those subports?  Are we going to allow flower rules from VMs?
>> >Is it going to be dst MAC only?  Or is the hypervisor going to forward
>> >as it sees appropriate (OvS + "repr"/port netdev)?  

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-28 16:36         ` Jakub Kicinski
@ 2019-03-01  7:37           ` Jiri Pirko
  2019-03-01 16:05             ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-03-01  7:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev

Thu, Feb 28, 2019 at 05:36:44PM CET, jakub.kicinski@netronome.com wrote:
>On Thu, 28 Feb 2019 10:00:54 +0100, Jiri Pirko wrote:
>> Wed, Feb 27, 2019 at 07:47:42PM CET, jakub.kicinski@netronome.com wrote:
>> >On Wed, 27 Feb 2019 14:08:29 +0100, Jiri Pirko wrote:  
>> >> Tue, Feb 26, 2019 at 07:24:34PM CET, jakub.kicinski@netronome.com wrote:  
>> >> >Devlink ports represent ports of a switch device (or SR-IOV
>> >> >NIC which has an embedded switch). In case of SR-IOV when
>> >> >PCIe PFs are exposed the PFs which are directly connected
>> >> >to the local machine may also spawn PF netdev (much like
>> >> >VFs have a port/"repr" and an actual VF netdev).
>> >> >
>> >> >Allow devlink to expose such linking. There is currently no
>> >> >way to find out which netdev corresponds to which PF.
>> >> >
>> >> >Example:
>> >> >
>> >> >$ devlink port
>> >> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>> >> >pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0 peer_netdev enp130s0
>> >> >pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
>> >> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1    
>> >> 
>> >> Peer as the other side of a "virtual cable". For PF, that is probably
>> >> sufficient. But I think what a "peer of devlink port" should be "a
>> >> devlink port".  
>> >
>> >Maybe I'm not clear on what devlink port is - to me its a port of the
>> >ASIC.  The notion of devlink port connected to devlink port seems
>> >to counter such definition :S    
>> 
>> "port of the ASIC" in a sence of "eswitch ports"?
>
>Yes.
>
>> >I do not think that every netdev should have a devlink port associated.
>> >  
>> >> Not sure about VF.
>> >> 
>> >> Consider a simple problem of setting up a VF mac address. In legacy, you
>> >> do it like this:
>> >> $ ip link set eth2 vf 1 mac 00:52:44:11:22:33
>> >> However, in new model, you so far cannot do that.  
>> >
>> >Why?
>> >
>> >$ devlink port set pci/0000:82:00.0/10001 peer_eth_addr 00:52:44:11:22:33  
>> 
>> Yeah. That is not yet implemented. I agree it is most straightforward.
>> The question is, is it fine to have set of:
>> peer_eth_addr
>> peer_mtu
>> peer_something_else
>> Or rather to have some object to pin this on. Something like:
>> 
>> $ devlink port peer set pci/0000:82:00.0/10001 eth_addr 00:52:44:11:22:33
>
>I do like the object one better, would this mean I should restructure
>the peer stuff somehow (netlink attribute structure)?

Well we can introduce separate commands:
DEVLINK_CMD_PORT_PEER_GET
DEVLINK_CMD_PORT_PEER_SET

For "set" part, this would work nice. However for the "get" part, we
would have to call both DEVLINK_CMD_PORT_GET and
DEVLINK_CMD_PORT_PEER_GET. So probably better to add a nest attr:
DEVLINK_ATTR_PORT_PEER
and have attrs like:
DEVLINK_ATTR_PORT_PEER_HW_ADDR  (does not have to be always eth, right?)
DEVLINK_ATTR_PORT_PEER_TYPE (DEVLINK_PORT_TYPE_NOTSET/DEVLINK_PORT_TYPE_ETH/DEVLINK_PORT_TYPE_IB)
DEVLINK_ATTR_PORT_PEER_NETDEV_IFINDEX
DEVLINK_ATTR_PORT_PEER_NETDEV_NAME
DEVLINK_ATTR_PORT_PEER_NETDEV_IBDEV_NAME
in the nest.

The userspace part can stay as I described previously:
$ devlink port peer set pci/0000:82:00.0/10001 hw_addr 00:52:44:11:22:33

Not sure about "port show" output. In json, the "peer" things should be
under "peer" dictionary.



>
>The MTU stuff is tricky, perhaps best left for its own series ;)
>
>> >It's more of a neighbour info situation than a local port situation.
>> >  
>> >> What I was thinking about was some "dummy peer" which would be on the
>> >> host. Not sure if only as a "dummy peer devlink port" or even as some
>> >> sort of "dummy netdev".
>> >> 
>> >> One way or another, it would provide the user some info about which VF
>> >> representor is connected to which VF in VM (mac mapping).  
>> >
>> >Ack, but isn't the MAC setting is the only thing we're missing from
>> >"switchdev SR-IOV"?  Would the "dummy netdev" be used for anything
>> >else?  I would rather not introduce new netdev just to do that   
>> 
>> Agreed. It was just a wild idea :)
>
>:)
>
>> >(that'd be a third for that port.)  

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-03-01  7:25           ` Jiri Pirko
@ 2019-03-01 16:04             ` Jakub Kicinski
  2019-03-01 16:20               ` Jiri Pirko
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-03-01 16:04 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev, parav, jgg

On Fri, 1 Mar 2019 08:25:57 +0100, Jiri Pirko wrote:
> >> I think it would be beneficial to have the switchid shown for devlink
> >> ports too. Then it is clean that the devlink ports with the same
> >> switchid belong to the same switch, and other ports under the same
> >> devlink instance (like PF itself) is separate, but still under the same
> >> ASIC.  
> >
> >Sure, you mean in terms of UI - user space can do a link dump or get
> >that from sysfs, right?  
> 
> I thinking about moving it to devlink. I'll work on it more today.

At the kernel level?  Are we sure this is appropriate?  I'd think
switchid is in essence a forwarding attribute, so it belongs with
netdevs, no?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-03-01  7:37           ` Jiri Pirko
@ 2019-03-01 16:05             ` Jakub Kicinski
  0 siblings, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2019-03-01 16:05 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev

On Fri, 1 Mar 2019 08:37:00 +0100, Jiri Pirko wrote:
> >> >I do not think that every netdev should have a devlink port associated.
> >> >    
> >> >> Not sure about VF.
> >> >> 
> >> >> Consider a simple problem of setting up a VF mac address. In legacy, you
> >> >> do it like this:
> >> >> $ ip link set eth2 vf 1 mac 00:52:44:11:22:33
> >> >> However, in new model, you so far cannot do that.    
> >> >
> >> >Why?
> >> >
> >> >$ devlink port set pci/0000:82:00.0/10001 peer_eth_addr 00:52:44:11:22:33    
> >> 
> >> Yeah. That is not yet implemented. I agree it is most straightforward.
> >> The question is, is it fine to have set of:
> >> peer_eth_addr
> >> peer_mtu
> >> peer_something_else
> >> Or rather to have some object to pin this on. Something like:
> >> 
> >> $ devlink port peer set pci/0000:82:00.0/10001 eth_addr 00:52:44:11:22:33  
> >
> >I do like the object one better, would this mean I should restructure
> >the peer stuff somehow (netlink attribute structure)?  
> 
> Well we can introduce separate commands:
> DEVLINK_CMD_PORT_PEER_GET
> DEVLINK_CMD_PORT_PEER_SET
> 
> For "set" part, this would work nice. However for the "get" part, we
> would have to call both DEVLINK_CMD_PORT_GET and
> DEVLINK_CMD_PORT_PEER_GET. So probably better to add a nest attr:
> DEVLINK_ATTR_PORT_PEER
> and have attrs like:
> DEVLINK_ATTR_PORT_PEER_HW_ADDR  (does not have to be always eth, right?)
> DEVLINK_ATTR_PORT_PEER_TYPE (DEVLINK_PORT_TYPE_NOTSET/DEVLINK_PORT_TYPE_ETH/DEVLINK_PORT_TYPE_IB)
> DEVLINK_ATTR_PORT_PEER_NETDEV_IFINDEX
> DEVLINK_ATTR_PORT_PEER_NETDEV_NAME
> DEVLINK_ATTR_PORT_PEER_NETDEV_IBDEV_NAME
> in the nest.
> 
> The userspace part can stay as I described previously:
> $ devlink port peer set pci/0000:82:00.0/10001 hw_addr 00:52:44:11:22:33
> 
> Not sure about "port show" output. In json, the "peer" things should be
> under "peer" dictionary.

I'll make it so.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-03-01 16:04             ` Jakub Kicinski
@ 2019-03-01 16:20               ` Jiri Pirko
  0 siblings, 0 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-03-01 16:20 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, parav, jgg

Fri, Mar 01, 2019 at 05:04:01PM CET, jakub.kicinski@netronome.com wrote:
>On Fri, 1 Mar 2019 08:25:57 +0100, Jiri Pirko wrote:
>> >> I think it would be beneficial to have the switchid shown for devlink
>> >> ports too. Then it is clean that the devlink ports with the same
>> >> switchid belong to the same switch, and other ports under the same
>> >> devlink instance (like PF itself) is separate, but still under the same
>> >> ASIC.  
>> >
>> >Sure, you mean in terms of UI - user space can do a link dump or get
>> >that from sysfs, right?  
>> 
>> I thinking about moving it to devlink. I'll work on it more today.
>
>At the kernel level?  Are we sure this is appropriate?  I'd think
>switchid is in essence a forwarding attribute, so it belongs with
>netdevs, no?

It's an id which identifies ports which belong to the same switch.
Internally in kernel it is used for forwarding purposes. But it is
exposed to user too. I believe it is a nice way to see which devlink
ports belong to switch and which not. Please see the RFC I send today.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-02-27 12:16   ` Jiri Pirko
@ 2019-03-04  4:59     ` Parav Pandit
  2019-03-04  7:30       ` Jiri Pirko
  0 siblings, 1 reply; 44+ messages in thread
From: Parav Pandit @ 2019-03-04  4:59 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: davem, oss-drivers, netdev



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jiri Pirko
> Sent: Wednesday, February 27, 2019 6:17 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: davem@davemloft.net; oss-drivers@netronome.com;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
> 
> Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
> >Current port flavours cover simple switches and DSA.  Add PF and VF
> >flavours to cover "switchdev" SR-IOV NICs.
> >
> >Example devlink user space output:
> >
> >$ devlink port
> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
> >pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1
>
A given port is of its parent device.
In current scenario, its PF or VF.
Hence it should be device attribute and not a port attribute.
So devlink dev show command have to show what device flavour is.
Is it well known PCI VF or PF or something else.
It will show subdev device attribute and its parent PCI (PF/VF) devlink device.
So we should have device flovour as PCI_PF or PCI_VF or SUBDEV.

Again VF number showcasing here is very restrictive model.
Every PF/VF/Subdev represents its own 'port' and it is connected to eswitch 'port'.
Instead of showing VF here, it must be this 'port' or 'link' number that gives right view.
Which netdev represents which VF is already linked in the VF rep-netdev sysfs property.

So flavour should be something like 'hostport' and when port is registered for the eswitch side it should be 'switchport'.
With this there is very clear picture of which hostport is connected to which eswitch port.
Just like how we see in the physical world.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-27 12:37   ` Jiri Pirko
  2019-02-27 18:30     ` Jakub Kicinski
@ 2019-03-04  5:00     ` Parav Pandit
  1 sibling, 0 replies; 44+ messages in thread
From: Parav Pandit @ 2019-03-04  5:00 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: davem, oss-drivers, netdev, Jason Gunthorpe


> -----Original Message-----
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Wednesday, February 27, 2019 6:38 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: davem@davemloft.net; oss-drivers@netronome.com;
> netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>; Jason
> Gunthorpe <jgg@mellanox.com>
> Subject: Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI
> ports
> 
> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
> >PCI endpoint corresponds to a PCI device, but such device can have one
> >more more logical device ports associated with it.
> >We need a way to distinguish those. Add a PCI subport in the dumps and
> >print the info in phys_port_name appropriately.
> >
> >This is not equivalent to port splitting, there is no split group. It's
> >just a way of representing multiple netdevs on a single PCI function.
> >
> >Note that the quality of being multiport pertains only to the PCI
> >function itself. A PF having multiple netdevs does not mean that its
> >VFs will also have multiple, or that VFs are associated with any
> >particular port of a multiport VF.
> >
> 
> We've been discussing the problem of subport (we call it "subfunction"
> or "SF") for some time internally. Turned out, this is probably harder task to
> model. Please prove me wrong.
> 
> The nature of VF makes it a logically separate entity. It has a separate PCI
> address, it should therefore have a separate devlink instance.
> You can pass it through to VM, then the same devlink instance should be
> created inside the VM and disappear from the host.
> 
> SF (or subport) feels similar to that. Basically it is exactly the same thing as
> VF, only does reside under PF PCI function.
> 
> That is why I think, for sake of consistency, it should have a separate devlink
> entity as well. The problem is correct sysfs modelling and devlink handle
> derived from that. Parav is working on a simple soft bus for this purpose
> called "subbus". There is a RFC floating around on Mellanox internal mailing
> list, looks like it is time to send it upstream.
> 
> Then each PF driver which have SFs would register subbus devices according
> to SFs/subports and they would be properly handled by bus probe, devlink
> and devlink port and netdev instances created.
> 
> Ccing Parav and Jason.

Thanks Jiri for the heads up. I sent the RFC in thread [1].

[1] https://lkml.org/lkml/2019/3/1/19



^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
  2019-02-27 13:08   ` Jiri Pirko
  2019-02-27 18:47     ` Jakub Kicinski
@ 2019-03-04  5:07     ` Parav Pandit
  1 sibling, 0 replies; 44+ messages in thread
From: Parav Pandit @ 2019-03-04  5:07 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: davem, oss-drivers, netdev



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jiri Pirko
> Sent: Wednesday, February 27, 2019 7:08 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: davem@davemloft.net; oss-drivers@netronome.com;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 6/8] devlink: introduce port's peer netdevs
> 
> Tue, Feb 26, 2019 at 07:24:34PM CET, jakub.kicinski@netronome.com wrote:
> >Devlink ports represent ports of a switch device (or SR-IOV NIC which
> >has an embedded switch).
Need not be.
As you know,
In switchdev mode, there are two ports. hostport and switchport.
Switch port represented by representor netdevice and services ingress traffic from hostport.

hostport is connected to NIC txq, rxq and rdma queues.

Each should be managed/configured separately.


>  In case of SR-IOV when PCIe PFs are exposed
> >the PFs which are directly connected to the local machine may also
> >spawn PF netdev (much like VFs have a port/"repr" and an actual VF
> >netdev).
> >
> >Allow devlink to expose such linking. There is currently no way to find
> >out which netdev corresponds to which PF.
> >
> >Example:
> >
> >$ devlink port
> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >pci/0000:82:00.0/10000: type eth netdev eth1 flavour pci_pf pf 0
> >peer_netdev enp130s0
> >pci/0000:82:00.0/10001: type eth netdev eth0 flavour pci_vf pf 0 vf 0
> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pci_vf pf 0 vf 1
> 
> Peer as the other side of a "virtual cable". For PF, that is probably sufficient.
> But I think what a "peer of devlink port" should be "a devlink port".
> 
> Not sure about VF.
> 
> Consider a simple problem of setting up a VF mac address. In legacy, you do
> it like this:
> $ ip link set eth2 vf 1 mac 00:52:44:11:22:33 However, in new model, you so
> far cannot do that.
> 
> What I was thinking about was some "dummy peer" which would be on the
> host. Not sure if only as a "dummy peer devlink port" or even as some sort
> of "dummy netdev".
>
I think we shouldn't bring up this convoluted concept of dummy/peer netdev.
A simpler scheme is to represent and configure what port does.
Rep-netdev is well defined switchport.
hostport should be used to configured hostport configurations using devlink.
netdev object for hostport config is probably too heavy and we should just continue with hostport.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-03-04  4:59     ` Parav Pandit
@ 2019-03-04  7:30       ` Jiri Pirko
  2019-03-20 17:29         ` Abodunrin, Akeem G
  0 siblings, 1 reply; 44+ messages in thread
From: Jiri Pirko @ 2019-03-04  7:30 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jakub Kicinski, davem, oss-drivers, netdev

Mon, Mar 04, 2019 at 05:59:04AM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
>> Behalf Of Jiri Pirko
>> Sent: Wednesday, February 27, 2019 6:17 AM
>> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Cc: davem@davemloft.net; oss-drivers@netronome.com;
>> netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
>> 
>> Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
>> >Current port flavours cover simple switches and DSA.  Add PF and VF
>> >flavours to cover "switchdev" SR-IOV NICs.
>> >
>> >Example devlink user space output:
>> >
>> >$ devlink port
>> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>> >pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>> >pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf 0
>> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf 1
>>
>A given port is of its parent device.
>In current scenario, its PF or VF.
>Hence it should be device attribute and not a port attribute.

I think that this works. You have VF_rep ports, PF_rep ports and
PHYSICAL ports. In mlxsw for example, there are only PHYSICAL ports.
In sr-iov world, there is also a PHYSICAL port on the eswitch. The
others are either facing PF of VF. Looks accurate. I don't see any need
for "devlink dev" flavour.


>So devlink dev show command have to show what device flavour is.
>Is it well known PCI VF or PF or something else.
>It will show subdev device attribute and its parent PCI (PF/VF) devlink device.
>So we should have device flovour as PCI_PF or PCI_VF or SUBDEV.
>
>Again VF number showcasing here is very restrictive model.
>Every PF/VF/Subdev represents its own 'port' and it is connected to eswitch 'port'.
>Instead of showing VF here, it must be this 'port' or 'link' number that gives right view.
>Which netdev represents which VF is already linked in the VF rep-netdev sysfs property.

I think you confuse the eswtich ports (in Jakub's output it's them) and
the actual VF.


>
>So flavour should be something like 'hostport' and when port is registered for the eswitch side it should be 'switchport'.
>With this there is very clear picture of which hostport is connected to which eswitch port.
>Just like how we see in the physical world.
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-02-27 18:30     ` Jakub Kicinski
  2019-02-28  8:56       ` Jiri Pirko
@ 2019-03-04 16:15       ` Jason Gunthorpe
  2019-03-05  1:03         ` Jakub Kicinski
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2019-03-04 16:15 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, davem, oss-drivers, netdev, Parav Pandit

On Wed, Feb 27, 2019 at 10:30:00AM -0800, Jakub Kicinski wrote:
> On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:
> > Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
> > >PCI endpoint corresponds to a PCI device, but such device
> > >can have one more more logical device ports associated with it.
> > >We need a way to distinguish those. Add a PCI subport in the
> > >dumps and print the info in phys_port_name appropriately.
> > >
> > >This is not equivalent to port splitting, there is no split
> > >group. It's just a way of representing multiple netdevs on
> > >a single PCI function.
> > >
> > >Note that the quality of being multiport pertains only to
> > >the PCI function itself. A PF having multiple netdevs does
> > >not mean that its VFs will also have multiple, or that VFs
> > >are associated with any particular port of a multiport VF.
> > 
> > We've been discussing the problem of subport (we call it "subfunction"
> > or "SF") for some time internally. Turned out, this is probably harder
> > task to model. Please prove me wrong.
> > 
> > The nature of VF makes it a logically separate entity. It has a separate
> > PCI address, it should therefore have a separate devlink instance.
> > You can pass it through to VM, then the same devlink instance should be
> > created inside the VM and disappear from the host.
> 
> Depends what a devlink instance represents :/  On one hand you may want
> to create an instance for a VF to allow it to spawn soft ports, on the
> other you may want to group multiple functions together.
> 
> IOW if devlink instance is for an ASIC, there should be one per device
> per host.  

Don't we already have devlink instances for every mlx5 physical port
and VF as they are unique PCI functions?

> You guys come from the RDMA side of the world, with which I'm less
> familiar, and the soft bus + spawning devices seems to be a popular
> design there.  Could you describe the advantages of that model for 
> the sake of the netdev-only folks? :)

I don't think we do this in RDMA at all yet, or maybe I'm not sure
what you are thinking of?

The forward looking discussion is mainly to create something like
macvlan that can be offloaded, so we have something like a 'rdma
offload HW context' for each 'rdma macvlan'

.. and that 'rdma offload HW context' has all the same knobs as an
offload context for a VF that would normally be tied to a unique PCI
BDF (via devlink). But in this case there is no unique BDF.

From another angle this is broadly similar to the scalable IOV stuff,
but without placing requirements on the host IOMMU to implement it.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-03-04 16:15       ` Jason Gunthorpe
@ 2019-03-05  1:03         ` Jakub Kicinski
  2019-03-05  1:30           ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-03-05  1:03 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Jiri Pirko, davem, oss-drivers, netdev, Parav Pandit

On Mon, 4 Mar 2019 16:15:14 +0000, Jason Gunthorpe wrote:
> On Wed, Feb 27, 2019 at 10:30:00AM -0800, Jakub Kicinski wrote:
> > On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:  
> > > Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:  
> > > >PCI endpoint corresponds to a PCI device, but such device
> > > >can have one more more logical device ports associated with it.
> > > >We need a way to distinguish those. Add a PCI subport in the
> > > >dumps and print the info in phys_port_name appropriately.
> > > >
> > > >This is not equivalent to port splitting, there is no split
> > > >group. It's just a way of representing multiple netdevs on
> > > >a single PCI function.
> > > >
> > > >Note that the quality of being multiport pertains only to
> > > >the PCI function itself. A PF having multiple netdevs does
> > > >not mean that its VFs will also have multiple, or that VFs
> > > >are associated with any particular port of a multiport VF.  
> > > 
> > > We've been discussing the problem of subport (we call it "subfunction"
> > > or "SF") for some time internally. Turned out, this is probably harder
> > > task to model. Please prove me wrong.
> > > 
> > > The nature of VF makes it a logically separate entity. It has a separate
> > > PCI address, it should therefore have a separate devlink instance.
> > > You can pass it through to VM, then the same devlink instance should be
> > > created inside the VM and disappear from the host.  
> > 
> > Depends what a devlink instance represents :/  On one hand you may want
> > to create an instance for a VF to allow it to spawn soft ports, on the
> > other you may want to group multiple functions together.
> > 
> > IOW if devlink instance is for an ASIC, there should be one per device
> > per host.    
> 
> Don't we already have devlink instances for every mlx5 physical port
> and VF as they are unique PCI functions?

That's a very NIC-centric view of the world, though.  Equating devlink
instances to ports, and further to PCI devices.  Its fundamentally
different from what switches and some NICs do, where all ports are under
single devlink instance.

> > You guys come from the RDMA side of the world, with which I'm less
> > familiar, and the soft bus + spawning devices seems to be a popular
> > design there.  Could you describe the advantages of that model for 
> > the sake of the netdev-only folks? :)  
> 
> I don't think we do this in RDMA at all yet, or maybe I'm not sure
> what you are thinking of?

Mm.. I caught an Intel patch set recently which was talking about buses
and spawning devices.  It must have been a different kettle of fish.

> The forward looking discussion is mainly to create something like
> macvlan that can be offloaded, so we have something like a 'rdma
> offload HW context' for each 'rdma macvlan'
> 
> .. and that 'rdma offload HW context' has all the same knobs as an
> offload context for a VF that would normally be tied to a unique PCI
> BDF (via devlink). But in this case there is no unique BDF.
> 
> From another angle this is broadly similar to the scalable IOV stuff,
> but without placing requirements on the host IOMMU to implement it.

Understood, thanks for clarifying.  The question becomes how do we
square this SR-IOV world with world of switches.  Is the hypervisor
supposed to know that the VM has partitioned its VF?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-03-05  1:03         ` Jakub Kicinski
@ 2019-03-05  1:30           ` Jason Gunthorpe
  2019-03-05  2:11             ` Jakub Kicinski
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2019-03-05  1:30 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, davem, oss-drivers, netdev, Parav Pandit

On Mon, Mar 04, 2019 at 05:03:20PM -0800, Jakub Kicinski wrote:

> > Don't we already have devlink instances for every mlx5 physical port
> > and VF as they are unique PCI functions?
> 
> That's a very NIC-centric view of the world, though.  Equating devlink
> instances to ports, and further to PCI devices.  Its fundamentally
> different from what switches and some NICs do, where all ports are under
> single devlink instance.

I think, as a practical matter, it is a bit hard to recombine an asic
that presents multiple PCI BDFs into a single SW object. It is tricky
to give stable labels to things, to leave gaps to allow for uncertain
discovey, to co-ordinate between multiple struct pci_device drivers
probe functions, etc.

And at least with devlink, if you have a object layer that is broader
then PCI BDF, how do the devlink commands work? Are all BDFs just an
alias for this hidden super object?

Do any drivers attempt to provide single instant made up of merged
BDFs?

In other words, is a PCI BDF really the largest granularity that
devlink can address today?

At least in RDMA we have drivers doing all combinations of this:
multiple ports per BDF, one port per BDF, and one composite RDMA
device formed by combining multiple BDFs worth of ports together.

> > > You guys come from the RDMA side of the world, with which I'm less
> > > familiar, and the soft bus + spawning devices seems to be a popular
> > > design there.  Could you describe the advantages of that model for 
> > > the sake of the netdev-only folks? :)  
> > 
> > I don't think we do this in RDMA at all yet, or maybe I'm not sure
> > what you are thinking of?
> 
> Mm.. I caught an Intel patch set recently which was talking about buses
> and spawning devices.  It must have been a different kettle of fish.

That sounds like scalable iov..

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-03-05  1:30           ` Jason Gunthorpe
@ 2019-03-05  2:11             ` Jakub Kicinski
  2019-03-05 22:11               ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2019-03-05  2:11 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Jiri Pirko, davem, oss-drivers, netdev, Parav Pandit

On Tue, 5 Mar 2019 01:30:19 +0000, Jason Gunthorpe wrote:
> On Mon, Mar 04, 2019 at 05:03:20PM -0800, Jakub Kicinski wrote:
> 
> > > Don't we already have devlink instances for every mlx5 physical port
> > > and VF as they are unique PCI functions?  
> > 
> > That's a very NIC-centric view of the world, though.  Equating devlink
> > instances to ports, and further to PCI devices.  Its fundamentally
> > different from what switches and some NICs do, where all ports are under
> > single devlink instance.  
> 
> I think, as a practical matter, it is a bit hard to recombine an asic
> that presents multiple PCI BDFs into a single SW object. It is tricky
> to give stable labels to things, to leave gaps to allow for uncertain
> discovey, to co-ordinate between multiple struct pci_device drivers
> probe functions, etc.

It is tricky indeed, hence my so far unsuccessful search for a stable
handle :/  One thing which would not make things easier tho, is if we
objects we use to model this scenario don't have clear meanings...

> And at least with devlink, if you have a object layer that is broader
> then PCI BDF, how do the devlink commands work? Are all BDFs just an
> alias for this hidden super object?

My thinking was that they'd alias.

> Do any drivers attempt to provide single instant made up of merged
> BDFs?

Not yet, but our NFP can do it.  NFP used to be single PF per host,
which made life easier, but the silicon team was persuaded to remove
that comfort :)

> In other words, is a PCI BDF really the largest granularity that
> devlink can address today?

Yes, DBDF is the largest today, _but_ most advanced devices (mlxsw, nfp)
have only one PF per host.  IOW we existed blissfully in a world where
devices either pipelined from port to PF or had only one PF.

> At least in RDMA we have drivers doing all combinations of this:
> multiple ports per BDF, one port per BDF, and one composite RDMA
> device formed by combining multiple BDFs worth of ports together.

Right, last but not least we have the case where there is one port but
multiple links (for NUMA, or just because 1 PCIe link can't really cope
with 200Gbps).  In that case which DBDF would the port go to? :(
Do all internal info of the ASIC (health, regions, sbs) get registered
twice?

> > > > You guys come from the RDMA side of the world, with which I'm less
> > > > familiar, and the soft bus + spawning devices seems to be a popular
> > > > design there.  Could you describe the advantages of that model for 
> > > > the sake of the netdev-only folks? :)    
> > > 
> > > I don't think we do this in RDMA at all yet, or maybe I'm not sure
> > > what you are thinking of?  
> > 
> > Mm.. I caught an Intel patch set recently which was talking about buses
> > and spawning devices.  It must have been a different kettle of fish.  
> 
> That sounds like scalable iov..
> 
> Jason


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
  2019-03-05  2:11             ` Jakub Kicinski
@ 2019-03-05 22:11               ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2019-03-05 22:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, davem, oss-drivers, netdev, Parav Pandit

On Mon, Mar 04, 2019 at 06:11:07PM -0800, Jakub Kicinski wrote:

> > At least in RDMA we have drivers doing all combinations of this:
> > multiple ports per BDF, one port per BDF, and one composite RDMA
> > device formed by combining multiple BDFs worth of ports together.
> 
> Right, last but not least we have the case where there is one port but
> multiple links (for NUMA, or just because 1 PCIe link can't really cope
> with 200Gbps).  In that case which DBDF would the port go to? :(
> Do all internal info of the ASIC (health, regions, sbs) get registered
> twice?

This I don't know, at least for RDMA this configuration gets confusing
very fast and devlink is the least of the worries..

Personally I would advocate for a master/slave kind of arrangement
where the master BDF has a different PCI DID from the slaves. devlink
and other kernel objects hang off the master.

The slave port is then only used to carry selected NUMA aware data
path traffic and doesn't show in devlink.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-03-04  7:30       ` Jiri Pirko
@ 2019-03-20 17:29         ` Abodunrin, Akeem G
  2019-03-21 12:26           ` Jiri Pirko
  0 siblings, 1 reply; 44+ messages in thread
From: Abodunrin, Akeem G @ 2019-03-20 17:29 UTC (permalink / raw)
  To: Jiri Pirko, Parav Pandit; +Cc: Jakub Kicinski, davem, oss-drivers, netdev



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Jiri Pirko
> Sent: Sunday, March 03, 2019 11:30 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; davem@davemloft.net;
> oss-drivers@netronome.com; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
> 
> Mon, Mar 04, 2019 at 05:59:04AM CET, parav@mellanox.com wrote:
> >
> >
> >> -----Original Message-----
> >> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> >> Behalf Of Jiri Pirko
> >> Sent: Wednesday, February 27, 2019 6:17 AM
> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> >> Cc: davem@davemloft.net; oss-drivers@netronome.com;
> >> netdev@vger.kernel.org
> >> Subject: Re: [PATCH net-next 2/8] devlink: add PF and VF port
> >> flavours
> >>
> >> Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
> >> >Current port flavours cover simple switches and DSA.  Add PF and VF
> >> >flavours to cover "switchdev" SR-IOV NICs.
> >> >
> >> >Example devlink user space output:
> >> >
> >> >$ devlink port
> >> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
> >> >pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
> >> >pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf
> >> >0
> >> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf
> >> >1
> >>
> >A given port is of its parent device.
> >In current scenario, its PF or VF.
> >Hence it should be device attribute and not a port attribute.
> 
> I think that this works. You have VF_rep ports, PF_rep ports and PHYSICAL ports.
> In mlxsw for example, there are only PHYSICAL ports.
> In sr-iov world, there is also a PHYSICAL port on the eswitch. The others are
> either facing PF of VF. Looks accurate. I don't see any need for "devlink dev"
> flavour.

I see what you're trying to do here, with VF_rep ports being independent of PF_rep ports and PHYSICAL ports - however, my question is how do you categorize VF_rep ports of the same parent PF physical ports (say you have multi-port device, with 2 or more PFs), at least for identification purposes per physical port? Do we need to have pci_vf_number appended to physical port number?

Thanks,
~Akeem
> 
> 
> >So devlink dev show command have to show what device flavour is.
> >Is it well known PCI VF or PF or something else.
> >It will show subdev device attribute and its parent PCI (PF/VF) devlink device.
> >So we should have device flovour as PCI_PF or PCI_VF or SUBDEV.
> >
> >Again VF number showcasing here is very restrictive model.
> >Every PF/VF/Subdev represents its own 'port' and it is connected to eswitch
> 'port'.
> >Instead of showing VF here, it must be this 'port' or 'link' number that gives
> right view.
> >Which netdev represents which VF is already linked in the VF rep-netdev sysfs
> property.
> 
> I think you confuse the eswtich ports (in Jakub's output it's them) and the actual
> VF.
> 
> 
> >
> >So flavour should be something like 'hostport' and when port is registered for
> the eswitch side it should be 'switchport'.
> >With this there is very clear picture of which hostport is connected to which
> eswitch port.
> >Just like how we see in the physical world.
> >

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
  2019-03-20 17:29         ` Abodunrin, Akeem G
@ 2019-03-21 12:26           ` Jiri Pirko
  0 siblings, 0 replies; 44+ messages in thread
From: Jiri Pirko @ 2019-03-21 12:26 UTC (permalink / raw)
  To: Abodunrin, Akeem G
  Cc: Parav Pandit, Jakub Kicinski, davem, oss-drivers, netdev

Wed, Mar 20, 2019 at 06:29:44PM CET, akeem.g.abodunrin@intel.com wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>> On Behalf Of Jiri Pirko
>> Sent: Sunday, March 03, 2019 11:30 PM
>> To: Parav Pandit <parav@mellanox.com>
>> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>; davem@davemloft.net;
>> oss-drivers@netronome.com; netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next 2/8] devlink: add PF and VF port flavours
>> 
>> Mon, Mar 04, 2019 at 05:59:04AM CET, parav@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
>> >> Behalf Of Jiri Pirko
>> >> Sent: Wednesday, February 27, 2019 6:17 AM
>> >> To: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >> Cc: davem@davemloft.net; oss-drivers@netronome.com;
>> >> netdev@vger.kernel.org
>> >> Subject: Re: [PATCH net-next 2/8] devlink: add PF and VF port
>> >> flavours
>> >>
>> >> Tue, Feb 26, 2019 at 07:24:30PM CET, jakub.kicinski@netronome.com wrote:
>> >> >Current port flavours cover simple switches and DSA.  Add PF and VF
>> >> >flavours to cover "switchdev" SR-IOV NICs.
>> >> >
>> >> >Example devlink user space output:
>> >> >
>> >> >$ devlink port
>> >> >pci/0000:82:00.0/0: type eth netdev p4p1 flavour physical
>> >> >pci/0000:82:00.0/10000: type eth netdev eth0 flavour pcie_pf pf 0
>> >> >pci/0000:82:00.0/10001: type eth netdev eth1 flavour pcie_vf pf 0 vf
>> >> >0
>> >> >pci/0000:82:00.0/10002: type eth netdev eth2 flavour pcie_vf pf 0 vf
>> >> >1
>> >>
>> >A given port is of its parent device.
>> >In current scenario, its PF or VF.
>> >Hence it should be device attribute and not a port attribute.
>> 
>> I think that this works. You have VF_rep ports, PF_rep ports and PHYSICAL ports.
>> In mlxsw for example, there are only PHYSICAL ports.
>> In sr-iov world, there is also a PHYSICAL port on the eswitch. The others are
>> either facing PF of VF. Looks accurate. I don't see any need for "devlink dev"
>> flavour.
>
>I see what you're trying to do here, with VF_rep ports being independent of PF_rep ports and PHYSICAL ports - however, my question is how do you categorize VF_rep ports of the same parent PF physical ports (say you have multi-port device, with 2 or more PFs), at least for identification purposes per physical port? Do we need to have pci_vf_number appended to physical port number?

Please wrap your messages at 80 cols.


>
>Thanks,
>~Akeem
>> 
>> 
>> >So devlink dev show command have to show what device flavour is.
>> >Is it well known PCI VF or PF or something else.
>> >It will show subdev device attribute and its parent PCI (PF/VF) devlink device.
>> >So we should have device flovour as PCI_PF or PCI_VF or SUBDEV.
>> >
>> >Again VF number showcasing here is very restrictive model.
>> >Every PF/VF/Subdev represents its own 'port' and it is connected to eswitch
>> 'port'.
>> >Instead of showing VF here, it must be this 'port' or 'link' number that gives
>> right view.
>> >Which netdev represents which VF is already linked in the VF rep-netdev sysfs
>> property.
>> 
>> I think you confuse the eswtich ports (in Jakub's output it's them) and the actual
>> VF.
>> 
>> 
>> >
>> >So flavour should be something like 'hostport' and when port is registered for
>> the eswitch side it should be 'switchport'.
>> >With this there is very clear picture of which hostport is connected to which
>> eswitch port.
>> >Just like how we see in the physical world.
>> >

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-03-21 12:37 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-26 18:24 [PATCH net-next 0/8] devlink: add PF and VF port flavours Jakub Kicinski
2019-02-26 18:24 ` [PATCH net-next 1/8] nfp: split devlink port init from registration Jakub Kicinski
2019-02-26 18:24 ` [PATCH net-next 2/8] devlink: add PF and VF port flavours Jakub Kicinski
2019-02-27 12:16   ` Jiri Pirko
2019-03-04  4:59     ` Parav Pandit
2019-03-04  7:30       ` Jiri Pirko
2019-03-20 17:29         ` Abodunrin, Akeem G
2019-03-21 12:26           ` Jiri Pirko
2019-02-27 12:23   ` Jiri Pirko
2019-02-27 12:41     ` Jiri Pirko
2019-02-27 17:23       ` Jakub Kicinski
2019-02-27 20:17         ` Jiri Pirko
2019-02-27 22:42           ` Jakub Kicinski
2019-02-28  8:44             ` Jiri Pirko
2019-02-28 16:08               ` Jakub Kicinski
2019-02-28 16:24             ` David Ahern
2019-02-26 18:24 ` [PATCH net-next 3/8] nfp: register devlink ports of all reprs Jakub Kicinski
2019-02-26 18:24 ` [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports Jakub Kicinski
2019-02-27 12:37   ` Jiri Pirko
2019-02-27 18:30     ` Jakub Kicinski
2019-02-28  8:56       ` Jiri Pirko
2019-02-28 13:32         ` Jiri Pirko
2019-02-28 16:24         ` Jakub Kicinski
2019-03-01  7:25           ` Jiri Pirko
2019-03-01 16:04             ` Jakub Kicinski
2019-03-01 16:20               ` Jiri Pirko
2019-03-04 16:15       ` Jason Gunthorpe
2019-03-05  1:03         ` Jakub Kicinski
2019-03-05  1:30           ` Jason Gunthorpe
2019-03-05  2:11             ` Jakub Kicinski
2019-03-05 22:11               ` Jason Gunthorpe
2019-03-04  5:00     ` Parav Pandit
2019-02-26 18:24 ` [PATCH net-next 5/8] nfp: switch to devlink_port_get_phys_port_name() Jakub Kicinski
2019-02-26 18:24 ` [PATCH net-next 6/8] devlink: introduce port's peer netdevs Jakub Kicinski
2019-02-27 13:08   ` Jiri Pirko
2019-02-27 18:47     ` Jakub Kicinski
2019-02-28  9:00       ` Jiri Pirko
2019-02-28 16:36         ` Jakub Kicinski
2019-03-01  7:37           ` Jiri Pirko
2019-03-01 16:05             ` Jakub Kicinski
2019-03-04  5:07     ` Parav Pandit
2019-02-26 18:24 ` [PATCH net-next 7/8] nfp: expose PF " Jakub Kicinski
2019-02-26 18:24 ` [PATCH net-next 8/8] devlink: fix kdoc Jakub Kicinski
2019-02-27 13:13   ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).