All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch iproute2 0/6] iproute2: add changes for switchdev
@ 2014-12-04  8:57 Jiri Pirko
  2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
                   ` (7 more replies)
  0 siblings, 8 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

Jiri Pirko (1):
  iproute2: ipa: show switch id

Scott Feldman (5):
  bridge/fdb: fix statistics output spacing
  bridge/fdb: add flag/indication for FDB entry synced from offload
    device
  bridge/link: add new offload hwmode swdev
  link: add missing IFLA_BRPORT_PROXYARP
  bridge/link: add learning_sync policy flag

 bridge/fdb.c              |  4 +++-
 bridge/link.c             | 17 +++++++++++++++--
 include/linux/if_bridge.h |  1 +
 include/linux/if_link.h   |  3 +++
 include/linux/neighbour.h |  1 +
 ip/ipaddress.c            |  8 ++++++++
 man/man8/bridge.8         | 19 ++++++++++++++-----
 7 files changed, 45 insertions(+), 8 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
@ 2014-12-04  8:57 ` Jiri Pirko
  2014-12-04 13:17   ` Jamal Hadi Salim
                     ` (2 more replies)
  2014-12-04  8:57 ` [patch iproute2 2/6] bridge/fdb: fix statistics output spacing Jiri Pirko
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_link.h | 1 +
 ip/ipaddress.c          | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 4732063..a6e2594 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_PHYS_SWITCH_ID,
 	__IFLA_MAX
 };
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 4d99324..bd36a07 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
 				      b1, sizeof(b1)));
 	}
 
+	if (tb[IFLA_PHYS_SWITCH_ID]) {
+		SPRINT_BUF(b1);
+		fprintf(fp, "switchid %s ",
+			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
+				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
+				      b1, sizeof(b1)));
+	}
+
 	if (tb[IFLA_OPERSTATE])
 		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [patch iproute2 2/6] bridge/fdb: fix statistics output spacing
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
  2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
@ 2014-12-04  8:57 ` Jiri Pirko
  2014-12-10  0:32   ` Stephen Hemminger
  2014-12-04  8:57 ` [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device Jiri Pirko
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

From: Scott Feldman <sfeldma@gmail.com>

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 bridge/fdb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index a55fac1..d678342 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -142,7 +142,7 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		struct nda_cacheinfo *ci = RTA_DATA(tb[NDA_CACHEINFO]);
 		int hz = get_user_hz();
 
-		fprintf(fp, " used %d/%d", ci->ndm_used/hz,
+		fprintf(fp, "used %d/%d ", ci->ndm_used/hz,
 		       ci->ndm_updated/hz);
 	}
 	if (r->ndm_flags & NTF_SELF)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
  2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
  2014-12-04  8:57 ` [patch iproute2 2/6] bridge/fdb: fix statistics output spacing Jiri Pirko
@ 2014-12-04  8:57 ` Jiri Pirko
  2014-12-04 13:19   ` Jamal Hadi Salim
  2014-12-24 20:39   ` Stephen Hemminger
  2014-12-04  8:57 ` [patch iproute2 4/6] bridge/link: add new offload hwmode swdev Jiri Pirko
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

From: Scott Feldman <sfeldma@gmail.com>

Add NTF_EXT_LEARNED flag to neigh flags to indicate FDB entry learned by
device has been learned externally to bridge FDB.  For these entries,
add "external" annotation in bridge fdb show output:

  00:02:00:00:03:00 dev swp2 used 2/2 master br0 external
  00:02:00:00:03:00 dev swp2 self permanent

In the example above, 00:02:00:00:03:00 is shown twice on dev swp2.  The
first entry if from the bridge (master) and is marked as "external" by
the offload device.  The second entry is from the brport offload device (self),
and was learned by the device.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 bridge/fdb.c              | 2 ++
 include/linux/neighbour.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index d678342..c01a502 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -154,6 +154,8 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		fprintf(fp, "master ");
 	if (r->ndm_flags & NTF_ROUTER)
 		fprintf(fp, "router ");
+	if (r->ndm_flags & NTF_EXT_LEARNED)
+		fprintf(fp, "external ");
 
 	fprintf(fp, "%s\n", state_n2a(r->ndm_state));
 	return 0;
diff --git a/include/linux/neighbour.h b/include/linux/neighbour.h
index 4a1d7e9..3a9b0df 100644
--- a/include/linux/neighbour.h
+++ b/include/linux/neighbour.h
@@ -40,6 +40,7 @@ enum {
 
 #define NTF_SELF	0x02
 #define NTF_MASTER	0x04
+#define NTF_EXT_LEARNED	0x10
 
 /*
  *	Neighbor Cache Entry States.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [patch iproute2 4/6] bridge/link: add new offload hwmode swdev
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
                   ` (2 preceding siblings ...)
  2014-12-04  8:57 ` [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device Jiri Pirko
@ 2014-12-04  8:57 ` Jiri Pirko
  2014-12-04 13:23   ` Jamal Hadi Salim
  2014-12-24 20:37   ` Stephen Hemminger
  2014-12-04  8:57 ` [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP Jiri Pirko
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

From: Scott Feldman <sfeldma@gmail.com>

To support full-featured switch devices offloading bridge funtionality,
add new hwmode 'swdev'.  Like 'vepa' and 'veb', 'swdev' indicated bridge
port functionality is being offloaded to hardware.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 bridge/link.c             |  6 ++++--
 include/linux/if_bridge.h |  1 +
 man/man8/bridge.8         | 13 ++++++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 90d9e7f..efe0b8c 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -65,7 +65,7 @@ static const char *oper_states[] = {
 	"TESTING", "DORMANT",	 "UP"
 };
 
-static const char *hw_mode[] = {"VEB", "VEPA"};
+static const char *hw_mode[] = {"VEB", "VEPA", "swdev"};
 
 static void print_operstate(FILE *f, __u8 state)
 {
@@ -315,10 +315,12 @@ static int brlink_modify(int argc, char **argv)
 				mode = BRIDGE_MODE_VEPA;
 			else if (strcmp(*argv, "veb") == 0)
 				mode = BRIDGE_MODE_VEB;
+			else if (strcmp(*argv, "swdev") == 0)
+				mode = BRIDGE_MODE_SWDEV;
 			else {
 				fprintf(stderr,
 					"Mode argument must be \"vepa\" or "
-					"\"veb\".\n");
+					"\"veb\" or \"swdev\".\n");
 				exit(-1);
 			}
 		} else {
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index ed6868e..6b4eb66 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -105,6 +105,7 @@ struct __fdb_entry {
 
 #define BRIDGE_MODE_VEB		0	/* Default loopback mode */
 #define BRIDGE_MODE_VEPA	1	/* 802.1Qbg defined VEPA mode */
+#define BRIDGE_MODE_SWDEV	2	/* Full switch device offload */
 
 /* Bridge management nested attributes
  * [IFLA_AF_SPEC] = {
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index af31d41..d3d64d1 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -38,7 +38,7 @@ bridge \- show / manipulate bridge addresses and devices
 .BR root_block " { " on " | " off " } ] [ "
 .BR learning " { " on " | " off " } ] [ "
 .BR flood " { " on " | " off " } ] [ "
-.BR hwmode " { " vepa " | " veb " } ] "
+.BR hwmode " { " vepa " | " veb " | " swdev " } ] "
 
 .ti -8
 .BR "bridge link" " [ " show " ] [ "
@@ -247,15 +247,18 @@ Controls whether a given port will flood unicast traffic for which there is no F
 
 .TP
 .BI hwmode
-Some network interface cards support HW bridge functionality and they may be
+Some port devices support HW bridge functionality and they may be
 configured in different modes.  Currently support modes are:
 
 .B vepa
-- Data sent between HW ports is sent on the wire to the external
-switch.
+- NIC interface supports VEPA: data sent between HW ports is sent on
+the wire to the external switch.
 
 .B veb
-- bridging happens in hardware.
+- NIC interface supports VEB: bridging happens in hardware.
+
+.B swdev
+- Full Ethernet switch offload device: bridging happens in hardware.
 
 .SS bridge link show - list bridge port configuration.
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
                   ` (3 preceding siblings ...)
  2014-12-04  8:57 ` [patch iproute2 4/6] bridge/link: add new offload hwmode swdev Jiri Pirko
@ 2014-12-04  8:57 ` Jiri Pirko
  2014-12-04 18:53   ` Stephen Hemminger
  2014-12-04  8:57 ` [patch iproute2 6/6] bridge/link: add learning_sync policy flag Jiri Pirko
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

From: Scott Feldman <sfeldma@gmail.com>

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_link.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index a6e2594..06efa2d 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -242,6 +242,7 @@ enum {
 	IFLA_BRPORT_FAST_LEAVE,	/* multicast fast leave    */
 	IFLA_BRPORT_LEARNING,	/* mac learning */
 	IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
+	IFLA_BRPORT_PROXYARP,   /* proxy ARP */
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [patch iproute2 6/6] bridge/link: add learning_sync policy flag
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
                   ` (4 preceding siblings ...)
  2014-12-04  8:57 ` [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP Jiri Pirko
@ 2014-12-04  8:57 ` Jiri Pirko
  2014-12-04 13:26   ` Jamal Hadi Salim
  2014-12-04 13:16 ` [patch iproute2 0/6] iproute2: add changes for switchdev Jamal Hadi Salim
  2014-12-04 14:26 ` Roopa Prabhu
  7 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04  8:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

From: Scott Feldman <sfeldma@gmail.com>

Add 'learned_sync' flag to turn on/off syncing of learned FDB entries from
offload device to bridge's FDB.   Flag would be set/cleared in on SELF using
hwmode qualifier 'swdev'.  E.g.:

  $ sudo bridge link set dev swp1 hwmode swdev learning_sync on

  $ bridge -d link show dev swp1
  2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 2
      hairpin off guard off root_block off fastleave off learning off flood off
  2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0
      learning on learning_sync on hwmode swdev

Adds new IFLA_BRPORT_LEARNING_SYNC attribute for IFLA_PROTINFO on the SELF
brport.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 bridge/link.c           | 11 +++++++++++
 include/linux/if_link.h |  1 +
 man/man8/bridge.8       |  6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/bridge/link.c b/bridge/link.c
index efe0b8c..0f8649f 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -188,6 +188,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
 				if (prtb[IFLA_BRPORT_LEARNING])
 					print_onoff(fp, "learning",
 						    rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING]));
+				if (prtb[IFLA_BRPORT_LEARNING_SYNC])
+					print_onoff(fp, "learning_sync",
+						    rta_getattr_u8(prtb[IFLA_BRPORT_LEARNING_SYNC]));
 				if (prtb[IFLA_BRPORT_UNICAST_FLOOD])
 					print_onoff(fp, "flood",
 						    rta_getattr_u8(prtb[IFLA_BRPORT_UNICAST_FLOOD]));
@@ -221,6 +224,7 @@ static void usage(void)
 	fprintf(stderr, "                               [ fastleave {on | off} ]\n");
 	fprintf(stderr,	"                               [ root_block {on | off} ]\n");
 	fprintf(stderr,	"                               [ learning {on | off} ]\n");
+	fprintf(stderr,	"                               [ learning_sync {on | off} ]\n");
 	fprintf(stderr,	"                               [ flood {on | off} ]\n");
 	fprintf(stderr, "                               [ hwmode {vepa | veb} ]\n");
 	fprintf(stderr, "       bridge link show [dev DEV]\n");
@@ -252,6 +256,7 @@ static int brlink_modify(int argc, char **argv)
 	} req;
 	char *d = NULL;
 	__s8 learning = -1;
+	__s8 learning_sync = -1;
 	__s8 flood = -1;
 	__s8 hairpin = -1;
 	__s8 bpdu_guard = -1;
@@ -295,6 +300,10 @@ static int brlink_modify(int argc, char **argv)
 			NEXT_ARG();
 			if (!on_off("learning", &learning, *argv))
 				exit(-1);
+		} else if (strcmp(*argv, "learning_sync") == 0) {
+			NEXT_ARG();
+			if (!on_off("learning_sync", &learning_sync, *argv))
+				exit(-1);
 		} else if (strcmp(*argv, "flood") == 0) {
 			NEXT_ARG();
 			if (!on_off("flood", &flood, *argv))
@@ -359,6 +368,8 @@ static int brlink_modify(int argc, char **argv)
 		addattr8(&req.n, sizeof(req), IFLA_BRPORT_UNICAST_FLOOD, flood);
 	if (learning >= 0)
 		addattr8(&req.n, sizeof(req), IFLA_BRPORT_LEARNING, learning);
+	if (learning_sync >= 0)
+		addattr8(&req.n, sizeof(req), IFLA_BRPORT_LEARNING_SYNC, learning_sync);
 
 	if (cost > 0)
 		addattr32(&req.n, sizeof(req), IFLA_BRPORT_COST, cost);
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 06efa2d..9947f1b 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -243,6 +243,7 @@ enum {
 	IFLA_BRPORT_LEARNING,	/* mac learning */
 	IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
 	IFLA_BRPORT_PROXYARP,   /* proxy ARP */
+	IFLA_BRPORT_LEARNING_SYNC, /* mac learning sync from device */
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index d3d64d1..83c049a 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -37,6 +37,7 @@ bridge \- show / manipulate bridge addresses and devices
 .BR fastleave " { " on " | " off " } ] [ "
 .BR root_block " { " on " | " off " } ] [ "
 .BR learning " { " on " | " off " } ] [ "
+.BR learning_sync " { " on " | " off " } ] [ "
 .BR flood " { " on " | " off " } ] [ "
 .BR hwmode " { " vepa " | " veb " | " swdev " } ] "
 
@@ -242,6 +243,11 @@ not.  If learning if off, the bridge will end up flooding any traffic for which
 it has no FDB entry.  By default this flag is on.
 
 .TP
+.BR "learning_sync on " or " learning_sync off "
+Controls whether a given port will sync MAC addresses learned on device port
+to bridge FDB.  This flags applies to device ports in "swdev" hwmode.
+
+.TP
 .BR "flooding on " or " flooding off "
 Controls whether a given port will flood unicast traffic for which there is no FDB entry.  By default this flag is on.
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
                   ` (5 preceding siblings ...)
  2014-12-04  8:57 ` [patch iproute2 6/6] bridge/link: add learning_sync policy flag Jiri Pirko
@ 2014-12-04 13:16 ` Jamal Hadi Salim
  2014-12-04 13:56   ` Andy Gospodarek
  2014-12-04 14:26 ` Roopa Prabhu
  7 siblings, 1 reply; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 13:16 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal


General comment:
iproute2 patches need to be addressed to the maintainer
stephen@networkplumber.org
And are we mainstream enough now we can drop the 1000 people on the
To: list? I doubt most of these people are even following the discussion 
(I planted an easter egg for Aviad and didnt see him
reacting) ;->

cheers,
jamal

On 12/04/14 03:57, Jiri Pirko wrote:
> Jiri Pirko (1):
>    iproute2: ipa: show switch id
>
> Scott Feldman (5):
>    bridge/fdb: fix statistics output spacing
>    bridge/fdb: add flag/indication for FDB entry synced from offload
>      device
>    bridge/link: add new offload hwmode swdev
>    link: add missing IFLA_BRPORT_PROXYARP
>    bridge/link: add learning_sync policy flag
>
>   bridge/fdb.c              |  4 +++-
>   bridge/link.c             | 17 +++++++++++++++--
>   include/linux/if_bridge.h |  1 +
>   include/linux/if_link.h   |  3 +++
>   include/linux/neighbour.h |  1 +
>   ip/ipaddress.c            |  8 ++++++++
>   man/man8/bridge.8         | 19 ++++++++++++++-----
>   7 files changed, 45 insertions(+), 8 deletions(-)
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
@ 2014-12-04 13:17   ` Jamal Hadi Salim
  2014-12-04 14:20   ` Andy Gospodarek
  2014-12-04 16:15   ` Eric W. Biederman
  2 siblings, 0 replies; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 13:17 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On 12/04/14 03:57, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/linux/if_link.h | 1 +
>   ip/ipaddress.c          | 8 ++++++++
>   2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>   	IFLA_CARRIER,
>   	IFLA_PHYS_PORT_ID,
>   	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>   	__IFLA_MAX
>   };
>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device
  2014-12-04  8:57 ` [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device Jiri Pirko
@ 2014-12-04 13:19   ` Jamal Hadi Salim
  2014-12-24 20:39   ` Stephen Hemminger
  1 sibling, 0 replies; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 13:19 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On 12/04/14 03:57, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>

Did i say i like it already?;->

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 4/6] bridge/link: add new offload hwmode swdev
  2014-12-04  8:57 ` [patch iproute2 4/6] bridge/link: add new offload hwmode swdev Jiri Pirko
@ 2014-12-04 13:23   ` Jamal Hadi Salim
  2014-12-04 20:55     ` Scott Feldman
  2014-12-24 20:37   ` Stephen Hemminger
  1 sibling, 1 reply; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 13:23 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On 12/04/14 03:57, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> To support full-featured switch devices offloading bridge funtionality,
> add new hwmode 'swdev'.  Like 'vepa' and 'veb', 'swdev' indicated bridge
> port functionality is being offloaded to hardware.
>

Unhappy with the name swdev ..
Ok, go ahead and beat up on me.

cheers,
jamal

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 6/6] bridge/link: add learning_sync policy flag
  2014-12-04  8:57 ` [patch iproute2 6/6] bridge/link: add learning_sync policy flag Jiri Pirko
@ 2014-12-04 13:26   ` Jamal Hadi Salim
  2014-12-04 14:15     ` Jamal Hadi Salim
  0 siblings, 1 reply; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 13:26 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On 12/04/14 03:57, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add 'learned_sync' flag to turn on/off syncing of learned FDB entries from
> offload device to bridge's FDB.   Flag would be set/cleared in on SELF using
> hwmode qualifier 'swdev'.  E.g.:
>
>    $ sudo bridge link set dev swp1 hwmode swdev learning_sync on
>
>    $ bridge -d link show dev swp1
>    2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 2
>        hairpin off guard off root_block off fastleave off learning off flood off
>    2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0
>        learning on learning_sync on hwmode swdev
>
> Adds new IFLA_BRPORT_LEARNING_SYNC attribute for IFLA_PROTINFO on the SELF
> brport.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

I actually dont think any of these patches needed an ack because they
are off. My ack is just to show support (despite the noun challenges).

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 13:16 ` [patch iproute2 0/6] iproute2: add changes for switchdev Jamal Hadi Salim
@ 2014-12-04 13:56   ` Andy Gospodarek
  2014-12-04 14:22     ` Roopa Prabhu
  0 siblings, 1 reply; 53+ messages in thread
From: Andy Gospodarek @ 2014-12-04 13:56 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

On Thu, Dec 04, 2014 at 08:16:38AM -0500, Jamal Hadi Salim wrote:
> 
> General comment:
> iproute2 patches need to be addressed to the maintainer
> stephen@networkplumber.org
> And are we mainstream enough now we can drop the 1000 people on the
> To: list? I doubt most of these people are even following the discussion (I
> planted an easter egg for Aviad and didnt see him
> reacting) ;->

I actually don't mind them coming here since it is a change to netlink.
Thanks for sending them to netdev, Jiri.

> 
> cheers,
> jamal
> 
> On 12/04/14 03:57, Jiri Pirko wrote:
> >Jiri Pirko (1):
> >   iproute2: ipa: show switch id
> >
> >Scott Feldman (5):
> >   bridge/fdb: fix statistics output spacing
> >   bridge/fdb: add flag/indication for FDB entry synced from offload
> >     device
> >   bridge/link: add new offload hwmode swdev
> >   link: add missing IFLA_BRPORT_PROXYARP
> >   bridge/link: add learning_sync policy flag
> >
> >  bridge/fdb.c              |  4 +++-
> >  bridge/link.c             | 17 +++++++++++++++--
> >  include/linux/if_bridge.h |  1 +
> >  include/linux/if_link.h   |  3 +++
> >  include/linux/neighbour.h |  1 +
> >  ip/ipaddress.c            |  8 ++++++++
> >  man/man8/bridge.8         | 19 ++++++++++++++-----
> >  7 files changed, 45 insertions(+), 8 deletions(-)
> >
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 6/6] bridge/link: add learning_sync policy flag
  2014-12-04 13:26   ` Jamal Hadi Salim
@ 2014-12-04 14:15     ` Jamal Hadi Salim
  0 siblings, 0 replies; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 14:15 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On 12/04/14 08:26, Jamal Hadi Salim wrote:

> I actually dont think any of these patches needed an ack because they
> are off.
       ^^^ meant ok ;-> or obvious.

cheers,
jamal

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
  2014-12-04 13:17   ` Jamal Hadi Salim
@ 2014-12-04 14:20   ` Andy Gospodarek
  2014-12-04 14:29     ` Roopa Prabhu
  2014-12-04 14:33     ` Jiri Pirko
  2014-12-04 16:15   ` Eric W. Biederman
  2 siblings, 2 replies; 53+ messages in thread
From: Andy Gospodarek @ 2014-12-04 14:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  ip/ipaddress.c          | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>  	IFLA_CARRIER,
>  	IFLA_PHYS_PORT_ID,
>  	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,

Serious question for Stephen et al, once we take this to iproute2 are we
going to be able to change the name but the string diplayed if needed?

I had a patch that called this IFLA_PARENT_ID that I was using with the
older github tree used by Jiri and Scott before these were in net-next.
I wanted to submit that as a change to what became
82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
be accepted and the dust settled.

I like the parent/child/sibling nomenclature better for 4 reasons:

- Most did not seem to like the term 'offload' since that term would be
  confusing with, GRO, GSO, etc.
- A *significant* use case for many of the high-end ASICs in datacenters
  is routing.
- switchid does not make sense in the OVS/flow case because it is all
  about flows, not switches or routers, and parent made sense there.
- I wanted to combine this for use with SR-IOV use case, so one can more
  easily map PF->VF using this.

Can you give me a bit (a day) to clean-up that patch and submit an
alternative proposal to these?

>  	__IFLA_MAX
>  };
>  
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 4d99324..bd36a07 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>  				      b1, sizeof(b1)));
>  	}
>  
> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> +		SPRINT_BUF(b1);
> +		fprintf(fp, "switchid %s ",
> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> +				      b1, sizeof(b1)));
> +	}
> +
>  	if (tb[IFLA_OPERSTATE])
>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>  
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 13:56   ` Andy Gospodarek
@ 2014-12-04 14:22     ` Roopa Prabhu
  2014-12-04 14:31       ` Jamal Hadi Salim
  0 siblings, 1 reply; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-04 14:22 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy,
	tgraf, dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, bcrl

On 12/4/14, 5:56 AM, Andy Gospodarek wrote:
> On Thu, Dec 04, 2014 at 08:16:38AM -0500, Jamal Hadi Salim wrote:
>> General comment:
>> iproute2 patches need to be addressed to the maintainer
>> stephen@networkplumber.org
>> And are we mainstream enough now we can drop the 1000 people on the
>> To: list? I doubt most of these people are even following the discussion (I
>> planted an easter egg for Aviad and didnt see him
>> reacting) ;->
> I actually don't mind them coming here since it is a change to netlink.
> Thanks for sending them to netdev, Jiri.

I think jamal meant the people in the cc (not netdev).
I actually like the cc, but that probably cant last long because it is a 
lot of people.
>
>> cheers,
>> jamal
>>
>> On 12/04/14 03:57, Jiri Pirko wrote:
>>> Jiri Pirko (1):
>>>    iproute2: ipa: show switch id
>>>
>>> Scott Feldman (5):
>>>    bridge/fdb: fix statistics output spacing
>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>      device
>>>    bridge/link: add new offload hwmode swdev
>>>    link: add missing IFLA_BRPORT_PROXYARP
>>>    bridge/link: add learning_sync policy flag
>>>
>>>   bridge/fdb.c              |  4 +++-
>>>   bridge/link.c             | 17 +++++++++++++++--
>>>   include/linux/if_bridge.h |  1 +
>>>   include/linux/if_link.h   |  3 +++
>>>   include/linux/neighbour.h |  1 +
>>>   ip/ipaddress.c            |  8 ++++++++
>>>   man/man8/bridge.8         | 19 ++++++++++++++-----
>>>   7 files changed, 45 insertions(+), 8 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
                   ` (6 preceding siblings ...)
  2014-12-04 13:16 ` [patch iproute2 0/6] iproute2: add changes for switchdev Jamal Hadi Salim
@ 2014-12-04 14:26 ` Roopa Prabhu
  2014-12-04 14:34   ` Jiri Pirko
  7 siblings, 1 reply; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-04 14:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

On 12/4/14, 12:57 AM, Jiri Pirko wrote:
> Jiri Pirko (1):
>    iproute2: ipa: show switch id
>
> Scott Feldman (5):
>    bridge/fdb: fix statistics output spacing
>    bridge/fdb: add flag/indication for FDB entry synced from offload
>      device
>    bridge/link: add new offload hwmode swdev
Ack to most patches but nack on this one. The todo list still has a note 
to revist the flag to indicate switchdev offloads.
Exposing this to userspace does not help that.

>    link: add missing IFLA_BRPORT_PROXYARP
>    bridge/link: add learning_sync policy flag
>
>   bridge/fdb.c              |  4 +++-
>   bridge/link.c             | 17 +++++++++++++++--
>   include/linux/if_bridge.h |  1 +
>   include/linux/if_link.h   |  3 +++
>   include/linux/neighbour.h |  1 +
>   ip/ipaddress.c            |  8 ++++++++
>   man/man8/bridge.8         | 19 ++++++++++++++-----
>   7 files changed, 45 insertions(+), 8 deletions(-)
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 14:20   ` Andy Gospodarek
@ 2014-12-04 14:29     ` Roopa Prabhu
  2014-12-04 14:33     ` Jiri Pirko
  1 sibling, 0 replies; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-04 14:29 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, bcrl, hemal

On 12/4/14, 6:20 AM, Andy Gospodarek wrote:
> On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/if_link.h | 1 +
>>   ip/ipaddress.c          | 8 ++++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>   	IFLA_CARRIER,
>>   	IFLA_PHYS_PORT_ID,
>>   	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
> Serious question for Stephen et al, once we take this to iproute2 are we
> going to be able to change the name but the string diplayed if needed?
>
> I had a patch that called this IFLA_PARENT_ID that I was using with the
> older github tree used by Jiri and Scott before these were in net-next.
> I wanted to submit that as a change to what became
> 82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> be accepted and the dust settled.
>
> I like the parent/child/sibling nomenclature better for 4 reasons:
>
> - Most did not seem to like the term 'offload' since that term would be
>    confusing with, GRO, GSO, etc.
> - A *significant* use case for many of the high-end ASICs in datacenters
>    is routing.
> - switchid does not make sense in the OVS/flow case because it is all
>    about flows, not switches or routers, and parent made sense there.
> - I wanted to combine this for use with SR-IOV use case, so one can more
>    easily map PF->VF using this.

ack.., I have voted for calling it a generic parent id in the past.
>
> Can you give me a bit (a day) to clean-up that patch and submit an
> alternative proposal to these?
>
>>   	__IFLA_MAX
>>   };
>>   
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>   				      b1, sizeof(b1)));
>>   	}
>>   
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>   	if (tb[IFLA_OPERSTATE])
>>   		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>>   
>> -- 
>> 1.9.3
>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 14:22     ` Roopa Prabhu
@ 2014-12-04 14:31       ` Jamal Hadi Salim
  0 siblings, 0 replies; 53+ messages in thread
From: Jamal Hadi Salim @ 2014-12-04 14:31 UTC (permalink / raw)
  To: Roopa Prabhu, Andy Gospodarek
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

On 12/04/14 09:22, Roopa Prabhu wrote:
> On 12/4/14, 5:56 AM, Andy Gospodarek wrote:

> I think jamal meant the people in the cc (not netdev).
> I actually like the cc, but that probably cant last long because it is a
> lot of people.

There are 34 people on that list. I support copying all stake holders
but it is too large.

cheers,
jamal

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 14:20   ` Andy Gospodarek
  2014-12-04 14:29     ` Roopa Prabhu
@ 2014-12-04 14:33     ` Jiri Pirko
  2014-12-04 14:57       ` Andy Gospodarek
  1 sibling, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 14:33 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/if_link.h | 1 +
>>  ip/ipaddress.c          | 8 ++++++++
>>  2 files changed, 9 insertions(+)
>> 
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
>
>Serious question for Stephen et al, once we take this to iproute2 are we
>going to be able to change the name but the string diplayed if needed?
>
>I had a patch that called this IFLA_PARENT_ID that I was using with the
>older github tree used by Jiri and Scott before these were in net-next.
>I wanted to submit that as a change to what became
>82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>be accepted and the dust settled.
>
>I like the parent/child/sibling nomenclature better for 4 reasons:
>
>- Most did not seem to like the term 'offload' since that term would be
>  confusing with, GRO, GSO, etc.
>- A *significant* use case for many of the high-end ASICs in datacenters
>  is routing.
>- switchid does not make sense in the OVS/flow case because it is all
>  about flows, not switches or routers, and parent made sense there.

well ovs is all about flows and has the "switch" word in the name. I
believe that people are talking about "switches" in case of these "flow
devices" as well. I see nothing wrong in that. I think that "switch"
became generic name for "packet forwarding machines".

"parent" is very generic and may mean 100 things...


>- I wanted to combine this for use with SR-IOV use case, so one can more
>  easily map PF->VF using this.

Ugh, please don't mix this up with pf, vf. That is completely different
thing.

pf vf mapping is done in sysfs. In netlink, physportid is used for that
purpose. We can expose this phys port id for pf as well (as a different
attr) and we are done.

>
>Can you give me a bit (a day) to clean-up that patch and submit an
>alternative proposal to these?
>
>>  	__IFLA_MAX
>>  };
>>  
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>  				      b1, sizeof(b1)));
>>  	}
>>  
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>  	if (tb[IFLA_OPERSTATE])
>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>>  
>> -- 
>> 1.9.3
>> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 14:26 ` Roopa Prabhu
@ 2014-12-04 14:34   ` Jiri Pirko
  2014-12-04 14:45     ` Roopa Prabhu
  0 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 14:34 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>Jiri Pirko (1):
>>   iproute2: ipa: show switch id
>>
>>Scott Feldman (5):
>>   bridge/fdb: fix statistics output spacing
>>   bridge/fdb: add flag/indication for FDB entry synced from offload
>>     device
>>   bridge/link: add new offload hwmode swdev
>Ack to most patches but nack on this one. The todo list still has a note to
>revist the flag to indicate switchdev offloads.
>Exposing this to userspace does not help that.

Hmm, note that this is already exposed to userspace, this patchset is
for iproute2 (userspace tool).

>
>>   link: add missing IFLA_BRPORT_PROXYARP
>>   bridge/link: add learning_sync policy flag
>>
>>  bridge/fdb.c              |  4 +++-
>>  bridge/link.c             | 17 +++++++++++++++--
>>  include/linux/if_bridge.h |  1 +
>>  include/linux/if_link.h   |  3 +++
>>  include/linux/neighbour.h |  1 +
>>  ip/ipaddress.c            |  8 ++++++++
>>  man/man8/bridge.8         | 19 ++++++++++++++-----
>>  7 files changed, 45 insertions(+), 8 deletions(-)
>>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 14:34   ` Jiri Pirko
@ 2014-12-04 14:45     ` Roopa Prabhu
  2014-12-04 16:04       ` Jiri Pirko
  0 siblings, 1 reply; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-04 14:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

On 12/4/14, 6:34 AM, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>> Jiri Pirko (1):
>>>    iproute2: ipa: show switch id
>>>
>>> Scott Feldman (5):
>>>    bridge/fdb: fix statistics output spacing
>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>      device
>>>    bridge/link: add new offload hwmode swdev
>> Ack to most patches but nack on this one. The todo list still has a note to
>> revist the flag to indicate switchdev offloads.
>> Exposing this to userspace does not help that.
> Hmm, note that this is already exposed to userspace, this patchset is
> for iproute2 (userspace tool).
hmmm, all feedback on the switchdev patches seemed to indicate we can 
change this later.
  I don't see swdev mode being used in the kernel anywhere today.
I will send a patch to remove it. Its still in net-next and so can be 
changed ?.
I was going to resend my patch to introduce a common offload flag for 
all link objects.
It would be nice if all of them had a consistent flag to indicate hw 
offload and iproute2 could display the same flag for all.
Including bonds and vxlan's.

>
>>>    link: add missing IFLA_BRPORT_PROXYARP
>>>    bridge/link: add learning_sync policy flag
>>>
>>>   bridge/fdb.c              |  4 +++-
>>>   bridge/link.c             | 17 +++++++++++++++--
>>>   include/linux/if_bridge.h |  1 +
>>>   include/linux/if_link.h   |  3 +++
>>>   include/linux/neighbour.h |  1 +
>>>   ip/ipaddress.c            |  8 ++++++++
>>>   man/man8/bridge.8         | 19 ++++++++++++++-----
>>>   7 files changed, 45 insertions(+), 8 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 14:33     ` Jiri Pirko
@ 2014-12-04 14:57       ` Andy Gospodarek
  2014-12-04 15:12         ` Jiri Pirko
                           ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Andy Gospodarek @ 2014-12-04 14:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> ---
> >>  include/linux/if_link.h | 1 +
> >>  ip/ipaddress.c          | 8 ++++++++
> >>  2 files changed, 9 insertions(+)
> >> 
> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> >> index 4732063..a6e2594 100644
> >> --- a/include/linux/if_link.h
> >> +++ b/include/linux/if_link.h
> >> @@ -145,6 +145,7 @@ enum {
> >>  	IFLA_CARRIER,
> >>  	IFLA_PHYS_PORT_ID,
> >>  	IFLA_CARRIER_CHANGES,
> >> +	IFLA_PHYS_SWITCH_ID,
> >
> >Serious question for Stephen et al, once we take this to iproute2 are we
> >going to be able to change the name but the string diplayed if needed?
> >
> >I had a patch that called this IFLA_PARENT_ID that I was using with the
> >older github tree used by Jiri and Scott before these were in net-next.
> >I wanted to submit that as a change to what became
> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> >be accepted and the dust settled.
> >
> >I like the parent/child/sibling nomenclature better for 4 reasons:
> >
> >- Most did not seem to like the term 'offload' since that term would be
> >  confusing with, GRO, GSO, etc.
> >- A *significant* use case for many of the high-end ASICs in datacenters
> >  is routing.
> >- switchid does not make sense in the OVS/flow case because it is all
> >  about flows, not switches or routers, and parent made sense there.
> 
> well ovs is all about flows and has the "switch" word in the name. I
> believe that people are talking about "switches" in case of these "flow
> devices" as well. I see nothing wrong in that. I think that "switch"
> became generic name for "packet forwarding machines".

Just because one chose to use it that way does not mean I agree with it
and that we should copy their bad decision.  :-)

> "parent" is very generic and may mean 100 things...

But in this case, it means 1 thing.  The netdev you are using is
connected to another device that controls it rather than being just a
NIC.

> 
> 
> >- I wanted to combine this for use with SR-IOV use case, so one can more
> >  easily map PF->VF using this.
> 
> Ugh, please don't mix this up with pf, vf. That is completely different
> thing.
> 
> pf vf mapping is done in sysfs. In netlink, physportid is used for that
> purpose. We can expose this phys port id for pf as well (as a different
> attr) and we are done.

I know that attribute is there, but I find it more valuable for
solutions like nPAR than for PF/VF use-case.  Parent/child relationship
makes more sense to me since for forwarding will be controlled by the
embedded switch on those devices.  (Notice I specifically chose not to
use 'master' since that is already overloaded by bridge, bonding,
teaming, etc.)

> 
> >
> >Can you give me a bit (a day) to clean-up that patch and submit an
> >alternative proposal to these?
> >
> >>  	__IFLA_MAX
> >>  };
> >>  
> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> >> index 4d99324..bd36a07 100644
> >> --- a/ip/ipaddress.c
> >> +++ b/ip/ipaddress.c
> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >>  				      b1, sizeof(b1)));
> >>  	}
> >>  
> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> >> +		SPRINT_BUF(b1);
> >> +		fprintf(fp, "switchid %s ",
> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> >> +				      b1, sizeof(b1)));
> >> +	}
> >> +
> >>  	if (tb[IFLA_OPERSTATE])
> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> >>  
> >> -- 
> >> 1.9.3
> >> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 14:57       ` Andy Gospodarek
@ 2014-12-04 15:12         ` Jiri Pirko
  2014-12-04 15:15         ` Jiri Pirko
  2014-12-04 15:37         ` Thomas Graf
  2 siblings, 0 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 15:12 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
>> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >> ---
>> >>  include/linux/if_link.h | 1 +
>> >>  ip/ipaddress.c          | 8 ++++++++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> >> index 4732063..a6e2594 100644
>> >> --- a/include/linux/if_link.h
>> >> +++ b/include/linux/if_link.h
>> >> @@ -145,6 +145,7 @@ enum {
>> >>  	IFLA_CARRIER,
>> >>  	IFLA_PHYS_PORT_ID,
>> >>  	IFLA_CARRIER_CHANGES,
>> >> +	IFLA_PHYS_SWITCH_ID,
>> >
>> >Serious question for Stephen et al, once we take this to iproute2 are we
>> >going to be able to change the name but the string diplayed if needed?
>> >
>> >I had a patch that called this IFLA_PARENT_ID that I was using with the
>> >older github tree used by Jiri and Scott before these were in net-next.
>> >I wanted to submit that as a change to what became
>> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>> >be accepted and the dust settled.
>> >
>> >I like the parent/child/sibling nomenclature better for 4 reasons:
>> >
>> >- Most did not seem to like the term 'offload' since that term would be
>> >  confusing with, GRO, GSO, etc.
>> >- A *significant* use case for many of the high-end ASICs in datacenters
>> >  is routing.
>> >- switchid does not make sense in the OVS/flow case because it is all
>> >  about flows, not switches or routers, and parent made sense there.
>> 
>> well ovs is all about flows and has the "switch" word in the name. I
>> believe that people are talking about "switches" in case of these "flow
>> devices" as well. I see nothing wrong in that. I think that "switch"
>> became generic name for "packet forwarding machines".
>
>Just because one chose to use it that way does not mean I agree with it
>and that we should copy their bad decision.  :-)
>
>> "parent" is very generic and may mean 100 things...
>
>But in this case, it means 1 thing.  The netdev you are using is
>connected to another device that controls it rather than being just a
>NIC.

when we discussed linkage now called upper/lower devices, there was a
proposal to use "parent" as well. And then it ment only 1 thing as well.
Why for example pf is not parent of vf? Using "parent" word here and
assume that it just makes sense for this case is IMHO wrong. Too
generic. "switch_parent" - that I am okay with.


>
>> 
>> 
>> >- I wanted to combine this for use with SR-IOV use case, so one can more
>> >  easily map PF->VF using this.
>> 
>> Ugh, please don't mix this up with pf, vf. That is completely different
>> thing.
>> 
>> pf vf mapping is done in sysfs. In netlink, physportid is used for that
>> purpose. We can expose this phys port id for pf as well (as a different
>> attr) and we are done.
>
>I know that attribute is there, but I find it more valuable for
>solutions like nPAR than for PF/VF use-case.

Why it is not good enough for pf/vf? There are SR-IOV drivers using
this:
$ git grep ndo_get_phys_port_id
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:       .ndo_get_phys_port_id   = bnx2x_get_phys_port_id,
drivers/net/ethernet/intel/i40e/i40e_main.c:    .ndo_get_phys_port_id   = i40e_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:       .ndo_get_phys_port_id   = qlcnic_get_phys_port_id,


>Parent/child relationship
>makes more sense to me since for forwarding will be controlled by the
>embedded switch on those devices.  (Notice I specifically chose not to
>use 'master' since that is already overloaded by bridge, bonding,
>teaming, etc.)

But is that embedded switch the same thing as PF? I doubt that. Let the
aspect of PF/VF aside and look at it as on ordinary switch. The eswitch
driver needs te be implemented and add netdevs for internal ports (they
do not exist now).

You have to look at PF/VF and eSwitch as on separate things.

>
>> 
>> >
>> >Can you give me a bit (a day) to clean-up that patch and submit an
>> >alternative proposal to these?
>> >
>> >>  	__IFLA_MAX
>> >>  };
>> >>  
>> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> >> index 4d99324..bd36a07 100644
>> >> --- a/ip/ipaddress.c
>> >> +++ b/ip/ipaddress.c
>> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>> >>  				      b1, sizeof(b1)));
>> >>  	}
>> >>  
>> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> >> +		SPRINT_BUF(b1);
>> >> +		fprintf(fp, "switchid %s ",
>> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      b1, sizeof(b1)));
>> >> +	}
>> >> +
>> >>  	if (tb[IFLA_OPERSTATE])
>> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> >>  
>> >> -- 
>> >> 1.9.3
>> >> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 14:57       ` Andy Gospodarek
  2014-12-04 15:12         ` Jiri Pirko
@ 2014-12-04 15:15         ` Jiri Pirko
  2014-12-04 15:28           ` Andy Gospodarek
  2014-12-04 15:37         ` Thomas Graf
  2 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 15:15 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
>> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >> ---
>> >>  include/linux/if_link.h | 1 +
>> >>  ip/ipaddress.c          | 8 ++++++++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> >> index 4732063..a6e2594 100644
>> >> --- a/include/linux/if_link.h
>> >> +++ b/include/linux/if_link.h
>> >> @@ -145,6 +145,7 @@ enum {
>> >>  	IFLA_CARRIER,
>> >>  	IFLA_PHYS_PORT_ID,
>> >>  	IFLA_CARRIER_CHANGES,
>> >> +	IFLA_PHYS_SWITCH_ID,
>> >
>> >Serious question for Stephen et al, once we take this to iproute2 are we
>> >going to be able to change the name but the string diplayed if needed?
>> >
>> >I had a patch that called this IFLA_PARENT_ID that I was using with the
>> >older github tree used by Jiri and Scott before these were in net-next.
>> >I wanted to submit that as a change to what became
>> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>> >be accepted and the dust settled.
>> >
>> >I like the parent/child/sibling nomenclature better for 4 reasons:
>> >
>> >- Most did not seem to like the term 'offload' since that term would be
>> >  confusing with, GRO, GSO, etc.
>> >- A *significant* use case for many of the high-end ASICs in datacenters
>> >  is routing.
>> >- switchid does not make sense in the OVS/flow case because it is all
>> >  about flows, not switches or routers, and parent made sense there.
>> 
>> well ovs is all about flows and has the "switch" word in the name. I
>> believe that people are talking about "switches" in case of these "flow
>> devices" as well. I see nothing wrong in that. I think that "switch"
>> became generic name for "packet forwarding machines".
>
>Just because one chose to use it that way does not mean I agree with it
>and that we should copy their bad decision.  :-)
>
>> "parent" is very generic and may mean 100 things...
>
>But in this case, it means 1 thing.  The netdev you are using is
>connected to another device that controls it rather than being just a
>NIC.
>
>> 
>> 
>> >- I wanted to combine this for use with SR-IOV use case, so one can more
>> >  easily map PF->VF using this.
>> 
>> Ugh, please don't mix this up with pf, vf. That is completely different
>> thing.
>> 
>> pf vf mapping is done in sysfs. In netlink, physportid is used for that
>> purpose. We can expose this phys port id for pf as well (as a different
>> attr) and we are done.
>
>I know that attribute is there, but I find it more valuable for
>solutions like nPAR than for PF/VF use-case.  Parent/child relationship
>makes more sense to me since for forwarding will be controlled by the
>embedded switch on those devices.  (Notice I specifically chose not to

see, "embedded switch", not "embedded parent" or "embedded offload". And
yes, the "embedded switch" also may (and some of them do today) work
with flows.


>use 'master' since that is already overloaded by bridge, bonding,
>teaming, etc.)
>
>> 
>> >
>> >Can you give me a bit (a day) to clean-up that patch and submit an
>> >alternative proposal to these?
>> >
>> >>  	__IFLA_MAX
>> >>  };
>> >>  
>> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> >> index 4d99324..bd36a07 100644
>> >> --- a/ip/ipaddress.c
>> >> +++ b/ip/ipaddress.c
>> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>> >>  				      b1, sizeof(b1)));
>> >>  	}
>> >>  
>> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> >> +		SPRINT_BUF(b1);
>> >> +		fprintf(fp, "switchid %s ",
>> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      b1, sizeof(b1)));
>> >> +	}
>> >> +
>> >>  	if (tb[IFLA_OPERSTATE])
>> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> >>  
>> >> -- 
>> >> 1.9.3
>> >> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 15:15         ` Jiri Pirko
@ 2014-12-04 15:28           ` Andy Gospodarek
  0 siblings, 0 replies; 53+ messages in thread
From: Andy Gospodarek @ 2014-12-04 15:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hemal

On Thu, Dec 04, 2014 at 04:15:12PM +0100, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
> >On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
> >> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
> >> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> >> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> >> ---
> >> >>  include/linux/if_link.h | 1 +
> >> >>  ip/ipaddress.c          | 8 ++++++++
> >> >>  2 files changed, 9 insertions(+)
> >> >> 
> >> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> >> >> index 4732063..a6e2594 100644
> >> >> --- a/include/linux/if_link.h
> >> >> +++ b/include/linux/if_link.h
> >> >> @@ -145,6 +145,7 @@ enum {
> >> >>  	IFLA_CARRIER,
> >> >>  	IFLA_PHYS_PORT_ID,
> >> >>  	IFLA_CARRIER_CHANGES,
> >> >> +	IFLA_PHYS_SWITCH_ID,
> >> >
> >> >Serious question for Stephen et al, once we take this to iproute2 are we
> >> >going to be able to change the name but the string diplayed if needed?
> >> >
> >> >I had a patch that called this IFLA_PARENT_ID that I was using with the
> >> >older github tree used by Jiri and Scott before these were in net-next.
> >> >I wanted to submit that as a change to what became
> >> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> >> >be accepted and the dust settled.
> >> >
> >> >I like the parent/child/sibling nomenclature better for 4 reasons:
> >> >
> >> >- Most did not seem to like the term 'offload' since that term would be
> >> >  confusing with, GRO, GSO, etc.
> >> >- A *significant* use case for many of the high-end ASICs in datacenters
> >> >  is routing.
> >> >- switchid does not make sense in the OVS/flow case because it is all
> >> >  about flows, not switches or routers, and parent made sense there.
> >> 
> >> well ovs is all about flows and has the "switch" word in the name. I
> >> believe that people are talking about "switches" in case of these "flow
> >> devices" as well. I see nothing wrong in that. I think that "switch"
> >> became generic name for "packet forwarding machines".
> >
> >Just because one chose to use it that way does not mean I agree with it
> >and that we should copy their bad decision.  :-)
> >
> >> "parent" is very generic and may mean 100 things...
> >
> >But in this case, it means 1 thing.  The netdev you are using is
> >connected to another device that controls it rather than being just a
> >NIC.
> >
> >> 
> >> 
> >> >- I wanted to combine this for use with SR-IOV use case, so one can more
> >> >  easily map PF->VF using this.
> >> 
> >> Ugh, please don't mix this up with pf, vf. That is completely different
> >> thing.
> >> 
> >> pf vf mapping is done in sysfs. In netlink, physportid is used for that
> >> purpose. We can expose this phys port id for pf as well (as a different
> >> attr) and we are done.
> >
> >I know that attribute is there, but I find it more valuable for
> >solutions like nPAR than for PF/VF use-case.  Parent/child relationship
> >makes more sense to me since for forwarding will be controlled by the
> >embedded switch on those devices.  (Notice I specifically chose not to
> 
> see, "embedded switch", not "embedded parent" or "embedded offload". And
> yes, the "embedded switch" also may (and some of them do today) work
> with flows.

It's fine to exclude the PF/VF part for now since you seem disagree with
that as a valid reason. The other 3 reasons are in my mind, so that was
why I asked to Stephen would mind waiting for me to post my set.  I'm
not asking for you to do anything here but to be patient and let me get
the set out.

> 
> 
> >use 'master' since that is already overloaded by bridge, bonding,
> >teaming, etc.)
> >
> >> 
> >> >
> >> >Can you give me a bit (a day) to clean-up that patch and submit an
> >> >alternative proposal to these?
> >> >
> >> >>  	__IFLA_MAX
> >> >>  };
> >> >>  
> >> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> >> >> index 4d99324..bd36a07 100644
> >> >> --- a/ip/ipaddress.c
> >> >> +++ b/ip/ipaddress.c
> >> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >> >>  				      b1, sizeof(b1)));
> >> >>  	}
> >> >>  
> >> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> >> >> +		SPRINT_BUF(b1);
> >> >> +		fprintf(fp, "switchid %s ",
> >> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> >> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> >> >> +				      b1, sizeof(b1)));
> >> >> +	}
> >> >> +
> >> >>  	if (tb[IFLA_OPERSTATE])
> >> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> >> >>  
> >> >> -- 
> >> >> 1.9.3
> >> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 14:57       ` Andy Gospodarek
  2014-12-04 15:12         ` Jiri Pirko
  2014-12-04 15:15         ` Jiri Pirko
@ 2014-12-04 15:37         ` Thomas Graf
  2 siblings, 0 replies; 53+ messages in thread
From: Thomas Graf @ 2014-12-04 15:37 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, bcrl, hema

On 12/04/14 at 09:57am, Andy Gospodarek wrote:
> I know that attribute is there, but I find it more valuable for
> solutions like nPAR than for PF/VF use-case.  Parent/child relationship
> makes more sense to me since for forwarding will be controlled by the
> embedded switch on those devices.  (Notice I specifically chose not to
> use 'master' since that is already overloaded by bridge, bonding,
> teaming, etc.)

So that would make it link, master, and parent. Not sure we can make
it any more confusing ;-)

TBH, I'm really struggling on how to best name this. Trying to reach
for 100% correctness leads to generic, amgbigous names. 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 14:45     ` Roopa Prabhu
@ 2014-12-04 16:04       ` Jiri Pirko
  2014-12-04 16:55         ` Roopa Prabhu
  0 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 16:04 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 03:45:44PM CET, roopa@cumulusnetworks.com wrote:
>On 12/4/14, 6:34 AM, Jiri Pirko wrote:
>>Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>>>On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>>>Jiri Pirko (1):
>>>>   iproute2: ipa: show switch id
>>>>
>>>>Scott Feldman (5):
>>>>   bridge/fdb: fix statistics output spacing
>>>>   bridge/fdb: add flag/indication for FDB entry synced from offload
>>>>     device
>>>>   bridge/link: add new offload hwmode swdev
>>>Ack to most patches but nack on this one. The todo list still has a note to
>>>revist the flag to indicate switchdev offloads.
>>>Exposing this to userspace does not help that.
>>Hmm, note that this is already exposed to userspace, this patchset is
>>for iproute2 (userspace tool).
>hmmm, all feedback on the switchdev patches seemed to indicate we can change
>this later.
> I don't see swdev mode being used in the kernel anywhere today.

Well, it is, in rocker:
$ git grep BRIDGE_MODE_SWDEV
drivers/net/ethernet/rocker/rocker.c:                   if (mode != BRIDGE_MODE_SWDEV)
drivers/net/ethernet/rocker/rocker.c:   u16 mode = BRIDGE_MODE_SWDEV;
include/uapi/linux/if_bridge.h:#define BRIDGE_MODE_SWDEV        2       /* Full switch device offload */

>I will send a patch to remove it. Its still in net-next and so can be changed
>?.
>I was going to resend my patch to introduce a common offload flag for all
>link objects.
>It would be nice if all of them had a consistent flag to indicate hw offload
>and iproute2 could display the same flag for all.
>Including bonds and vxlan's.

I do not understand the connection with BRIDGE_MODE_SWDEV. We discussed
this already. BRIDGE_MODE_SWDEV is a bridge mode, similar to for example
BRIDGE_MODE_VEPA and makes perfect sense to have it.

How vxlan and bonds come into the mixture, that is a puzzler for me.
Maybe I have to see patches.

>
>>
>>>>   link: add missing IFLA_BRPORT_PROXYARP
>>>>   bridge/link: add learning_sync policy flag
>>>>
>>>>  bridge/fdb.c              |  4 +++-
>>>>  bridge/link.c             | 17 +++++++++++++++--
>>>>  include/linux/if_bridge.h |  1 +
>>>>  include/linux/if_link.h   |  3 +++
>>>>  include/linux/neighbour.h |  1 +
>>>>  ip/ipaddress.c            |  8 ++++++++
>>>>  man/man8/bridge.8         | 19 ++++++++++++++-----
>>>>  7 files changed, 45 insertions(+), 8 deletions(-)
>>>>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
  2014-12-04 13:17   ` Jamal Hadi Salim
  2014-12-04 14:20   ` Andy Gospodarek
@ 2014-12-04 16:15   ` Eric W. Biederman
  2014-12-04 16:30     ` Jiri Pirko
  2 siblings, 1 reply; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 16:15 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Jiri Pirko <jiri@resnulli.us> writes:

Would someone please explain to me what a switch id is?

I looked in the kernel source, and I looked here and while I know
switches I don't have a clue what a switch id is.

My primary concern at this point is that you have introduced a global
identifier that is isn't a hardware property (it certainly does not look
like a mac address) and that is unique across network namespaces and
thus breaks checkpoint/restart (aka CRIU).

Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
mean we can't have a purely software implementation of this interface?
Given that we will want a software implementation at some point
including PHYS in the name seems completely wrong.

> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  ip/ipaddress.c          | 8 ++++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>  	IFLA_CARRIER,
>  	IFLA_PHYS_PORT_ID,
>  	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>  	__IFLA_MAX
>  };
>  
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 4d99324..bd36a07 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>  				      b1, sizeof(b1)));
>  	}
>  
> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> +		SPRINT_BUF(b1);
> +		fprintf(fp, "switchid %s ",
> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> +				      b1, sizeof(b1)));
> +	}
> +
>  	if (tb[IFLA_OPERSTATE])
>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 16:15   ` Eric W. Biederman
@ 2014-12-04 16:30     ` Jiri Pirko
  2014-12-04 17:52       ` Eric W. Biederman
  0 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 16:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>Would someone please explain to me what a switch id is?
>
>I looked in the kernel source, and I looked here and while I know
>switches I don't have a clue what a switch id is.
>
>My primary concern at this point is that you have introduced a global
>identifier that is isn't a hardware property (it certainly does not look
>like a mac address) and that is unique across network namespaces and
>thus breaks checkpoint/restart (aka CRIU).

IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
generated by the driver and ensures that there is the same switch id for
all ports belonging to the same switch chip/asic. It is up to the driver
how to implement the id. I would like to point you to driver
implementing ndo_get_phys_port_id

>
>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>mean we can't have a purely software implementation of this interface?
>Given that we will want a software implementation at some point
>including PHYS in the name seems completely wrong.

We can remove the "PHYS", no problem. I do not understand what you say
about "software implementation". The point is to allow hw switch/ish
chips to be supported.


>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/if_link.h | 1 +
>>  ip/ipaddress.c          | 8 ++++++++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
>>  	__IFLA_MAX
>>  };
>>  
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>  				      b1, sizeof(b1)));
>>  	}
>>  
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>  	if (tb[IFLA_OPERSTATE])
>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 16:04       ` Jiri Pirko
@ 2014-12-04 16:55         ` Roopa Prabhu
  2014-12-04 20:49           ` Scott Feldman
  0 siblings, 1 reply; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-04 16:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

On 12/4/14, 8:04 AM, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:45:44PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/4/14, 6:34 AM, Jiri Pirko wrote:
>>> Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>>>> Jiri Pirko (1):
>>>>>    iproute2: ipa: show switch id
>>>>>
>>>>> Scott Feldman (5):
>>>>>    bridge/fdb: fix statistics output spacing
>>>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>>>      device
>>>>>    bridge/link: add new offload hwmode swdev
>>>> Ack to most patches but nack on this one. The todo list still has a note to
>>>> revist the flag to indicate switchdev offloads.
>>>> Exposing this to userspace does not help that.
>>> Hmm, note that this is already exposed to userspace, this patchset is
>>> for iproute2 (userspace tool).
>> hmmm, all feedback on the switchdev patches seemed to indicate we can change
>> this later.
>> I don't see swdev mode being used in the kernel anywhere today.
> Well, it is, in rocker:
> $ git grep BRIDGE_MODE_SWDEV
> drivers/net/ethernet/rocker/rocker.c:                   if (mode != BRIDGE_MODE_SWDEV)
> drivers/net/ethernet/rocker/rocker.c:   u16 mode = BRIDGE_MODE_SWDEV;
> include/uapi/linux/if_bridge.h:#define BRIDGE_MODE_SWDEV        2       /* Full switch device offload */

The problem is rocker is not the only one who is going to be using this. 
And so, we need something that fits everybody.
And i am not going to make my user set a mode for him to enable offload 
to hw.

>
>> I will send a patch to remove it. Its still in net-next and so can be changed
>> ?.
>> I was going to resend my patch to introduce a common offload flag for all
>> link objects.
>> It would be nice if all of them had a consistent flag to indicate hw offload
>> and iproute2 could display the same flag for all.
>> Including bonds and vxlan's.
> I do not understand the connection with BRIDGE_MODE_SWDEV. We discussed
> this already. BRIDGE_MODE_SWDEV is a bridge mode, similar to for example
> BRIDGE_MODE_VEPA and makes perfect sense to have it.
I dont think everybody acked it. But it went in with a note saying that 
it can be changed.
>
> How vxlan and bonds come into the mixture, that is a puzzler for me.
> Maybe I have to see patches.

I had posted a version of the patch previously: 
http://www.spinics.net/lists/netdev/msg305472.html

I have a v2 patch in my stack which does not touch the netlink header.
But in the past hour, i have been thinking about it some more. Do we 
really need this set by the user ?. In my use case i don't need it.

We do need a feature flag (or net_device_flags), but it does not need to 
be set by the user explicitly.
This flag can be set by the switch port driver on the switch ports. And 
the logical device: bridge/bond/vxlan
can inherit it from the port. There was a need of a flag in some 
usecases, to control offloading of specific bridge port flags
to hw/sw (example learning in hw or sw). example patch: 
https://patchwork.ozlabs.org/patch/413211/

I will post something today.






>
>>>>>    link: add missing IFLA_BRPORT_PROXYARP
>>>>>    bridge/link: add learning_sync policy flag
>>>>>
>>>>>   bridge/fdb.c              |  4 +++-
>>>>>   bridge/link.c             | 17 +++++++++++++++--
>>>>>   include/linux/if_bridge.h |  1 +
>>>>>   include/linux/if_link.h   |  3 +++
>>>>>   include/linux/neighbour.h |  1 +
>>>>>   ip/ipaddress.c            |  8 ++++++++
>>>>>   man/man8/bridge.8         | 19 ++++++++++++++-----
>>>>>   7 files changed, 45 insertions(+), 8 deletions(-)
>>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 16:30     ` Jiri Pirko
@ 2014-12-04 17:52       ` Eric W. Biederman
  2014-12-04 17:59         ` Roopa Prabhu
  2014-12-04 18:24         ` Jiri Pirko
  0 siblings, 2 replies; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 17:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>Jiri Pirko <jiri@resnulli.us> writes:
>>
>>Would someone please explain to me what a switch id is?
>>
>>I looked in the kernel source, and I looked here and while I know
>>switches I don't have a clue what a switch id is.
>>
>>My primary concern at this point is that you have introduced a global
>>identifier that is isn't a hardware property (it certainly does not look
>>like a mac address) and that is unique across network namespaces and
>>thus breaks checkpoint/restart (aka CRIU).
>
> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
> generated by the driver and ensures that there is the same switch id for
> all ports belonging to the same switch chip/asic. It is up to the driver
> how to implement the id. I would like to point you to driver
> implementing ndo_get_phys_port_id

Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
guid in the case of infiniband.  Which really makes me wonder why we
didn't use the same abstractions in the code for address types that we
do for hardware addresses.

Using mac address or other hardware addresses that are used for layer 2
addressing makes sense to me.  There is a long tradition of that and as
I recall protocols like STP actually requiring having a different mac
address per port.

When I asked the question I thought the switch id was going to be
something like the ifindex, the software index of a network device.


Finally having tracked down the rocker implementation of 
rocker_port_switch_parent_id_get I see it you are reading some 64bit
hardware register.

Which leads me to ask what are the semantics of switch_id?

Is the switch id an identifier with a prefix from IEEE and assigned by
the manufacture so that it is guaranteed to the tolerances of the
manufacturing process to be globally unique?

Is the switch id a random number that is statistically likely to be
globally unique because you have enough bits?   As I recall you need
at least 128 bits to have a reasonable chance of a random number
avoiding the birthday paradox.

Do we need some kind of manufacturer id to tell one switch id from
another?

Is the switch id persistent across reboots?

>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>mean we can't have a purely software implementation of this interface?
>>Given that we will want a software implementation at some point
>>including PHYS in the name seems completely wrong.
>
> We can remove the "PHYS", no problem. I do not understand what you say
> about "software implementation". The point is to allow hw switch/ish
> chips to be supported.

If we are talking about something typically stored in a eeprom like a
mac address phys seems appropriate.

Still having a definition of this switch id clean clear enough that
net/bridge and drivers/net/macvlan can implement it seems important.

Even more important is having a definition of switch id clear enough
that userspace can use the switch id to do something useful.

Right now switch id looks like one of those weird one manufacturer
properties that is fine to expose as a driver specific property
but I don't yet see it being a generic property I that can be used
usefully in userspace.

So can we please get some clear semantics or failing that can we please
not expose this to userspace as generic property.

Thanks,
Eric



>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>  include/linux/if_link.h | 1 +
>>>  ip/ipaddress.c          | 8 ++++++++
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>> index 4732063..a6e2594 100644
>>> --- a/include/linux/if_link.h
>>> +++ b/include/linux/if_link.h
>>> @@ -145,6 +145,7 @@ enum {
>>>  	IFLA_CARRIER,
>>>  	IFLA_PHYS_PORT_ID,
>>>  	IFLA_CARRIER_CHANGES,
>>> +	IFLA_PHYS_SWITCH_ID,
>>>  	__IFLA_MAX
>>>  };
>>>  
>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>> index 4d99324..bd36a07 100644
>>> --- a/ip/ipaddress.c
>>> +++ b/ip/ipaddress.c
>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>  				      b1, sizeof(b1)));
>>>  	}
>>>  
>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>> +		SPRINT_BUF(b1);
>>> +		fprintf(fp, "switchid %s ",
>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>> +				      b1, sizeof(b1)));
>>> +	}
>>> +
>>>  	if (tb[IFLA_OPERSTATE])
>>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 17:52       ` Eric W. Biederman
@ 2014-12-04 17:59         ` Roopa Prabhu
  2014-12-04 18:24         ` Jiri Pirko
  1 sibling, 0 replies; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-04 17:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

On 12/4/14, 9:52 AM, Eric W. Biederman wrote:
> Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>> Would someone please explain to me what a switch id is?
>>>
>>> I looked in the kernel source, and I looked here and while I know
>>> switches I don't have a clue what a switch id is.
>>>
>>> My primary concern at this point is that you have introduced a global
>>> identifier that is isn't a hardware property (it certainly does not look
>>> like a mac address) and that is unique across network namespaces and
>>> thus breaks checkpoint/restart (aka CRIU).
>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>> generated by the driver and ensures that there is the same switch id for
>> all ports belonging to the same switch chip/asic. It is up to the driver
>> how to implement the id. I would like to point you to driver
>> implementing ndo_get_phys_port_id
> Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
> guid in the case of infiniband.  Which really makes me wonder why we
> didn't use the same abstractions in the code for address types that we
> do for hardware addresses.
>
> Using mac address or other hardware addresses that are used for layer 2
> addressing makes sense to me.  There is a long tradition of that and as
> I recall protocols like STP actually requiring having a different mac
> address per port.
>
> When I asked the question I thought the switch id was going to be
> something like the ifindex, the software index of a network device.
>
>
> Finally having tracked down the rocker implementation of
> rocker_port_switch_parent_id_get I see it you are reading some 64bit
> hardware register.
>
> Which leads me to ask what are the semantics of switch_id?
>
> Is the switch id an identifier with a prefix from IEEE and assigned by
> the manufacture so that it is guaranteed to the tolerances of the
> manufacturing process to be globally unique?
>
> Is the switch id a random number that is statistically likely to be
> globally unique because you have enough bits?   As I recall you need
> at least 128 bits to have a reasonable chance of a random number
> avoiding the birthday paradox.
>
> Do we need some kind of manufacturer id to tell one switch id from
> another?
>
> Is the switch id persistent across reboots?
>
>>> Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>> mean we can't have a purely software implementation of this interface?
>>> Given that we will want a software implementation at some point
>>> including PHYS in the name seems completely wrong.
>> We can remove the "PHYS", no problem. I do not understand what you say
>> about "software implementation". The point is to allow hw switch/ish
>> chips to be supported.
> If we are talking about something typically stored in a eeprom like a
> mac address phys seems appropriate.
>
> Still having a definition of this switch id clean clear enough that
> net/bridge and drivers/net/macvlan can implement it seems important.
>
> Even more important is having a definition of switch id clear enough
> that userspace can use the switch id to do something useful.
>
> Right now switch id looks like one of those weird one manufacturer
> properties that is fine to expose as a driver specific property
> but I don't yet see it being a generic property I that can be used
> usefully in userspace.
>
> So can we please get some clear semantics or failing that can we please
> not expose this to userspace as generic property.

Agree..100%. This was my original concern as well and i have raised it 
earlier.
  Don't expose it to userspace if the semantics are still not clear.

Thanks.

>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>   include/linux/if_link.h | 1 +
>>>>   ip/ipaddress.c          | 8 ++++++++
>>>>   2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>>> index 4732063..a6e2594 100644
>>>> --- a/include/linux/if_link.h
>>>> +++ b/include/linux/if_link.h
>>>> @@ -145,6 +145,7 @@ enum {
>>>>   	IFLA_CARRIER,
>>>>   	IFLA_PHYS_PORT_ID,
>>>>   	IFLA_CARRIER_CHANGES,
>>>> +	IFLA_PHYS_SWITCH_ID,
>>>>   	__IFLA_MAX
>>>>   };
>>>>   
>>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>>> index 4d99324..bd36a07 100644
>>>> --- a/ip/ipaddress.c
>>>> +++ b/ip/ipaddress.c
>>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>>   				      b1, sizeof(b1)));
>>>>   	}
>>>>   
>>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>>> +		SPRINT_BUF(b1);
>>>> +		fprintf(fp, "switchid %s ",
>>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      b1, sizeof(b1)));
>>>> +	}
>>>> +
>>>>   	if (tb[IFLA_OPERSTATE])
>>>>   		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 17:52       ` Eric W. Biederman
  2014-12-04 17:59         ` Roopa Prabhu
@ 2014-12-04 18:24         ` Jiri Pirko
  2014-12-04 18:57           ` Eric W. Biederman
  1 sibling, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 18:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>Would someone please explain to me what a switch id is?
>>>
>>>I looked in the kernel source, and I looked here and while I know
>>>switches I don't have a clue what a switch id is.
>>>
>>>My primary concern at this point is that you have introduced a global
>>>identifier that is isn't a hardware property (it certainly does not look
>>>like a mac address) and that is unique across network namespaces and
>>>thus breaks checkpoint/restart (aka CRIU).
>>
>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>> generated by the driver and ensures that there is the same switch id for
>> all ports belonging to the same switch chip/asic. It is up to the driver
>> how to implement the id. I would like to point you to driver
>> implementing ndo_get_phys_port_id
>
>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>guid in the case of infiniband.  Which really makes me wonder why we
>didn't use the same abstractions in the code for address types that we
>do for hardware addresses.
>
>Using mac address or other hardware addresses that are used for layer 2
>addressing makes sense to me.  There is a long tradition of that and as
>I recall protocols like STP actually requiring having a different mac
>address per port.
>
>When I asked the question I thought the switch id was going to be
>something like the ifindex, the software index of a network device.
>
>
>Finally having tracked down the rocker implementation of 
>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>hardware register.
>
>Which leads me to ask what are the semantics of switch_id?
>
>Is the switch id an identifier with a prefix from IEEE and assigned by
>the manufacture so that it is guaranteed to the tolerances of the
>manufacturing process to be globally unique?

It is up to the driver what to use. It can use mac addr. This is same as
for phys port id.


>
>Is the switch id a random number that is statistically likely to be
>globally unique because you have enough bits?   As I recall you need
>at least 128 bits to have a reasonable chance of a random number
>avoiding the birthday paradox.
>
>Do we need some kind of manufacturer id to tell one switch id from
>another?
>
>Is the switch id persistent across reboots?

Yes it is (as for phys port id).

>
>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>mean we can't have a purely software implementation of this interface?
>>>Given that we will want a software implementation at some point
>>>including PHYS in the name seems completely wrong.
>>
>> We can remove the "PHYS", no problem. I do not understand what you say
>> about "software implementation". The point is to allow hw switch/ish
>> chips to be supported.
>
>If we are talking about something typically stored in a eeprom like a
>mac address phys seems appropriate.

Yes, we are.

>
>Still having a definition of this switch id clean clear enough that
>net/bridge and drivers/net/macvlan can implement it seems important.

I don't understand why would net/bridge or driver/net/macvlan want to
implement this. The purpose is to group ports/netdevs which are part of
the same hw switch.

>
>Even more important is having a definition of switch id clear enough
>that userspace can use the switch id to do something useful.

Userspace threats this the same it treats phys port id.

if two ports/netdevs has same switch id, they belong under same hw
switch. That's all.


>
>Right now switch id looks like one of those weird one manufacturer
>properties that is fine to expose as a driver specific property
>but I don't yet see it being a generic property I that can be used
>usefully in userspace.
>
>So can we please get some clear semantics or failing that can we please
>not expose this to userspace as generic property.

I thought that the semantics is clean. Looks like I will have to update
Documentation/networking/switchdev.txt adding some more info about this.

>
>Thanks,
>Eric
>
>
>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>  include/linux/if_link.h | 1 +
>>>>  ip/ipaddress.c          | 8 ++++++++
>>>>  2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>>> index 4732063..a6e2594 100644
>>>> --- a/include/linux/if_link.h
>>>> +++ b/include/linux/if_link.h
>>>> @@ -145,6 +145,7 @@ enum {
>>>>  	IFLA_CARRIER,
>>>>  	IFLA_PHYS_PORT_ID,
>>>>  	IFLA_CARRIER_CHANGES,
>>>> +	IFLA_PHYS_SWITCH_ID,
>>>>  	__IFLA_MAX
>>>>  };
>>>>  
>>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>>> index 4d99324..bd36a07 100644
>>>> --- a/ip/ipaddress.c
>>>> +++ b/ip/ipaddress.c
>>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>>  				      b1, sizeof(b1)));
>>>>  	}
>>>>  
>>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>>> +		SPRINT_BUF(b1);
>>>> +		fprintf(fp, "switchid %s ",
>>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      b1, sizeof(b1)));
>>>> +	}
>>>> +
>>>>  	if (tb[IFLA_OPERSTATE])
>>>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP
  2014-12-04  8:57 ` [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP Jiri Pirko
@ 2014-12-04 18:53   ` Stephen Hemminger
  0 siblings, 0 replies; 53+ messages in thread
From: Stephen Hemminger @ 2014-12-04 18:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On Thu,  4 Dec 2014 09:57:17 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Scott Feldman <sfeldma@gmail.com>
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index a6e2594..06efa2d 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -242,6 +242,7 @@ enum {
>  	IFLA_BRPORT_FAST_LEAVE,	/* multicast fast leave    */
>  	IFLA_BRPORT_LEARNING,	/* mac learning */
>  	IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
> +	IFLA_BRPORT_PROXYARP,   /* proxy ARP */
>  	__IFLA_BRPORT_MAX
>  };
>  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)

Unnecessary patch since I pick up headers from upstream.
Already on net-next branch.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 18:24         ` Jiri Pirko
@ 2014-12-04 18:57           ` Eric W. Biederman
  2014-12-04 19:19             ` Jiri Pirko
  0 siblings, 1 reply; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 18:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>>Jiri Pirko <jiri@resnulli.us> writes:
>>
>>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>>
>>>>Would someone please explain to me what a switch id is?
>>>>
>>>>I looked in the kernel source, and I looked here and while I know
>>>>switches I don't have a clue what a switch id is.
>>>>
>>>>My primary concern at this point is that you have introduced a global
>>>>identifier that is isn't a hardware property (it certainly does not look
>>>>like a mac address) and that is unique across network namespaces and
>>>>thus breaks checkpoint/restart (aka CRIU).
>>>
>>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>>> generated by the driver and ensures that there is the same switch id for
>>> all ports belonging to the same switch chip/asic. It is up to the driver
>>> how to implement the id. I would like to point you to driver
>>> implementing ndo_get_phys_port_id
>>
>>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>>guid in the case of infiniband.  Which really makes me wonder why we
>>didn't use the same abstractions in the code for address types that we
>>do for hardware addresses.
>>
>>Using mac address or other hardware addresses that are used for layer 2
>>addressing makes sense to me.  There is a long tradition of that and as
>>I recall protocols like STP actually requiring having a different mac
>>address per port.
>>
>>When I asked the question I thought the switch id was going to be
>>something like the ifindex, the software index of a network device.
>>
>>
>>Finally having tracked down the rocker implementation of 
>>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>>hardware register.
>>
>>Which leads me to ask what are the semantics of switch_id?
>>
>>Is the switch id an identifier with a prefix from IEEE and assigned by
>>the manufacture so that it is guaranteed to the tolerances of the
>>manufacturing process to be globally unique?
>
> It is up to the driver what to use. It can use mac addr. This is same as
> for phys port id.

My reading of the code says the phys port id is the layer 2 hardware
address of the port.  Certainly that is the case for all of the
implementations now and it would be insane for it to be anything else.

>>Is the switch id a random number that is statistically likely to be
>>globally unique because you have enough bits?   As I recall you need
>>at least 128 bits to have a reasonable chance of a random number
>>avoiding the birthday paradox.
>>
>>Do we need some kind of manufacturer id to tell one switch id from
>>another?
>>
>>Is the switch id persistent across reboots?
>
> Yes it is (as for phys port id).

>>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>>mean we can't have a purely software implementation of this interface?
>>>>Given that we will want a software implementation at some point
>>>>including PHYS in the name seems completely wrong.
>>>
>>> We can remove the "PHYS", no problem. I do not understand what you say
>>> about "software implementation". The point is to allow hw switch/ish
>>> chips to be supported.
>>
>>If we are talking about something typically stored in a eeprom like a
>>mac address phys seems appropriate.
>
> Yes, we are.
>
>>
>>Still having a definition of this switch id clean clear enough that
>>net/bridge and drivers/net/macvlan can implement it seems important.
>
> I don't understand why would net/bridge or driver/net/macvlan want to
> implement this. The purpose is to group ports/netdevs which are part of
> the same hw switch.

Certainly all of the macvlan devices are part of the same logical
switch, and are all the same hardware.

>>Even more important is having a definition of switch id clear enough
>>that userspace can use the switch id to do something useful.
>
> Userspace threats this the same it treats phys port id.
>
> if two ports/netdevs has same switch id, they belong under same hw
> switch. That's all.

So this id needs to be globally unique?

How does the rocket driver achieve global uniquness?  Is the rocket
driver using an IEEE assigned prefix based id like the ethernet mac
address?

This is an important detail as ensuring global uniqueness can take a lot
of work so there needs to be a general agreement on how global
uniqueness is achieved.

Saying it is up to the driver how to achieve a globally unique id across
every different switch driver is really insufficient.  If I have two
drivers that each implement an id that is 64bit long and they use two
completely different methods there is a real chance of collision.

So either I need a driver id that I also use in the comparison between
switch ids or how a driver generates a globally unique id need to be
specified.

>>Right now switch id looks like one of those weird one manufacturer
>>properties that is fine to expose as a driver specific property
>>but I don't yet see it being a generic property I that can be used
>>usefully in userspace.
>>
>>So can we please get some clear semantics or failing that can we please
>>not expose this to userspace as generic property.
>
> I thought that the semantics is clean. Looks like I will have to update
> Documentation/networking/switchdev.txt adding some more info about
> this.

The semantics as described so far will not achieve a useful id, which
makes them rubbish.

Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 18:57           ` Eric W. Biederman
@ 2014-12-04 19:19             ` Jiri Pirko
  2014-12-04 19:26               ` Eric W. Biederman
  0 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 19:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 07:57:41PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>>>
>>>>>Would someone please explain to me what a switch id is?
>>>>>
>>>>>I looked in the kernel source, and I looked here and while I know
>>>>>switches I don't have a clue what a switch id is.
>>>>>
>>>>>My primary concern at this point is that you have introduced a global
>>>>>identifier that is isn't a hardware property (it certainly does not look
>>>>>like a mac address) and that is unique across network namespaces and
>>>>>thus breaks checkpoint/restart (aka CRIU).
>>>>
>>>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>>>> generated by the driver and ensures that there is the same switch id for
>>>> all ports belonging to the same switch chip/asic. It is up to the driver
>>>> how to implement the id. I would like to point you to driver
>>>> implementing ndo_get_phys_port_id
>>>
>>>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>>>guid in the case of infiniband.  Which really makes me wonder why we
>>>didn't use the same abstractions in the code for address types that we
>>>do for hardware addresses.
>>>
>>>Using mac address or other hardware addresses that are used for layer 2
>>>addressing makes sense to me.  There is a long tradition of that and as
>>>I recall protocols like STP actually requiring having a different mac
>>>address per port.
>>>
>>>When I asked the question I thought the switch id was going to be
>>>something like the ifindex, the software index of a network device.
>>>
>>>
>>>Finally having tracked down the rocker implementation of 
>>>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>>>hardware register.
>>>
>>>Which leads me to ask what are the semantics of switch_id?
>>>
>>>Is the switch id an identifier with a prefix from IEEE and assigned by
>>>the manufacture so that it is guaranteed to the tolerances of the
>>>manufacturing process to be globally unique?
>>
>> It is up to the driver what to use. It can use mac addr. This is same as
>> for phys port id.
>
>My reading of the code says the phys port id is the layer 2 hardware
>address of the port.  Certainly that is the case for all of the
>implementations now and it would be insane for it to be anything else.
>
>>>Is the switch id a random number that is statistically likely to be
>>>globally unique because you have enough bits?   As I recall you need
>>>at least 128 bits to have a reasonable chance of a random number
>>>avoiding the birthday paradox.
>>>
>>>Do we need some kind of manufacturer id to tell one switch id from
>>>another?
>>>
>>>Is the switch id persistent across reboots?
>>
>> Yes it is (as for phys port id).
>
>>>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>>>mean we can't have a purely software implementation of this interface?
>>>>>Given that we will want a software implementation at some point
>>>>>including PHYS in the name seems completely wrong.
>>>>
>>>> We can remove the "PHYS", no problem. I do not understand what you say
>>>> about "software implementation". The point is to allow hw switch/ish
>>>> chips to be supported.
>>>
>>>If we are talking about something typically stored in a eeprom like a
>>>mac address phys seems appropriate.
>>
>> Yes, we are.
>>
>>>
>>>Still having a definition of this switch id clean clear enough that
>>>net/bridge and drivers/net/macvlan can implement it seems important.
>>
>> I don't understand why would net/bridge or driver/net/macvlan want to
>> implement this. The purpose is to group ports/netdevs which are part of
>> the same hw switch.
>
>Certainly all of the macvlan devices are part of the same logical
>switch, and are all the same hardware.
>
>>>Even more important is having a definition of switch id clear enough
>>>that userspace can use the switch id to do something useful.
>>
>> Userspace threats this the same it treats phys port id.
>>
>> if two ports/netdevs has same switch id, they belong under same hw
>> switch. That's all.
>
>So this id needs to be globally unique?

No. It is enough to be unique within a single system. It serves for no
more than to find out 2 ids are same or not, no other info value.

So when the drivers uses sane ids (like mac for example, or in case of
rocker an id which is passed by qemu command line), the chances of
collision are very very close to none (never say never).

>
>How does the rocket driver achieve global uniquness?  Is the rocket
>driver using an IEEE assigned prefix based id like the ethernet mac
>address?
>
>This is an important detail as ensuring global uniqueness can take a lot
>of work so there needs to be a general agreement on how global
>uniqueness is achieved.
>
>Saying it is up to the driver how to achieve a globally unique id across
>every different switch driver is really insufficient.  If I have two
>drivers that each implement an id that is 64bit long and they use two
>completely different methods there is a real chance of collision.
>
>So either I need a driver id that I also use in the comparison between
>switch ids or how a driver generates a globally unique id need to be
>specified.
>
>>>Right now switch id looks like one of those weird one manufacturer
>>>properties that is fine to expose as a driver specific property
>>>but I don't yet see it being a generic property I that can be used
>>>usefully in userspace.
>>>
>>>So can we please get some clear semantics or failing that can we please
>>>not expose this to userspace as generic property.
>>
>> I thought that the semantics is clean. Looks like I will have to update
>> Documentation/networking/switchdev.txt adding some more info about
>> this.
>
>The semantics as described so far will not achieve a useful id, which
>makes them rubbish.
>
>Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 19:19             ` Jiri Pirko
@ 2014-12-04 19:26               ` Eric W. Biederman
  2014-12-04 19:54                 ` Jiri Pirko
  2014-12-04 20:06                 ` Eric W. Biederman
  0 siblings, 2 replies; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 19:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Jiri Pirko <jiri@resnulli.us> writes:

>>So this id needs to be globally unique?
>
> No. It is enough to be unique within a single system. It serves for no
> more than to find out 2 ids are same or not, no other info value.
>
> So when the drivers uses sane ids (like mac for example, or in case of
> rocker an id which is passed by qemu command line), the chances of
> collision are very very close to none (never say never).

So the switch id isn't necessarily even as good as a manufacturers
serial number that changes between different models, and anyone who uses
this id must look at the driver to get uniqueness.

Yes what is needed in the comparison to get uniqueness in a single
system badly needs to be documented because that switch id is very much
not enough on it's own.

phys_port_id on the other hand is globally unique.  So please stop
saying that this is anything like phys_port_id.

Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 19:26               ` Eric W. Biederman
@ 2014-12-04 19:54                 ` Jiri Pirko
  2014-12-04 20:06                 ` Eric W. Biederman
  1 sibling, 0 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 19:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 08:26:14PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>>>So this id needs to be globally unique?
>>
>> No. It is enough to be unique within a single system. It serves for no
>> more than to find out 2 ids are same or not, no other info value.
>>
>> So when the drivers uses sane ids (like mac for example, or in case of
>> rocker an id which is passed by qemu command line), the chances of
>> collision are very very close to none (never say never).
>
>So the switch id isn't necessarily even as good as a manufacturers
>serial number that changes between different models, and anyone who uses
>this id must look at the driver to get uniqueness.
>
>Yes what is needed in the comparison to get uniqueness in a single
>system badly needs to be documented because that switch id is very much
>not enough on it's own.
>
>phys_port_id on the other hand is globally unique.  So please stop
>saying that this is anything like phys_port_id.

I'm sorry. But is is the same. And the purpose is very similar. In case
of phys port id to find out if 2 netdevices share the same physical port.
In case of phys switch id to find out if 2 netdevices share the same
switch parent.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 19:26               ` Eric W. Biederman
  2014-12-04 19:54                 ` Jiri Pirko
@ 2014-12-04 20:06                 ` Eric W. Biederman
  2014-12-04 20:27                   ` Jiri Pirko
  1 sibling, 1 reply; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 20:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

ebiederm@xmission.com (Eric W. Biederman) writes:

> Jiri Pirko <jiri@resnulli.us> writes:
>
>>>So this id needs to be globally unique?
>>
>> No. It is enough to be unique within a single system. It serves for no
>> more than to find out 2 ids are same or not, no other info value.
>>
>> So when the drivers uses sane ids (like mac for example, or in case of
>> rocker an id which is passed by qemu command line), the chances of
>> collision are very very close to none (never say never).

Thinking about what you said a little more.

Two different sources of persistent numbers picking numbers by
completely different algorithms can give you no assurance that you don't
produce conflicts.

The switch id as desisgned can not work.

There are expected to be between 2**36 to 2**40 devices in this world.
Your first switch id is a 64it number.  At the very best by the birthday
pardox predicts there will be a conflict ever 2**32 devices or between
2**4 and 2**8 devices in the world with conflicts.  If the ids are not
randomly distributed (which they won't be) things could easily be much
much worse.

That is just good enough the code could get out there and run for years
before you have the nightmare of having to fix all of userspace.   That
is a nightmare no one needs.

So please remove this broken code, and this broken concept from the
kernel and go back to the drawing board.

Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 20:06                 ` Eric W. Biederman
@ 2014-12-04 20:27                   ` Jiri Pirko
  2014-12-04 20:55                     ` Eric W. Biederman
  0 siblings, 1 reply; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 20:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Jiri Pirko <jiri@resnulli.us> writes:
>>
>>>>So this id needs to be globally unique?
>>>
>>> No. It is enough to be unique within a single system. It serves for no
>>> more than to find out 2 ids are same or not, no other info value.
>>>
>>> So when the drivers uses sane ids (like mac for example, or in case of
>>> rocker an id which is passed by qemu command line), the chances of
>>> collision are very very close to none (never say never).
>
>Thinking about what you said a little more.
>
>Two different sources of persistent numbers picking numbers by
>completely different algorithms can give you no assurance that you don't
>produce conflicts.
>
>The switch id as desisgned can not work.
>
>There are expected to be between 2**36 to 2**40 devices in this world.
>Your first switch id is a 64it number.  At the very best by the birthday
>pardox predicts there will be a conflict ever 2**32 devices or between
>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>randomly distributed (which they won't be) things could easily be much
>much worse.
>
>That is just good enough the code could get out there and run for years
>before you have the nightmare of having to fix all of userspace.   That
>is a nightmare no one needs.
>
>So please remove this broken code, and this broken concept from the
>kernel and go back to the drawing board.

In that case the phys port id is broken in the same way. Let's rather
think about how to avoid conflicts for both. Given the fact the
conflicts should be avoided only on a single baremetal, that should be
doable (for (bad) example using driver name mixed with driver created id).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 16:55         ` Roopa Prabhu
@ 2014-12-04 20:49           ` Scott Feldman
  2014-12-05  2:28             ` Roopa Prabhu
  0 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2014-12-04 20:49 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Jamal Hadi Salim, Florian Fainelli,
	John Linville

On Thu, Dec 4, 2014 at 8:55 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On 12/4/14, 8:04 AM, Jiri Pirko wrote:
>>
>> Thu, Dec 04, 2014 at 03:45:44PM CET, roopa@cumulusnetworks.com wrote:
>>>
>>> On 12/4/14, 6:34 AM, Jiri Pirko wrote:
>>>>
>>>> Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>>>>>
>>>>> On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>>>>>
>>>>>> Jiri Pirko (1):
>>>>>>    iproute2: ipa: show switch id
>>>>>>
>>>>>> Scott Feldman (5):
>>>>>>    bridge/fdb: fix statistics output spacing
>>>>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>>>>      device
>>>>>>    bridge/link: add new offload hwmode swdev
>>>>>
>>>>> Ack to most patches but nack on this one. The todo list still has a
>>>>> note to
>>>>> revist the flag to indicate switchdev offloads.
>>>>> Exposing this to userspace does not help that.
>>>>
>>>> Hmm, note that this is already exposed to userspace, this patchset is
>>>> for iproute2 (userspace tool).
>>>
>>> hmmm, all feedback on the switchdev patches seemed to indicate we can
>>> change
>>> this later.
>>> I don't see swdev mode being used in the kernel anywhere today.
>>
>> Well, it is, in rocker:
>> $ git grep BRIDGE_MODE_SWDEV
>> drivers/net/ethernet/rocker/rocker.c:                   if (mode !=
>> BRIDGE_MODE_SWDEV)
>> drivers/net/ethernet/rocker/rocker.c:   u16 mode = BRIDGE_MODE_SWDEV;
>> include/uapi/linux/if_bridge.h:#define BRIDGE_MODE_SWDEV        2       /*
>> Full switch device offload */
>
>
> The problem is rocker is not the only one who is going to be using this. And
> so, we need something that fits everybody.
> And i am not going to make my user set a mode for him to enable offload to
> hw.
>
>>
>>> I will send a patch to remove it. Its still in net-next and so can be
>>> changed
>>> ?.
>>> I was going to resend my patch to introduce a common offload flag for all
>>> link objects.
>>> It would be nice if all of them had a consistent flag to indicate hw
>>> offload
>>> and iproute2 could display the same flag for all.
>>> Including bonds and vxlan's.
>>
>> I do not understand the connection with BRIDGE_MODE_SWDEV. We discussed
>> this already. BRIDGE_MODE_SWDEV is a bridge mode, similar to for example
>> BRIDGE_MODE_VEPA and makes perfect sense to have it.
>
> I dont think everybody acked it. But it went in with a note saying that it
> can be changed.

I thought that was the plan: this new mode goes in now for net-next
and iproute2, and you would supply follow up patch for each to move to
your switch port flag.  That will give us time to review your work
without have net-next and iproute2 out-of-sync.

>>
>>
>> How vxlan and bonds come into the mixture, that is a puzzler for me.
>> Maybe I have to see patches.
>
>
> I had posted a version of the patch previously:
> http://www.spinics.net/lists/netdev/msg305472.html
>
> I have a v2 patch in my stack which does not touch the netlink header.
> But in the past hour, i have been thinking about it some more. Do we really
> need this set by the user ?. In my use case i don't need it.

Look at how iproute2 figures out if SELF should be set or not.  It's
only set if hwmode is set, otherwise it defaults to MASTER.  So with
SWDEV a new hwmode, we can push settings (learning, leraning_sync) to
port with SELF set.  It's probably not an ideal arrangement having to
set hwmode each time, but this was the low-touch change to iproute2 to
push port settings.

I'm hoping your new patch will kind of straighten this all out.  But
you've got extra work to make sure backward compat with older iproute2
still works, including this weirdness around hwmode.

>
> We do need a feature flag (or net_device_flags), but it does not need to be
> set by the user explicitly.
> This flag can be set by the switch port driver on the switch ports. And the
> logical device: bridge/bond/vxlan
> can inherit it from the port. There was a need of a flag in some usecases,
> to control offloading of specific bridge port flags
> to hw/sw (example learning in hw or sw). example patch:
> https://patchwork.ozlabs.org/patch/413211/
>
> I will post something today.

Can you include matching iproute2 changes?  (Assuming you'll building
on top of what's already in net-next and this iproute2 set Jiri sent
out).  It's helpful to see the iproute2 changes to see what the new
cmd structure is and how legacy is handled.

-scott

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 20:27                   ` Jiri Pirko
@ 2014-12-04 20:55                     ` Eric W. Biederman
  2014-12-04 21:10                       ` Jiri Pirko
  2014-12-05  9:54                       ` David Laight
  0 siblings, 2 replies; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 20:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>>ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>>>So this id needs to be globally unique?
>>>>
>>>> No. It is enough to be unique within a single system. It serves for no
>>>> more than to find out 2 ids are same or not, no other info value.
>>>>
>>>> So when the drivers uses sane ids (like mac for example, or in case of
>>>> rocker an id which is passed by qemu command line), the chances of
>>>> collision are very very close to none (never say never).
>>
>>Thinking about what you said a little more.
>>
>>Two different sources of persistent numbers picking numbers by
>>completely different algorithms can give you no assurance that you don't
>>produce conflicts.
>>
>>The switch id as desisgned can not work.
>>
>>There are expected to be between 2**36 to 2**40 devices in this world.
>>Your first switch id is a 64it number.  At the very best by the birthday
>>pardox predicts there will be a conflict ever 2**32 devices or between
>>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>>randomly distributed (which they won't be) things could easily be much
>>much worse.
>>
>>That is just good enough the code could get out there and run for years
>>before you have the nightmare of having to fix all of userspace.   That
>>is a nightmare no one needs.
>>
>>So please remove this broken code, and this broken concept from the
>>kernel and go back to the drawing board.
>
> In that case the phys port id is broken in the same way. Let's rather
> think about how to avoid conflicts for both. Given the fact the
> conflicts should be avoided only on a single baremetal, that should be
> doable (for (bad) example using driver name mixed with driver created
> id).

No.  phys_port_id is not broken in the same way, and phys_port_id does
not have the same set of properties.

phys_port_id's in practice all have an IEEE prefix that identifies the
manufacturer and a manufacture assigned serial number.  Aka a mac
address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
a 64bit EUID-64 I don't know.  If there are problems in the worst
case issues with phys_port_id are fixable by simple driver tweaks,
because fundamentally we are working with globally uniuqe identifiers.
Well globally unique baring manufacturing bugs in eeproms.



I agree with you that the switch id concept can be saved.  But I think
we should fix switch id before we export it to userspace so we don't
have to break userspace later.

My intuition says we want something like ifindex, but I am not at all
certain how switch id is planned to be used.  Given that it is single
box I don't expect you are sending it out over the wire.

*shrug*

Why does switch id need to be persistent?  Why can't switch id be
property like ifindex?

What are the actual requirements.

Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 4/6] bridge/link: add new offload hwmode swdev
  2014-12-04 13:23   ` Jamal Hadi Salim
@ 2014-12-04 20:55     ` Scott Feldman
  0 siblings, 0 replies; 53+ messages in thread
From: Scott Feldman @ 2014-12-04 20:55 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Florian Fainelli, Roopa Prabhu,
	John Linville

On Thu, Dec 4, 2014 at 5:23 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 12/04/14 03:57, Jiri Pirko wrote:
>>
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> To support full-featured switch devices offloading bridge funtionality,
>> add new hwmode 'swdev'.  Like 'vepa' and 'veb', 'swdev' indicated bridge
>> port functionality is being offloaded to hardware.
>>
>
> Unhappy with the name swdev ..
> Ok, go ahead and beat up on me.

Lack of creativity, I know.  But, I doubt anyone is confused by the name.

Hopefully it's short-lived and replace with what Roopa comes up with
for switch port offload flag.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 20:55                     ` Eric W. Biederman
@ 2014-12-04 21:10                       ` Jiri Pirko
  2014-12-04 21:24                         ` Eric W. Biederman
  2014-12-04 22:07                         ` Thomas Graf
  2014-12-05  9:54                       ` David Laight
  1 sibling, 2 replies; 53+ messages in thread
From: Jiri Pirko @ 2014-12-04 21:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Thu, Dec 04, 2014 at 09:55:07PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>>>ebiederm@xmission.com (Eric W. Biederman) writes:
>>>
>>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>>
>>>>>>So this id needs to be globally unique?
>>>>>
>>>>> No. It is enough to be unique within a single system. It serves for no
>>>>> more than to find out 2 ids are same or not, no other info value.
>>>>>
>>>>> So when the drivers uses sane ids (like mac for example, or in case of
>>>>> rocker an id which is passed by qemu command line), the chances of
>>>>> collision are very very close to none (never say never).
>>>
>>>Thinking about what you said a little more.
>>>
>>>Two different sources of persistent numbers picking numbers by
>>>completely different algorithms can give you no assurance that you don't
>>>produce conflicts.
>>>
>>>The switch id as desisgned can not work.
>>>
>>>There are expected to be between 2**36 to 2**40 devices in this world.
>>>Your first switch id is a 64it number.  At the very best by the birthday
>>>pardox predicts there will be a conflict ever 2**32 devices or between
>>>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>>>randomly distributed (which they won't be) things could easily be much
>>>much worse.
>>>
>>>That is just good enough the code could get out there and run for years
>>>before you have the nightmare of having to fix all of userspace.   That
>>>is a nightmare no one needs.
>>>
>>>So please remove this broken code, and this broken concept from the
>>>kernel and go back to the drawing board.
>>
>> In that case the phys port id is broken in the same way. Let's rather
>> think about how to avoid conflicts for both. Given the fact the
>> conflicts should be avoided only on a single baremetal, that should be
>> doable (for (bad) example using driver name mixed with driver created
>> id).
>
>No.  phys_port_id is not broken in the same way, and phys_port_id does
>not have the same set of properties.
>
>phys_port_id's in practice all have an IEEE prefix that identifies the
>manufacturer and a manufacture assigned serial number.  Aka a mac
>address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
>a 64bit EUID-64 I don't know.  If there are problems in the worst
>case issues with phys_port_id are fixable by simple driver tweaks,
>because fundamentally we are working with globally uniuqe identifiers.
>Well globally unique baring manufacturing bugs in eeproms.

Well the fact that phys_post_id's are now implemented mostly by putting
mac into it does not mean that other drivers cannot do it differently.
So once again, phys_port_id and phys_switch_id are the same in this
matter.

>
>
>
>I agree with you that the switch id concept can be saved.  But I think
>we should fix switch id before we export it to userspace so we don't
>have to break userspace later.
>
>My intuition says we want something like ifindex, but I am not at all
>certain how switch id is planned to be used.  Given that it is single
>box I don't expect you are sending it out over the wire.

No, it is not to be send out.

>
>*shrug*
>
>Why does switch id need to be persistent?  Why can't switch id be
>property like ifindex?

Well I can imagine that multiple ports of the same switch chip could be
passed through to the virtual machines (similar to SR-IOV pf/vf).

>
>What are the actual requirements.


They are actually very similar to phys_port_id. Therefore I made that
the same.



>
>Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 21:10                       ` Jiri Pirko
@ 2014-12-04 21:24                         ` Eric W. Biederman
  2014-12-04 22:07                         ` Thomas Graf
  1 sibling, 0 replies; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-04 21:24 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl, hemal

Jiri Pirko <jiri@resnulli.us> writes:

>>No.  phys_port_id is not broken in the same way, and phys_port_id does
>>not have the same set of properties.
>>
>>phys_port_id's in practice all have an IEEE prefix that identifies the
>>manufacturer and a manufacture assigned serial number.  Aka a mac
>>address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
>>a 64bit EUID-64 I don't know.  If there are problems in the worst
>>case issues with phys_port_id are fixable by simple driver tweaks,
>>because fundamentally we are working with globally uniuqe identifiers.
>>Well globally unique baring manufacturing bugs in eeproms.
>
> Well the fact that phys_post_id's are now implemented mostly by putting
> mac into it does not mean that other drivers cannot do it differently.
> So once again, phys_port_id and phys_switch_id are the same in this
> matter.

No exclusively implemented that way.

And yes other drivers can implement bugs, that doesn't mean those bugs
will be toleraged and won't break userspace.

>>I agree with you that the switch id concept can be saved.  But I think
>>we should fix switch id before we export it to userspace so we don't
>>have to break userspace later.
>>
>>My intuition says we want something like ifindex, but I am not at all
>>certain how switch id is planned to be used.  Given that it is single
>>box I don't expect you are sending it out over the wire.
>
> No, it is not to be send out.
>
>>
>>*shrug*
>>
>>Why does switch id need to be persistent?  Why can't switch id be
>>property like ifindex?
>
> Well I can imagine that multiple ports of the same switch chip could be
> passed through to the virtual machines (similar to SR-IOV pf/vf).

In that case you do need something that is globally unique.  Because
if I migrate your virtual machine onto a different physical machine 
seeing the same switch id and I might make the mistaken assumption
that I am remain on the same switch if the ids are not global.

>>What are the actual requirements.
>
>
> They are actually very similar to phys_port_id. Therefore I made that
> the same.

Then please for rocket use a non-buggy implementation with a globally
unique id.

Given your descriptions of the requirements I can't see how any other
implementation isn't buggy.

Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 21:10                       ` Jiri Pirko
  2014-12-04 21:24                         ` Eric W. Biederman
@ 2014-12-04 22:07                         ` Thomas Graf
  1 sibling, 0 replies; 53+ messages in thread
From: Thomas Graf @ 2014-12-04 22:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eric W. Biederman, netdev, davem, nhorman, andy, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, b

On 12/04/14 at 10:10pm, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 09:55:07PM CET, ebiederm@xmission.com wrote:
> >My intuition says we want something like ifindex, but I am not at all
> >certain how switch id is planned to be used.  Given that it is single
> >box I don't expect you are sending it out over the wire.
> 
> No, it is not to be send out.

We discussed this before and stated we would fix this as soon as it's
merged. Let's just define a simple and quick switch id prefix generator
ala ifindex which serves as a unique prefix to all switch ids.

I have time early next week if Jiri doesn't.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
  2014-12-04 20:49           ` Scott Feldman
@ 2014-12-05  2:28             ` Roopa Prabhu
  0 siblings, 0 replies; 53+ messages in thread
From: Roopa Prabhu @ 2014-12-05  2:28 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Jamal Hadi Salim, Florian Fainelli,
	John Linville

On 12/4/14, 12:49 PM, Scott Feldman wrote:
> On Thu, Dec 4, 2014 at 8:55 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>> On 12/4/14, 8:04 AM, Jiri Pirko wrote:
>>> Thu, Dec 04, 2014 at 03:45:44PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/4/14, 6:34 AM, Jiri Pirko wrote:
>>>>> Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>>>>>> On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>>>>>> Jiri Pirko (1):
>>>>>>>     iproute2: ipa: show switch id
>>>>>>>
>>>>>>> Scott Feldman (5):
>>>>>>>     bridge/fdb: fix statistics output spacing
>>>>>>>     bridge/fdb: add flag/indication for FDB entry synced from offload
>>>>>>>       device
>>>>>>>     bridge/link: add new offload hwmode swdev
>>>>>> Ack to most patches but nack on this one. The todo list still has a
>>>>>> note to
>>>>>> revist the flag to indicate switchdev offloads.
>>>>>> Exposing this to userspace does not help that.
>>>>> Hmm, note that this is already exposed to userspace, this patchset is
>>>>> for iproute2 (userspace tool).
>>>> hmmm, all feedback on the switchdev patches seemed to indicate we can
>>>> change
>>>> this later.
>>>> I don't see swdev mode being used in the kernel anywhere today.
>>> Well, it is, in rocker:
>>> $ git grep BRIDGE_MODE_SWDEV
>>> drivers/net/ethernet/rocker/rocker.c:                   if (mode !=
>>> BRIDGE_MODE_SWDEV)
>>> drivers/net/ethernet/rocker/rocker.c:   u16 mode = BRIDGE_MODE_SWDEV;
>>> include/uapi/linux/if_bridge.h:#define BRIDGE_MODE_SWDEV        2       /*
>>> Full switch device offload */
>>
>> The problem is rocker is not the only one who is going to be using this. And
>> so, we need something that fits everybody.
>> And i am not going to make my user set a mode for him to enable offload to
>> hw.
>>
>>>> I will send a patch to remove it. Its still in net-next and so can be
>>>> changed
>>>> ?.
>>>> I was going to resend my patch to introduce a common offload flag for all
>>>> link objects.
>>>> It would be nice if all of them had a consistent flag to indicate hw
>>>> offload
>>>> and iproute2 could display the same flag for all.
>>>> Including bonds and vxlan's.
>>> I do not understand the connection with BRIDGE_MODE_SWDEV. We discussed
>>> this already. BRIDGE_MODE_SWDEV is a bridge mode, similar to for example
>>> BRIDGE_MODE_VEPA and makes perfect sense to have it.
>> I dont think everybody acked it. But it went in with a note saying that it
>> can be changed.
> I thought that was the plan: this new mode goes in now for net-next
> and iproute2, and you would supply follow up patch for each to move to
> your switch port flag.  That will give us time to review your work
> without have net-next and iproute2 out-of-sync.
>
>>>
>>> How vxlan and bonds come into the mixture, that is a puzzler for me.
>>> Maybe I have to see patches.
>>
>> I had posted a version of the patch previously:
>> http://www.spinics.net/lists/netdev/msg305472.html
>>
>> I have a v2 patch in my stack which does not touch the netlink header.
>> But in the past hour, i have been thinking about it some more. Do we really
>> need this set by the user ?. In my use case i don't need it.
> Look at how iproute2 figures out if SELF should be set or not.  It's
> only set if hwmode is set, otherwise it defaults to MASTER.  So with
> SWDEV a new hwmode, we can push settings (learning, leraning_sync) to
> port with SELF set.  It's probably not an ideal arrangement having to
> set hwmode each time, but this was the low-touch change to iproute2 to
> push port settings.
Did not know about this. Thanks for the info.
>
> I'm hoping your new patch will kind of straighten this all out.  But
> you've got extra work to make sure backward compat with older iproute2
> still works, including this weirdness around hwmode.
>
>> We do need a feature flag (or net_device_flags), but it does not need to be
>> set by the user explicitly.
>> This flag can be set by the switch port driver on the switch ports. And the
>> logical device: bridge/bond/vxlan
>> can inherit it from the port. There was a need of a flag in some usecases,
>> to control offloading of specific bridge port flags
>> to hw/sw (example learning in hw or sw). example patch:
>> https://patchwork.ozlabs.org/patch/413211/
>>
>> I will post something today.
> Can you include matching iproute2 changes?  (Assuming you'll building
> on top of what's already in net-next and this iproute2 set Jiri sent
> out).  It's helpful to see the iproute2 changes to see what the new
> cmd structure is and how legacy is handled.
Just sent what ever i had. Take a look and let me know.

Thanks,
Roopa


>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-04 20:55                     ` Eric W. Biederman
  2014-12-04 21:10                       ` Jiri Pirko
@ 2014-12-05  9:54                       ` David Laight
  2014-12-08 21:56                         ` Eric W. Biederman
  1 sibling, 1 reply; 53+ messages in thread
From: David Laight @ 2014-12-05  9:54 UTC (permalink / raw)
  To: 'Eric W. Biederman', Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma@gmail.com

From: Eric W. Biederman
> Jiri Pirko <jiri@resnulli.us> writes:
> 
> > Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
> >>ebiederm@xmission.com (Eric W. Biederman) writes:
> >>
> >>> Jiri Pirko <jiri@resnulli.us> writes:
> >>>
> >>>>>So this id needs to be globally unique?
> >>>>
> >>>> No. It is enough to be unique within a single system. It serves for no
> >>>> more than to find out 2 ids are same or not, no other info value.
> >>>>
> >>>> So when the drivers uses sane ids (like mac for example, or in case of
> >>>> rocker an id which is passed by qemu command line), the chances of
> >>>> collision are very very close to none (never say never).
> >>
> >>Thinking about what you said a little more.
> >>
> >>Two different sources of persistent numbers picking numbers by
> >>completely different algorithms can give you no assurance that you don't
> >>produce conflicts.
> >>
> >>The switch id as desisgned can not work.
> >>
> >>There are expected to be between 2**36 to 2**40 devices in this world.
> >>Your first switch id is a 64it number.  At the very best by the birthday
> >>pardox predicts there will be a conflict ever 2**32 devices or between
> >>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
> >>randomly distributed (which they won't be) things could easily be much
> >>much worse.
> >>
> >>That is just good enough the code could get out there and run for years
> >>before you have the nightmare of having to fix all of userspace.   That
> >>is a nightmare no one needs.
> >>
> >>So please remove this broken code, and this broken concept from the
> >>kernel and go back to the drawing board.
> >
> > In that case the phys port id is broken in the same way. Let's rather
> > think about how to avoid conflicts for both. Given the fact the
> > conflicts should be avoided only on a single baremetal, that should be
> > doable (for (bad) example using driver name mixed with driver created
> > id).
> 
> No.  phys_port_id is not broken in the same way, and phys_port_id does
> not have the same set of properties.
> 
> phys_port_id's in practice all have an IEEE prefix that identifies the
> manufacturer and a manufacture assigned serial number.  Aka a mac
> address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
> a 64bit EUID-64 I don't know.  If there are problems in the worst
> case issues with phys_port_id are fixable by simple driver tweaks,
> because fundamentally we are working with globally uniuqe identifiers.
> Well globally unique baring manufacturing bugs in eeproms.

Manufacturers have to generate unique MAC addresses - otherwise people complain.
But can't be assumed to put different 'serial numbers' in other devices.
If you look at USB memory sticks you are likely to find that the serial
number in the (equivalent of the) ATA identify response isn't unique.
So I doubt you can use the value to distinguish between devices.

You also get the situation where ethernet MAC addresses only have to be
unique within a collision domain. Many old sun systems used a single MAC
address - valid because they assumed/required that multiple interfaces
be connected to different networks.
So even MAC addresses aren't per-interface.

	David

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
  2014-12-05  9:54                       ` David Laight
@ 2014-12-08 21:56                         ` Eric W. Biederman
  0 siblings, 0 replies; 53+ messages in thread
From: Eric W. Biederman @ 2014-12-08 21:56 UTC (permalink / raw)
  To: David Laight
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet,
	jhs@mojatatu.com

David Laight <David.Laight@ACULAB.COM> writes:

> From: Eric W. Biederman
>> Jiri Pirko <jiri@resnulli.us> writes:
>> 
>> > Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>> >>ebiederm@xmission.com (Eric W. Biederman) writes:
>> >>
>> >>> Jiri Pirko <jiri@resnulli.us> writes:
>> >>>
>> >>>>>So this id needs to be globally unique?
>> >>>>
>> >>>> No. It is enough to be unique within a single system. It serves for no
>> >>>> more than to find out 2 ids are same or not, no other info value.
>> >>>>
>> >>>> So when the drivers uses sane ids (like mac for example, or in case of
>> >>>> rocker an id which is passed by qemu command line), the chances of
>> >>>> collision are very very close to none (never say never).
>> >>
>> >>Thinking about what you said a little more.
>> >>
>> >>Two different sources of persistent numbers picking numbers by
>> >>completely different algorithms can give you no assurance that you don't
>> >>produce conflicts.
>> >>
>> >>The switch id as desisgned can not work.
>> >>
>> >>There are expected to be between 2**36 to 2**40 devices in this world.
>> >>Your first switch id is a 64it number.  At the very best by the birthday
>> >>pardox predicts there will be a conflict ever 2**32 devices or between
>> >>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>> >>randomly distributed (which they won't be) things could easily be much
>> >>much worse.
>> >>
>> >>That is just good enough the code could get out there and run for years
>> >>before you have the nightmare of having to fix all of userspace.   That
>> >>is a nightmare no one needs.
>> >>
>> >>So please remove this broken code, and this broken concept from the
>> >>kernel and go back to the drawing board.
>> >
>> > In that case the phys port id is broken in the same way. Let's rather
>> > think about how to avoid conflicts for both. Given the fact the
>> > conflicts should be avoided only on a single baremetal, that should be
>> > doable (for (bad) example using driver name mixed with driver created
>> > id).
>> 
>> No.  phys_port_id is not broken in the same way, and phys_port_id does
>> not have the same set of properties.
>> 
>> phys_port_id's in practice all have an IEEE prefix that identifies the
>> manufacturer and a manufacture assigned serial number.  Aka a mac
>> address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
>> a 64bit EUID-64 I don't know.  If there are problems in the worst
>> case issues with phys_port_id are fixable by simple driver tweaks,
>> because fundamentally we are working with globally uniuqe identifiers.
>> Well globally unique baring manufacturing bugs in eeproms.
>
> Manufacturers have to generate unique MAC addresses - otherwise people complain.
> But can't be assumed to put different 'serial numbers' in other devices.
> If you look at USB memory sticks you are likely to find that the serial
> number in the (equivalent of the) ATA identify response isn't unique.
> So I doubt you can use the value to distinguish between devices.

When I said serial number I was talking about MAC addresses.

> You also get the situation where ethernet MAC addresses only have to be
> unique within a collision domain. Many old sun systems used a single MAC
> address - valid because they assumed/required that multiple interfaces
> be connected to different networks.
> So even MAC addresses aren't per-interface.

The old sun system issue does not apply to switches and switch like
devices.

There are common layer two protocls (Spanning Tree? LACP?) that require
each switch port to have a unique mac address.  So people will complain
and software will break if there is not a mac address per port.  So for
switches and things that strongly resemble switches assuming a unique
mac address per port is a reasonable assumption.

Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 2/6] bridge/fdb: fix statistics output spacing
  2014-12-04  8:57 ` [patch iproute2 2/6] bridge/fdb: fix statistics output spacing Jiri Pirko
@ 2014-12-10  0:32   ` Stephen Hemminger
  0 siblings, 0 replies; 53+ messages in thread
From: Stephen Hemminger @ 2014-12-10  0:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On Thu,  4 Dec 2014 09:57:14 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Scott Feldman <sfeldma@gmail.com>
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Applied

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 4/6] bridge/link: add new offload hwmode swdev
  2014-12-04  8:57 ` [patch iproute2 4/6] bridge/link: add new offload hwmode swdev Jiri Pirko
  2014-12-04 13:23   ` Jamal Hadi Salim
@ 2014-12-24 20:37   ` Stephen Hemminger
  1 sibling, 0 replies; 53+ messages in thread
From: Stephen Hemminger @ 2014-12-24 20:37 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On Thu,  4 Dec 2014 09:57:16 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Scott Feldman <sfeldma@gmail.com>
> 
> To support full-featured switch devices offloading bridge funtionality,
> add new hwmode 'swdev'.  Like 'vepa' and 'veb', 'swdev' indicated bridge
> port functionality is being offloaded to hardware.
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Please resubmit when the functionality is in net-next.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device
  2014-12-04  8:57 ` [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device Jiri Pirko
  2014-12-04 13:19   ` Jamal Hadi Salim
@ 2014-12-24 20:39   ` Stephen Hemminger
  1 sibling, 0 replies; 53+ messages in thread
From: Stephen Hemminger @ 2014-12-24 20:39 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl, hemal

On Thu,  4 Dec 2014 09:57:15 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Scott Feldman <sfeldma@gmail.com>
> 
> Add NTF_EXT_LEARNED flag to neigh flags to indicate FDB entry learned by
> device has been learned externally to bridge FDB.  For these entries,
> add "external" annotation in bridge fdb show output:
> 
>   00:02:00:00:03:00 dev swp2 used 2/2 master br0 external
>   00:02:00:00:03:00 dev swp2 self permanent
> 
> In the example above, 00:02:00:00:03:00 is shown twice on dev swp2.  The
> first entry if from the bridge (master) and is marked as "external" by
> the offload device.  The second entry is from the brport offload device (self),
> and was learned by the device.
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Does not apply to current code.
Please resubmit (with out the change to neighbour.h which is already there).

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2014-12-24 20:39 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-04  8:57 [patch iproute2 0/6] iproute2: add changes for switchdev Jiri Pirko
2014-12-04  8:57 ` [patch iproute2 1/6] iproute2: ipa: show switch id Jiri Pirko
2014-12-04 13:17   ` Jamal Hadi Salim
2014-12-04 14:20   ` Andy Gospodarek
2014-12-04 14:29     ` Roopa Prabhu
2014-12-04 14:33     ` Jiri Pirko
2014-12-04 14:57       ` Andy Gospodarek
2014-12-04 15:12         ` Jiri Pirko
2014-12-04 15:15         ` Jiri Pirko
2014-12-04 15:28           ` Andy Gospodarek
2014-12-04 15:37         ` Thomas Graf
2014-12-04 16:15   ` Eric W. Biederman
2014-12-04 16:30     ` Jiri Pirko
2014-12-04 17:52       ` Eric W. Biederman
2014-12-04 17:59         ` Roopa Prabhu
2014-12-04 18:24         ` Jiri Pirko
2014-12-04 18:57           ` Eric W. Biederman
2014-12-04 19:19             ` Jiri Pirko
2014-12-04 19:26               ` Eric W. Biederman
2014-12-04 19:54                 ` Jiri Pirko
2014-12-04 20:06                 ` Eric W. Biederman
2014-12-04 20:27                   ` Jiri Pirko
2014-12-04 20:55                     ` Eric W. Biederman
2014-12-04 21:10                       ` Jiri Pirko
2014-12-04 21:24                         ` Eric W. Biederman
2014-12-04 22:07                         ` Thomas Graf
2014-12-05  9:54                       ` David Laight
2014-12-08 21:56                         ` Eric W. Biederman
2014-12-04  8:57 ` [patch iproute2 2/6] bridge/fdb: fix statistics output spacing Jiri Pirko
2014-12-10  0:32   ` Stephen Hemminger
2014-12-04  8:57 ` [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device Jiri Pirko
2014-12-04 13:19   ` Jamal Hadi Salim
2014-12-24 20:39   ` Stephen Hemminger
2014-12-04  8:57 ` [patch iproute2 4/6] bridge/link: add new offload hwmode swdev Jiri Pirko
2014-12-04 13:23   ` Jamal Hadi Salim
2014-12-04 20:55     ` Scott Feldman
2014-12-24 20:37   ` Stephen Hemminger
2014-12-04  8:57 ` [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP Jiri Pirko
2014-12-04 18:53   ` Stephen Hemminger
2014-12-04  8:57 ` [patch iproute2 6/6] bridge/link: add learning_sync policy flag Jiri Pirko
2014-12-04 13:26   ` Jamal Hadi Salim
2014-12-04 14:15     ` Jamal Hadi Salim
2014-12-04 13:16 ` [patch iproute2 0/6] iproute2: add changes for switchdev Jamal Hadi Salim
2014-12-04 13:56   ` Andy Gospodarek
2014-12-04 14:22     ` Roopa Prabhu
2014-12-04 14:31       ` Jamal Hadi Salim
2014-12-04 14:26 ` Roopa Prabhu
2014-12-04 14:34   ` Jiri Pirko
2014-12-04 14:45     ` Roopa Prabhu
2014-12-04 16:04       ` Jiri Pirko
2014-12-04 16:55         ` Roopa Prabhu
2014-12-04 20:49           ` Scott Feldman
2014-12-05  2:28             ` Roopa Prabhu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.