All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding
@ 2015-07-19  1:24 sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device sfeldma
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: sfeldma @ 2015-07-19  1:24 UTC (permalink / raw)
  To: netdev; +Cc: jiri, roopa, simon.horman, nicolas.dichtel

From: Scott Feldman <sfeldma@gmail.com>

v3:

 - Per Nicolas Dichtel review: remove errant empty union.

v2:

 - Per davem review: in sk_buff, union fwd_mark with secmark to save space
   since features appear to be mutually exclusive.
 - Per Simon Horman review:
   - fix grammar in switchdev.txt wrt fwd_mark
   - remove some unrelated changes that snuck in

v1:

This patchset was previously submitted as RFC.  No changes from the last
version (v2) sent under RFC.  Including RFC version history here for reference.

RFC v2:

 - s/fwd_mark/offload_fwd_mark
 - use consume_skb rather than kfree_skb when dropping pkt on egress.
 - Use Jiri's suggestion to use ifindex of one of the ports in a group
   as the mark for all the ports in the group.  This can be done with
   no additional storage (no hashtable from v1).  To pull it off, we
   need some simple recursive routines to walk the netdev tree ensuring
   all leaves in the tree (ports) in the same group (e.g. bridge)
   belonging to the same switch device will have the same offload fwd mark.
   Maybe someone sees a better design for the recusive routines?  They're
   not too bad, and should cover the stacked driver cases.

RFC v1:

With switchdev support for offloading L2/L3 forwarding data path to a
switch device, we have a general problem where both the device and the
kernel may forward the packet, resulting in duplicate packets on the wire.
Anytime a packet is forwarded by the device and a copy is sent to the CPU,
there is potential for duplicate forwarding, as the kernel may also do a
forwarding lookup and send the packet on the wire.

The specific problem this patch series is interested in solving is avoiding
duplicate packets on bridged ports.  There was a previous RFC from Roopa
(http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
problem, but didn't solve the problem of mixed ports in the bridge from
different devices; there was no way to exclude some ports from forwarding
and include others.  This RFC solves that problem by tagging the ingressing
packet with a unique mark, and then comparing the packet mark with the
egress port mark, and skip forwarding when there is a match.  For the mixed
ports bridge case, only those ports with matching marks are skipped.

The switchdev port driver must do two things:

1) Generate a fwd_mark for each switch port, using some unique key of the
   switch device (and optionally port).  This is done when the port netdev
   is registered or if the port's group membership changes (joins/leaves
   a bridge, for example).

2) On packet ingress from port, mark the skb with the ingress port's
   fwd_mark.  If the device supports it, it's useful to only mark skbs
   which were already forwarded by the device.  If the device does not
   support such indication, all skbs can be marked, even if they're
   local dst.

Two new 32-bit fields are added to struct sk_buff and struct netdevice to
hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
tried using skb->mark for this purpose, but ebtables can overwrite the
skb->mark before the bridge gets it, so that will not work.

In general, this fwd_mark can be used for any case where a packet is
forwarded by the device and a copy is sent to the CPU, to avoid the kernel
re-forwarding the packet.  sFlow is another use-case that comes to mind,
but I haven't explored the details.
Scott Feldman (5):
  net: don't reforward packets already forwarded by offload device
  net: add phys ID compare helper to test if two IDs are the same
  switchdev: add offload_fwd_mark generator helper
  rocker: add offload_fwd_mark support
  switchdev: update documentation for offload_fwd_mark

 Documentation/networking/switchdev.txt |   14 +++-
 drivers/net/ethernet/rocker/rocker.c   |   11 ++++
 drivers/net/ethernet/rocker/rocker.h   |    1 +
 include/linux/netdevice.h              |   13 ++++
 include/linux/skbuff.h                 |    9 ++-
 include/net/switchdev.h                |    9 +++
 net/core/dev.c                         |   10 +++
 net/switchdev/switchdev.c              |  111 ++++++++++++++++++++++++++++++--
 8 files changed, 169 insertions(+), 9 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
@ 2015-07-19  1:24 ` sfeldma
  2015-07-20  7:21   ` Nicolas Dichtel
  2015-07-19  1:24 ` [PATCH net-next v3 2/5] net: add phys ID compare helper to test if two IDs are the same sfeldma
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: sfeldma @ 2015-07-19  1:24 UTC (permalink / raw)
  To: netdev; +Cc: jiri, roopa, simon.horman, nicolas.dichtel

From: Scott Feldman <sfeldma@gmail.com>

Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device.  If so, drop skb.  A
non-zero skb->offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev->skb_mark.  The switchdev port
driver would assign a non-zero dev->offload_skb_mark for each device port
netdev during registration, for example.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 include/linux/netdevice.h |    6 ++++++
 include/linux/skbuff.h    |    9 ++++++++-
 net/core/dev.c            |   10 ++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 45cfd79..8364f29 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1456,6 +1456,8 @@ enum netdev_priv_flags {
  *
  *	@xps_maps:	XXX: need comments on this one
  *
+ *	@offload_fwd_mark:	Offload device fwding mark
+ *
  *	@trans_start:		Time (in jiffies) of last Tx
  *	@watchdog_timeo:	Represents the timeout that is used by
  *				the watchdog ( see dev_watchdog() )
@@ -1697,6 +1699,10 @@ struct net_device {
 	struct xps_dev_maps __rcu *xps_maps;
 #endif
 
+#ifdef CONFIG_NET_SWITCHDEV
+	u32			offload_fwd_mark;
+#endif
+
 	/* These may be needed for future network-power-down code. */
 
 	/*
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d6cdd6e..af7a096 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -506,6 +506,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
   *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
+ *	@offload_fwd_mark: fwding offload mark
  *	@mark: Generic packet mark
  *	@vlan_proto: vlan encapsulation protocol
  *	@vlan_tci: vlan tag control information
@@ -650,9 +651,15 @@ struct sk_buff {
 		unsigned int	sender_cpu;
 	};
 #endif
+	union {
 #ifdef CONFIG_NETWORK_SECMARK
-	__u32			secmark;
+		__u32		secmark;
+#endif
+#ifdef CONFIG_NET_SWITCHDEV
+		__u32		offload_fwd_mark;
 #endif
+	};
+
 	union {
 		__u32		mark;
 		__u32		reserved_tailroom;
diff --git a/net/core/dev.c b/net/core/dev.c
index 8810b6b..2ee15af 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3061,6 +3061,16 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 	else
 		skb_dst_force(skb);
 
+#ifdef CONFIG_NET_SWITCHDEV
+	/* Don't forward if offload device already forwarded */
+	if (skb->offload_fwd_mark &&
+	    skb->offload_fwd_mark == dev->offload_fwd_mark) {
+		consume_skb(skb);
+		rc = NET_XMIT_SUCCESS;
+		goto out;
+	}
+#endif
+
 	txq = netdev_pick_tx(dev, skb, accel_priv);
 	q = rcu_dereference_bh(txq->qdisc);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v3 2/5] net: add phys ID compare helper to test if two IDs are the same
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device sfeldma
@ 2015-07-19  1:24 ` sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 3/5] switchdev: add offload_fwd_mark generator helper sfeldma
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: sfeldma @ 2015-07-19  1:24 UTC (permalink / raw)
  To: netdev; +Cc: jiri, roopa, simon.horman, nicolas.dichtel

From: Scott Feldman <sfeldma@gmail.com>

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h |    7 +++++++
 net/switchdev/switchdev.c |    8 ++------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8364f29..607b5f4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -766,6 +766,13 @@ struct netdev_phys_item_id {
 	unsigned char id_len;
 };
 
+static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
+					    struct netdev_phys_item_id *b)
+{
+	return a->id_len == b->id_len &&
+	       memcmp(a->id, b->id, a->id_len) == 0;
+}
+
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 9f2add3..4e5bba5 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -910,13 +910,9 @@ static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
 		if (switchdev_port_attr_get(dev, &attr))
 			return NULL;
 
-		if (nhsel > 0) {
-			if (prev_attr.u.ppid.id_len != attr.u.ppid.id_len)
+		if (nhsel > 0 &&
+		    !netdev_phys_item_id_same(&prev_attr.u.ppid, &attr.u.ppid))
 				return NULL;
-			if (memcmp(prev_attr.u.ppid.id, attr.u.ppid.id,
-				   attr.u.ppid.id_len))
-				return NULL;
-		}
 
 		prev_attr = attr;
 	}
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v3 3/5] switchdev: add offload_fwd_mark generator helper
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 2/5] net: add phys ID compare helper to test if two IDs are the same sfeldma
@ 2015-07-19  1:24 ` sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 4/5] rocker: add offload_fwd_mark support sfeldma
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: sfeldma @ 2015-07-19  1:24 UTC (permalink / raw)
  To: netdev; +Cc: jiri, roopa, simon.horman, nicolas.dichtel

From: Scott Feldman <sfeldma@gmail.com>

skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
unique for device and may even be unique for a sub-set of ports within
device, so add switchdev helper function to generate unique marks based on
port's switch ID and group_ifindex.  group_ifindex would typically be the
container dev's ifindex, such as the bridge's ifindex.

The generator uses a global hash table to store offload_fwd_marks hashed by
{switch ID, group_ifindex} key.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
---
 include/net/switchdev.h   |    9 ++++
 net/switchdev/switchdev.c |  103 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index d5671f1..89da893 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -157,6 +157,9 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
 int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			    struct net_device *dev,
 			    struct net_device *filter_dev, int idx);
+void switchdev_port_fwd_mark_set(struct net_device *dev,
+				 struct net_device *group_dev,
+				 bool joining);
 
 #else
 
@@ -271,6 +274,12 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb,
 	return -EOPNOTSUPP;
 }
 
+static inline void switchdev_port_fwd_mark_set(struct net_device *dev,
+					       struct net_device *group_dev,
+					       bool joining)
+{
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 4e5bba5..33bafa2 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -1039,3 +1039,106 @@ void switchdev_fib_ipv4_abort(struct fib_info *fi)
 	fi->fib_net->ipv4.fib_offload_disabled = true;
 }
 EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort);
+
+static bool switchdev_port_same_parent_id(struct net_device *a,
+					  struct net_device *b)
+{
+	struct switchdev_attr a_attr = {
+		.id = SWITCHDEV_ATTR_PORT_PARENT_ID,
+		.flags = SWITCHDEV_F_NO_RECURSE,
+	};
+	struct switchdev_attr b_attr = {
+		.id = SWITCHDEV_ATTR_PORT_PARENT_ID,
+		.flags = SWITCHDEV_F_NO_RECURSE,
+	};
+
+	if (switchdev_port_attr_get(a, &a_attr) ||
+	    switchdev_port_attr_get(b, &b_attr))
+		return false;
+
+	return netdev_phys_item_id_same(&a_attr.u.ppid, &b_attr.u.ppid);
+}
+
+static u32 switchdev_port_fwd_mark_get(struct net_device *dev,
+				       struct net_device *group_dev)
+{
+	struct net_device *lower_dev;
+	struct list_head *iter;
+
+	netdev_for_each_lower_dev(group_dev, lower_dev, iter) {
+		if (lower_dev == dev)
+			continue;
+		if (switchdev_port_same_parent_id(dev, lower_dev))
+			return lower_dev->offload_fwd_mark;
+		return switchdev_port_fwd_mark_get(dev, lower_dev);
+	}
+
+	return dev->ifindex;
+}
+
+static void switchdev_port_fwd_mark_reset(struct net_device *group_dev,
+					  u32 old_mark, u32 *reset_mark)
+{
+	struct net_device *lower_dev;
+	struct list_head *iter;
+
+	netdev_for_each_lower_dev(group_dev, lower_dev, iter) {
+		if (lower_dev->offload_fwd_mark == old_mark) {
+			if (!*reset_mark)
+				*reset_mark = lower_dev->ifindex;
+			lower_dev->offload_fwd_mark = *reset_mark;
+		}
+		switchdev_port_fwd_mark_reset(lower_dev, old_mark, reset_mark);
+	}
+}
+
+/**
+ *	switchdev_port_fwd_mark_set - Set port offload forwarding mark
+ *
+ *	@dev: port device
+ *	@group_dev: containing device
+ *	@joining: true if dev is joining group; false if leaving group
+ *
+ *	An ungrouped port's offload mark is just its ifindex.  A grouped
+ *	port's (member of a bridge, for example) offload mark is the ifindex
+ *	of one of the ports in the group with the same parent (switch) ID.
+ *	Ports on the same device in the same group will have the same mark.
+ *
+ *	Example:
+ *
+ *		br0		ifindex=9
+ *		  sw1p1		ifindex=2	mark=2
+ *		  sw1p2		ifindex=3	mark=2
+ *		  sw2p1		ifindex=4	mark=5
+ *		  sw2p2		ifindex=5	mark=5
+ *
+ *	If sw2p2 leaves the bridge, we'll have:
+ *
+ *		br0		ifindex=9
+ *		  sw1p1		ifindex=2	mark=2
+ *		  sw1p2		ifindex=3	mark=2
+ *		  sw2p1		ifindex=4	mark=4
+ *		sw2p2		ifindex=5	mark=5
+ */
+void switchdev_port_fwd_mark_set(struct net_device *dev,
+				 struct net_device *group_dev,
+				 bool joining)
+{
+	u32 mark = dev->ifindex;
+	u32 reset_mark = 0;
+
+	if (group_dev && joining) {
+		mark = switchdev_port_fwd_mark_get(dev, group_dev);
+	} else if (group_dev && !joining) {
+		if (dev->offload_fwd_mark == mark)
+			/* Ohoh, this port was the mark reference port,
+			 * but it's leaving the group, so reset the
+			 * mark for the remaining ports in the group.
+			 */
+			switchdev_port_fwd_mark_reset(group_dev, mark,
+						      &reset_mark);
+	}
+
+	dev->offload_fwd_mark = mark;
+}
+EXPORT_SYMBOL_GPL(switchdev_port_fwd_mark_set);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v3 4/5] rocker: add offload_fwd_mark support
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (2 preceding siblings ...)
  2015-07-19  1:24 ` [PATCH net-next v3 3/5] switchdev: add offload_fwd_mark generator helper sfeldma
@ 2015-07-19  1:24 ` sfeldma
  2015-07-19  1:24 ` [PATCH net-next v3 5/5] switchdev: update documentation for offload_fwd_mark sfeldma
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: sfeldma @ 2015-07-19  1:24 UTC (permalink / raw)
  To: netdev; +Cc: jiri, roopa, simon.horman, nicolas.dichtel

From: Scott Feldman <sfeldma@gmail.com>

If device flags ingress packet as "fwd offload", mark the
skb->offlaod_fwd_mark using the ingress port's dev->offlaod_fwd_mark.  This
will be the hint to the kernel that this packet has already been forwarded
by device to egress ports matching skb->offlaod_fwd_mark.

For rocker, derive port dev->offlaod_fwd_mark based on device switch ID and
port ifindex.  If port is bridged, use the bridge ifindex rather than the
port ifindex.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c |   11 +++++++++++
 drivers/net/ethernet/rocker/rocker.h |    1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 9324283..0fdfa47 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4800,6 +4800,7 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
 	const struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
 	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
 	size_t rx_len;
+	u16 rx_flags = 0;
 
 	if (!skb)
 		return -ENOENT;
@@ -4807,6 +4808,8 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
 	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
 	if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
 		return -EINVAL;
+	if (attrs[ROCKER_TLV_RX_FLAGS])
+		rx_flags = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FLAGS]);
 
 	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
 
@@ -4814,6 +4817,9 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
 	skb_put(skb, rx_len);
 	skb->protocol = eth_type_trans(skb, rocker_port->dev);
 
+	if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD)
+		skb->offload_fwd_mark = rocker_port->dev->offload_fwd_mark;
+
 	rocker_port->dev->stats.rx_packets++;
 	rocker_port->dev->stats.rx_bytes += skb->len;
 
@@ -4951,6 +4957,8 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 	}
 	rocker->ports[port_number] = rocker_port;
 
+	switchdev_port_fwd_mark_set(rocker_port->dev, NULL, false);
+
 	rocker_port_set_learning(rocker_port, SWITCHDEV_TRANS_NONE);
 
 	err = rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE, 0);
@@ -5230,6 +5238,7 @@ static int rocker_port_bridge_join(struct rocker_port *rocker_port,
 		rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
 
 	rocker_port->bridge_dev = bridge;
+	switchdev_port_fwd_mark_set(rocker_port->dev, bridge, true);
 
 	return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
 				    untagged_vid, 0);
@@ -5250,6 +5259,8 @@ static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
 		rocker_port_internal_vlan_id_get(rocker_port,
 						 rocker_port->dev->ifindex);
 
+	switchdev_port_fwd_mark_set(rocker_port->dev, rocker_port->bridge_dev,
+				    false);
 	rocker_port->bridge_dev = NULL;
 
 	err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index 08b2c3d..12490b2 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -246,6 +246,7 @@ enum {
 #define ROCKER_RX_FLAGS_TCP			BIT(5)
 #define ROCKER_RX_FLAGS_UDP			BIT(6)
 #define ROCKER_RX_FLAGS_TCP_UDP_CSUM_GOOD	BIT(7)
+#define ROCKER_RX_FLAGS_FWD_OFFLOAD		BIT(8)
 
 enum {
 	ROCKER_TLV_TX_UNSPEC,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next v3 5/5] switchdev: update documentation for offload_fwd_mark
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (3 preceding siblings ...)
  2015-07-19  1:24 ` [PATCH net-next v3 4/5] rocker: add offload_fwd_mark support sfeldma
@ 2015-07-19  1:24 ` sfeldma
  2015-07-20 17:57 ` [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding Florian Fainelli
  2015-07-21  1:33 ` David Miller
  6 siblings, 0 replies; 10+ messages in thread
From: sfeldma @ 2015-07-19  1:24 UTC (permalink / raw)
  To: netdev; +Cc: jiri, roopa, simon.horman, nicolas.dichtel

From: Scott Feldman <sfeldma@gmail.com>

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
---
 Documentation/networking/switchdev.txt |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index c5d7ade..9825f32 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -279,8 +279,18 @@ and unknown unicast packets to all ports in domain, if allowed by port's
 current STP state.  The switch driver, knowing which ports are within which
 vlan L2 domain, can program the switch device for flooding.  The packet should
 also be sent to the port netdev for processing by the bridge driver.  The
-bridge should not reflood the packet to the same ports the device flooded.
-XXX: the mechanism to avoid duplicate flood packets is being discuseed.
+bridge should not reflood the packet to the same ports the device flooded,
+otherwise there will be duplicate packets on the wire.
+
+To avoid duplicate packets, the device/driver should mark a packet as already
+forwarded using skb->offload_fwd_mark.  The same mark is set on the device
+ports in the domain using dev->offload_fwd_mark.  If the skb->offload_fwd_mark
+is non-zero and matches the forwarding egress port's dev->skb_mark, the kernel
+will drop the skb right before transmit on the egress port, with the
+understanding that the device already forwarded the packet on same egress port.
+The driver can use switchdev_port_fwd_mark_set() to set a globally unique mark
+for port's dev->offload_fwd_mark, based on the port's parent ID (switch ID) and
+a group ifindex.
 
 It is possible for the switch device to not handle flooding and push the
 packets up to the bridge driver for flooding.  This is not ideal as the number
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device
  2015-07-19  1:24 ` [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device sfeldma
@ 2015-07-20  7:21   ` Nicolas Dichtel
  0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Dichtel @ 2015-07-20  7:21 UTC (permalink / raw)
  To: sfeldma, netdev; +Cc: jiri, roopa, simon.horman

Le 19/07/2015 03:24, sfeldma@gmail.com a écrit :
> From: Scott Feldman <sfeldma@gmail.com>
>
> Just before queuing skb for xmit on port, check if skb has been marked by
> switchdev port driver as already fordwarded by device.  If so, drop skb.  A
> non-zero skb->offload_fwd_mark field is set by the switchdev port
> driver/device on ingress to indicate the skb has already been forwarded by
> the device to egress ports with matching dev->skb_mark.  The switchdev port
> driver would assign a non-zero dev->offload_skb_mark for each device port
> netdev during registration, for example.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Acked-by: Jiri Pirko <jiri@resnulli.us>
> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (4 preceding siblings ...)
  2015-07-19  1:24 ` [PATCH net-next v3 5/5] switchdev: update documentation for offload_fwd_mark sfeldma
@ 2015-07-20 17:57 ` Florian Fainelli
  2015-07-20 18:08   ` Andrew Lunn
  2015-07-21  1:33 ` David Miller
  6 siblings, 1 reply; 10+ messages in thread
From: Florian Fainelli @ 2015-07-20 17:57 UTC (permalink / raw)
  To: sfeldma, netdev
  Cc: jiri, roopa, simon.horman, nicolas.dichtel, andrew,
	vivien.didelot, linux

Hi Scott,

On 18/07/15 18:24, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
> 
> v3:
> 
>  - Per Nicolas Dichtel review: remove errant empty union.
> 
> v2:
> 
>  - Per davem review: in sk_buff, union fwd_mark with secmark to save space
>    since features appear to be mutually exclusive.
>  - Per Simon Horman review:
>    - fix grammar in switchdev.txt wrt fwd_mark
>    - remove some unrelated changes that snuck in
> 
> v1:
> 
> This patchset was previously submitted as RFC.  No changes from the last
> version (v2) sent under RFC.  Including RFC version history here for reference.

This looks good to me, and it looks like this is also relevant for
DSA-driven switches, however I am not exactly clear on how we could
potentially use this on such switches.

For Broadcom tags, there is a reason code forwarded along with the tag
making it to the CPU, and the "mirror" bit sounds like something that
should dictate whether we should be setting the fwd_mark or not.

What does it look like on Marvell switches?

> 
> RFC v2:
> 
>  - s/fwd_mark/offload_fwd_mark
>  - use consume_skb rather than kfree_skb when dropping pkt on egress.
>  - Use Jiri's suggestion to use ifindex of one of the ports in a group
>    as the mark for all the ports in the group.  This can be done with
>    no additional storage (no hashtable from v1).  To pull it off, we
>    need some simple recursive routines to walk the netdev tree ensuring
>    all leaves in the tree (ports) in the same group (e.g. bridge)
>    belonging to the same switch device will have the same offload fwd mark.
>    Maybe someone sees a better design for the recusive routines?  They're
>    not too bad, and should cover the stacked driver cases.
> 
> RFC v1:
> 
> With switchdev support for offloading L2/L3 forwarding data path to a
> switch device, we have a general problem where both the device and the
> kernel may forward the packet, resulting in duplicate packets on the wire.
> Anytime a packet is forwarded by the device and a copy is sent to the CPU,
> there is potential for duplicate forwarding, as the kernel may also do a
> forwarding lookup and send the packet on the wire.
> 
> The specific problem this patch series is interested in solving is avoiding
> duplicate packets on bridged ports.  There was a previous RFC from Roopa
> (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
> problem, but didn't solve the problem of mixed ports in the bridge from
> different devices; there was no way to exclude some ports from forwarding
> and include others.  This RFC solves that problem by tagging the ingressing
> packet with a unique mark, and then comparing the packet mark with the
> egress port mark, and skip forwarding when there is a match.  For the mixed
> ports bridge case, only those ports with matching marks are skipped.
> 
> The switchdev port driver must do two things:
> 
> 1) Generate a fwd_mark for each switch port, using some unique key of the
>    switch device (and optionally port).  This is done when the port netdev
>    is registered or if the port's group membership changes (joins/leaves
>    a bridge, for example).
> 
> 2) On packet ingress from port, mark the skb with the ingress port's
>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>    which were already forwarded by the device.  If the device does not
>    support such indication, all skbs can be marked, even if they're
>    local dst.
> 
> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
> tried using skb->mark for this purpose, but ebtables can overwrite the
> skb->mark before the bridge gets it, so that will not work.
> 
> In general, this fwd_mark can be used for any case where a packet is
> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
> re-forwarding the packet.  sFlow is another use-case that comes to mind,
> but I haven't explored the details.
> Scott Feldman (5):
>   net: don't reforward packets already forwarded by offload device
>   net: add phys ID compare helper to test if two IDs are the same
>   switchdev: add offload_fwd_mark generator helper
>   rocker: add offload_fwd_mark support
>   switchdev: update documentation for offload_fwd_mark
> 
>  Documentation/networking/switchdev.txt |   14 +++-
>  drivers/net/ethernet/rocker/rocker.c   |   11 ++++
>  drivers/net/ethernet/rocker/rocker.h   |    1 +
>  include/linux/netdevice.h              |   13 ++++
>  include/linux/skbuff.h                 |    9 ++-
>  include/net/switchdev.h                |    9 +++
>  net/core/dev.c                         |   10 +++
>  net/switchdev/switchdev.c              |  111 ++++++++++++++++++++++++++++++--
>  8 files changed, 169 insertions(+), 9 deletions(-)
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding
  2015-07-20 17:57 ` [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding Florian Fainelli
@ 2015-07-20 18:08   ` Andrew Lunn
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Lunn @ 2015-07-20 18:08 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: sfeldma, netdev, jiri, roopa, simon.horman, nicolas.dichtel,
	vivien.didelot, linux

> This looks good to me, and it looks like this is also relevant for
> DSA-driven switches, however I am not exactly clear on how we could
> potentially use this on such switches.
> 
> For Broadcom tags, there is a reason code forwarded along with the tag
> making it to the CPU, and the "mirror" bit sounds like something that
> should dictate whether we should be setting the fwd_mark or not.
> 
> What does it look like on Marvell switches?

Hi Florian

There are some bits indicating some status information. But these are
more to do with BDPU, IGMP/MLD and ARP.

I've not yet had time to dig into the details here. I suspect the D in
DSA is going to make this very interesting.

    Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding
  2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (5 preceding siblings ...)
  2015-07-20 17:57 ` [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding Florian Fainelli
@ 2015-07-21  1:33 ` David Miller
  6 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2015-07-21  1:33 UTC (permalink / raw)
  To: sfeldma; +Cc: netdev, jiri, roopa, simon.horman, nicolas.dichtel

From: sfeldma@gmail.com
Date: Sat, 18 Jul 2015 18:24:47 -0700

> With switchdev support for offloading L2/L3 forwarding data path to a
> switch device, we have a general problem where both the device and the
> kernel may forward the packet, resulting in duplicate packets on the wire.
> Anytime a packet is forwarded by the device and a copy is sent to the CPU,
> there is potential for duplicate forwarding, as the kernel may also do a
> forwarding lookup and send the packet on the wire.
> 
> The specific problem this patch series is interested in solving is avoiding
> duplicate packets on bridged ports.  There was a previous RFC from Roopa
> (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
> problem, but didn't solve the problem of mixed ports in the bridge from
> different devices; there was no way to exclude some ports from forwarding
> and include others.  This RFC solves that problem by tagging the ingressing
> packet with a unique mark, and then comparing the packet mark with the
> egress port mark, and skip forwarding when there is a match.  For the mixed
> ports bridge case, only those ports with matching marks are skipped.
> 
> The switchdev port driver must do two things:
> 
> 1) Generate a fwd_mark for each switch port, using some unique key of the
>    switch device (and optionally port).  This is done when the port netdev
>    is registered or if the port's group membership changes (joins/leaves
>    a bridge, for example).
> 
> 2) On packet ingress from port, mark the skb with the ingress port's
>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>    which were already forwarded by the device.  If the device does not
>    support such indication, all skbs can be marked, even if they're
>    local dst.
> 
> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
> tried using skb->mark for this purpose, but ebtables can overwrite the
> skb->mark before the bridge gets it, so that will not work.
> 
> In general, this fwd_mark can be used for any case where a packet is
> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
> re-forwarding the packet.  sFlow is another use-case that comes to mind,
> but I haven't explored the details.

Series applied, thanks Scott.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-07-21  1:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-19  1:24 [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding sfeldma
2015-07-19  1:24 ` [PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device sfeldma
2015-07-20  7:21   ` Nicolas Dichtel
2015-07-19  1:24 ` [PATCH net-next v3 2/5] net: add phys ID compare helper to test if two IDs are the same sfeldma
2015-07-19  1:24 ` [PATCH net-next v3 3/5] switchdev: add offload_fwd_mark generator helper sfeldma
2015-07-19  1:24 ` [PATCH net-next v3 4/5] rocker: add offload_fwd_mark support sfeldma
2015-07-19  1:24 ` [PATCH net-next v3 5/5] switchdev: update documentation for offload_fwd_mark sfeldma
2015-07-20 17:57 ` [PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding Florian Fainelli
2015-07-20 18:08   ` Andrew Lunn
2015-07-21  1:33 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.