All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
@ 2015-06-13 18:04 sfeldma
  2015-06-13 18:04 ` [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device sfeldma
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: sfeldma @ 2015-06-13 18:04 UTC (permalink / raw)
  To: netdev
  Cc: jiri, simon.horman, roopa, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

From: Scott Feldman <sfeldma@gmail.com>

(RFC because we're at rc7+ now)

With switchdev support for offloading L2/L3 forwarding data path to a
switch device, we have a general problem where both the device and the
kernel may forward the packet, resulting in duplicate packets on the wire.
Anytime a packet is forwarded by the device and a copy is sent to the CPU,
there is potential for duplicate forwarding, as the kernel may also do a
forwarding lookup and send the packet on the wire.

The specific problem this patch series is interested in solving is avoiding
duplicate packets on bridged ports.  There was a previous RFC from Roopa
(http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
problem, but didn't solve the problem of mixed ports in the bridge from
different devices; there was no way to exclude some ports from forwarding
and include others.  This RFC solves that problem by tagging the ingressing
packet with a unique mark, and then comparing the packet mark with the
egress port mark, and skip forwarding when there is a match.  For the mixed
ports bridge case, only those ports with matching marks are skipped.

The switchdev port driver must do two things:

1) Generate a fwd_mark for each switch port, using some unique key of the
   switch device (and optionally port).  This is a one-time operation done
   when port's netdev is setup.

2) On packet ingress from port, mark the skb with the ingress port's
   fwd_mark.  If the device supports it, it's useful to only mark skbs
   which were already forwarded by the device.  If the device does not
   support such indication, all skbs can be marked, even if they're
   local dst.

Two new 32-bit fields are added to struct sk_buff and struct netdevice to
hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
tried using skb->mark for this purpose, but ebtables can overwrite the
skb->mark before the bridge gets it, so that will not work.

In general, this fwd_mark can be used for any case where a packet is
forwarded by the device and a copy is sent to the CPU, to avoid the kernel
re-forwarding the packet.  sFlow is another use-case that comes to mind,
but I haven't explored the details.

Scott Feldman (4):
  net: don't reforward packets already forwarded by offload device
  switchdev: add fwd_mark generator helper
  rocker: add fwd_mark support
  switchdev: update documentation for fwd_mark

 Documentation/networking/switchdev.txt |   13 +++++-
 drivers/net/ethernet/rocker/rocker.c   |   24 +++++++++++
 drivers/net/ethernet/rocker/rocker.h   |    1 +
 include/linux/netdevice.h              |    6 +++
 include/linux/skbuff.h                 |    4 ++
 include/net/switchdev.h                |    6 +++
 net/core/dev.c                         |    9 ++++
 net/switchdev/switchdev.c              |   72 ++++++++++++++++++++++++++++++++
 8 files changed, 133 insertions(+), 2 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
@ 2015-06-13 18:04 ` sfeldma
  2015-06-14  6:51   ` Jiri Pirko
  2015-06-15 14:21   ` roopa
  2015-06-13 18:04 ` [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper sfeldma
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: sfeldma @ 2015-06-13 18:04 UTC (permalink / raw)
  To: netdev
  Cc: jiri, simon.horman, roopa, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

From: Scott Feldman <sfeldma@gmail.com>

Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device.  If so, drop skb.  A
non-zero skb->fwd_mark field is set by the switchdev port driver/device on
ingress to indicate the skb has already been forwarded by the device to
egress ports with matching dev->skb_mark.  The switchdev port driver would
assign a non-zero dev->skb_mark for each device port netdev during
registration, for example.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
---
 include/linux/netdevice.h |    6 ++++++
 include/linux/skbuff.h    |    4 ++++
 net/core/dev.c            |    9 +++++++++
 3 files changed, 19 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6f5f71f..181b08f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1444,6 +1444,8 @@ enum netdev_priv_flags {
  *
  *	@xps_maps:	XXX: need comments on this one
  *
+ *	@fwd_mark:		Offload device fwding mark
+ *
  *	@trans_start:		Time (in jiffies) of last Tx
  *	@watchdog_timeo:	Represents the timeout that is used by
  *				the watchdog ( see dev_watchdog() )
@@ -1681,6 +1683,10 @@ struct net_device {
 	struct xps_dev_maps __rcu *xps_maps;
 #endif
 
+#ifdef CONFIG_NET_SWITCHDEV
+	u32			fwd_mark;
+#endif
+
 	/* These may be needed for future network-power-down code. */
 
 	/*
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cc612fc..ba98c05 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -501,6 +501,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
   *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
+ *	@fwd_mark: fwding offload mark
  *	@mark: Generic packet mark
  *	@vlan_proto: vlan encapsulation protocol
  *	@vlan_tci: vlan tag control information
@@ -648,6 +649,9 @@ struct sk_buff {
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
 #endif
+#ifdef CONFIG_NET_SWITCHDEV
+	__u32			fwd_mark;
+#endif
 	union {
 		__u32		mark;
 		__u32		reserved_tailroom;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6778a99..558bf33 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3065,6 +3065,15 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 	else
 		skb_dst_force(skb);
 
+#ifdef CONFIG_NET_SWITCHDEV
+	/* Don't forward if offload device already forwarded */
+	if (skb->fwd_mark && skb->fwd_mark == dev->fwd_mark) {
+		kfree_skb(skb);
+		rc = NET_XMIT_SUCCESS;
+		goto out;
+	}
+#endif
+
 	txq = netdev_pick_tx(dev, skb, accel_priv);
 	q = rcu_dereference_bh(txq->qdisc);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
  2015-06-13 18:04 ` [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device sfeldma
@ 2015-06-13 18:04 ` sfeldma
  2015-06-14  6:56   ` Jiri Pirko
  2015-06-15 15:17   ` roopa
  2015-06-13 18:04 ` [RFC PATCH net-next 3/4] rocker: add fwd_mark support sfeldma
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: sfeldma @ 2015-06-13 18:04 UTC (permalink / raw)
  To: netdev
  Cc: jiri, simon.horman, roopa, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

From: Scott Feldman <sfeldma@gmail.com>

skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
and maybe even unique for a sub-set of ports within device, so add
switchdev helper function to generate unique marks based on driver-supplied
key.  Typically, the driver would use device switch ID for key, and maybe
additional fields in key for grouped ports such as bridge ifindex.  The key
can be of arbitrary length.

The generator uses a global hash table to store fwd_marks hashed by key.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
---
 include/net/switchdev.h   |    6 ++++
 net/switchdev/switchdev.c |   72 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 437f8fe..6eaceee 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -157,6 +157,7 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
 int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			    struct net_device *dev,
 			    struct net_device *filter_dev, int idx);
+u32 switchdev_mark_get(void *key, size_t key_len);
 
 #else
 
@@ -271,6 +272,11 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb,
 	return -EOPNOTSUPP;
 }
 
+static inline u32 switchdev_mark_get(void *key, size_t key_len)
+{
+	return 0;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index a5d0f8e..9ca37b3 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -16,6 +16,8 @@
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
+#include <linux/hashtable.h>
+#include <linux/crc32.h>
 #include <net/ip_fib.h>
 #include <net/switchdev.h>
 
@@ -924,3 +926,73 @@ void switchdev_fib_ipv4_abort(struct fib_info *fi)
 	fi->fib_net->ipv4.fib_offload_disabled = true;
 }
 EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort);
+
+#define SWITCHDEV_MARK_HT_BITS	5
+static DEFINE_HASHTABLE(switchdev_mark_ht, SWITCHDEV_MARK_HT_BITS);
+static DEFINE_SPINLOCK(switchdev_mark_lock);
+static u32 switchdev_mark_next = 1;
+
+/**
+ *	switchdev_mark_get - Generate a unique mark for key
+ *
+ *	@key: key used to generate mark
+ *	@key_len: length of key in bytes
+ *
+ *	Returns unqiue 32-bit mark for given key, or 0 if error.
+ *	A small global hash table stores the marks for each key.
+ *	The length of the key and key contents are arbitrary.
+ *	The marks can be used, for example, to skb->fwd_mark a pkt
+ *	to associate the skb with a key.
+ */
+u32 switchdev_mark_get(void *key, size_t key_len)
+{
+	struct switchdev_mark_ht_entry {
+		struct hlist_node entry;
+		void *key;
+		size_t key_len;
+		u32 key_crc32;
+		u32 mark;
+	} *entry;
+	u32 key_crc32 = crc32(~0, key, key_len);
+	u32 mark = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&switchdev_mark_lock, flags);
+	hash_for_each_possible(switchdev_mark_ht, entry,
+			       entry, key_crc32) {
+		if (entry->key_len != key_len)
+			continue;
+		if (memcmp(entry->key, key, key_len) == 0) {
+			mark = entry->mark;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&switchdev_mark_lock, flags);
+
+	if (mark)
+		goto out;
+
+	entry = kmalloc(GFP_KERNEL, sizeof(*entry));
+	if (!entry)
+		goto out;
+
+	entry->key = kmalloc(GFP_KERNEL, key_len);
+	if (!entry->key) {
+		kfree(entry);
+		goto out;
+	}
+
+	memcpy(entry->key, key, key_len);
+	entry->key_len = key_len;
+	entry->key_crc32 = key_crc32;
+
+	spin_lock_irqsave(&switchdev_mark_lock, flags);
+	mark = switchdev_mark_next++;
+	entry->mark = mark;
+	hash_add(switchdev_mark_ht, &entry->entry, key_crc32);
+	spin_unlock_irqrestore(&switchdev_mark_lock, flags);
+
+out:
+	return mark;
+}
+EXPORT_SYMBOL_GPL(switchdev_mark_get);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 3/4] rocker: add fwd_mark support
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
  2015-06-13 18:04 ` [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device sfeldma
  2015-06-13 18:04 ` [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper sfeldma
@ 2015-06-13 18:04 ` sfeldma
  2015-06-14  7:02   ` Jiri Pirko
  2015-06-13 18:04 ` [RFC PATCH net-next 4/4] switchdev: update documentation for fwd_mark sfeldma
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: sfeldma @ 2015-06-13 18:04 UTC (permalink / raw)
  To: netdev
  Cc: jiri, simon.horman, roopa, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

From: Scott Feldman <sfeldma@gmail.com>

If device flags ingress packet as "fwd offload", mark the skb->fwd_mark
using the ingress port's dev->fwd_mark.  This will be the hint to the
kernel that this packet has already been forwarded by device to egress
ports matching skb->fwd_mark.

For rocker, derive port dev->fwd_mark based on device switch ID.  If port
is bridged, include the bridge's ifindex in the key for deriving
dev->fwd_mark.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |   24 ++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h |    1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index a06b93d..81407d8 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4701,6 +4701,7 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
 	const struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
 	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
 	size_t rx_len;
+	u16 rx_flags = 0;
 
 	if (!skb)
 		return -ENOENT;
@@ -4708,6 +4709,8 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
 	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
 	if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
 		return -EINVAL;
+	if (attrs[ROCKER_TLV_RX_FLAGS])
+		rx_flags = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FLAGS]);
 
 	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
 
@@ -4715,6 +4718,9 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
 	skb_put(skb, rx_len);
 	skb->protocol = eth_type_trans(skb, rocker_port->dev);
 
+	if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD)
+		skb->fwd_mark = rocker_port->dev->fwd_mark;
+
 	rocker_port->dev->stats.rx_packets++;
 	rocker_port->dev->stats.rx_bytes += skb->len;
 
@@ -4814,6 +4820,21 @@ static void rocker_port_dev_addr_init(struct rocker_port *rocker_port)
 	}
 }
 
+static void rocker_port_fwd_mark_set(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct {
+		u64 hw_id;
+		int ifindex;
+	} key = {
+		.hw_id = rocker->hw.id,
+		.ifindex = rocker_port_is_bridged(rocker_port) ?
+			   rocker_port->bridge_dev->ifindex : 0,
+	};
+
+	rocker_port->dev->fwd_mark = switchdev_mark_get(&key, sizeof(key));
+}
+
 static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 {
 	const struct pci_dev *pdev = rocker->pdev;
@@ -4832,6 +4853,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 	rocker_port->pport = port_number + 1;
 	rocker_port->brport_flags = BR_LEARNING | BR_LEARNING_SYNC;
 	INIT_LIST_HEAD(&rocker_port->trans_mem);
+	rocker_port_fwd_mark_set(rocker_port);
 
 	rocker_port_dev_addr_init(rocker_port);
 	dev->netdev_ops = &rocker_port_netdev_ops;
@@ -5131,6 +5153,7 @@ static int rocker_port_bridge_join(struct rocker_port *rocker_port,
 		rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
 
 	rocker_port->bridge_dev = bridge;
+	rocker_port_fwd_mark_set(rocker_port);
 
 	return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
 				    untagged_vid, 0);
@@ -5152,6 +5175,7 @@ static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
 						 rocker_port->dev->ifindex);
 
 	rocker_port->bridge_dev = NULL;
+	rocker_port_fwd_mark_set(rocker_port);
 
 	err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
 				   untagged_vid, 0);
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index c61fbf9..f846c0d 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -245,6 +245,7 @@ enum {
 #define ROCKER_RX_FLAGS_TCP			BIT(5)
 #define ROCKER_RX_FLAGS_UDP			BIT(6)
 #define ROCKER_RX_FLAGS_TCP_UDP_CSUM_GOOD	BIT(7)
+#define ROCKER_RX_FLAGS_FWD_OFFLOAD		BIT(8)
 
 enum {
 	ROCKER_TLV_TX_UNSPEC,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH net-next 4/4] switchdev: update documentation for fwd_mark
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (2 preceding siblings ...)
  2015-06-13 18:04 ` [RFC PATCH net-next 3/4] rocker: add fwd_mark support sfeldma
@ 2015-06-13 18:04 ` sfeldma
  2015-06-15 13:54 ` [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding roopa
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: sfeldma @ 2015-06-13 18:04 UTC (permalink / raw)
  To: netdev
  Cc: jiri, simon.horman, roopa, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

From: Scott Feldman <sfeldma@gmail.com>

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
---
 Documentation/networking/switchdev.txt |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index da82cd7..d6a8695 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -286,8 +286,17 @@ and unknown unicast packets to all ports in domain, if allowed by port's
 current STP state.  The switch driver, knowing which ports are within which
 vlan L2 domain, can program the switch device for flooding.  The packet should
 also be sent to the port netdev for processing by the bridge driver.  The
-bridge should not reflood the packet to the same ports the device flooded.
-XXX: the mechanism to avoid duplicate flood packets is being discuseed.
+bridge should not reflood the packet to the same ports the device flooded,
+otherwise there will be duplicate packets on the wire.
+
+To avoid duplicate packets, the device/driver can mark a packet as already
+forwarded using skb->fwd_mark.  The same mark is set on the device ports in the
+domain using dev->fwd_mark.  If the skb->fwd_mark is non-zero and matches the
+forwarding egress port's dev->skb_mark, the kernel will drop the skb right
+before transmit on the egress port, with the understanding that the device
+already forwarded the packet on same egress port.  The driver can use
+switchdev_mark_get() to get a globally unique mark for egress port(s)'
+dev->fwd_mark, based on a driver/device-supplied key.
 
 It is possible for the switch device to not handle flooding and push the
 packets up to the bridge driver for flooding.  This is not ideal as the number
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device
  2015-06-13 18:04 ` [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device sfeldma
@ 2015-06-14  6:51   ` Jiri Pirko
  2015-06-15 14:21   ` roopa
  1 sibling, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2015-06-14  6:51 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, simon.horman, roopa, ronen.arad, john.r.fastabend,
	andrew, f.fainelli, linux, davidch, stephen

Sat, Jun 13, 2015 at 08:04:27PM CEST, sfeldma@gmail.com wrote:
>From: Scott Feldman <sfeldma@gmail.com>
>
>Just before queuing skb for xmit on port, check if skb has been marked by
>switchdev port driver as already fordwarded by device.  If so, drop skb.  A
>non-zero skb->fwd_mark field is set by the switchdev port driver/device on
>ingress to indicate the skb has already been forwarded by the device to
>egress ports with matching dev->skb_mark.  The switchdev port driver would
>assign a non-zero dev->skb_mark for each device port netdev during
>registration, for example.
>
>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>---
> include/linux/netdevice.h |    6 ++++++
> include/linux/skbuff.h    |    4 ++++
> net/core/dev.c            |    9 +++++++++
> 3 files changed, 19 insertions(+)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 6f5f71f..181b08f 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1444,6 +1444,8 @@ enum netdev_priv_flags {
>  *
>  *	@xps_maps:	XXX: need comments on this one
>  *
>+ *	@fwd_mark:		Offload device fwding mark
>+ *

How about to say this is an offloading/switchdev stuff?
"offload_fwd_mark" ?


<snip>

>diff --git a/net/core/dev.c b/net/core/dev.c
>index 6778a99..558bf33 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3065,6 +3065,15 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
> 	else
> 		skb_dst_force(skb);
> 
>+#ifdef CONFIG_NET_SWITCHDEV
>+	/* Don't forward if offload device already forwarded */
>+	if (skb->fwd_mark && skb->fwd_mark == dev->fwd_mark) {
>+		kfree_skb(skb);

You should use consume_skb here to do not indicate skb was dropped after
failure.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-13 18:04 ` [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper sfeldma
@ 2015-06-14  6:56   ` Jiri Pirko
  2015-06-14 17:50     ` Scott Feldman
  2015-06-15 15:17   ` roopa
  1 sibling, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2015-06-14  6:56 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, simon.horman, roopa, ronen.arad, john.r.fastabend,
	andrew, f.fainelli, linux, davidch, stephen

Sat, Jun 13, 2015 at 08:04:28PM CEST, sfeldma@gmail.com wrote:
>From: Scott Feldman <sfeldma@gmail.com>
>
>skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
>and maybe even unique for a sub-set of ports within device, so add
>switchdev helper function to generate unique marks based on driver-supplied
>key.  Typically, the driver would use device switch ID for key, and maybe
>additional fields in key for grouped ports such as bridge ifindex.  The key
>can be of arbitrary length.
>
>The generator uses a global hash table to store fwd_marks hashed by key.
>
>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>---
> include/net/switchdev.h   |    6 ++++
> net/switchdev/switchdev.c |   72 +++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 78 insertions(+)
>
>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>index 437f8fe..6eaceee 100644
>--- a/include/net/switchdev.h
>+++ b/include/net/switchdev.h
>@@ -157,6 +157,7 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
> int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
> 			    struct net_device *dev,
> 			    struct net_device *filter_dev, int idx);
>+u32 switchdev_mark_get(void *key, size_t key_len);
> 
> #else
> 
>@@ -271,6 +272,11 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb,
> 	return -EOPNOTSUPP;
> }
> 
>+static inline u32 switchdev_mark_get(void *key, size_t key_len)
>+{
>+	return 0;
>+}
>+
> #endif
> 
> #endif /* _LINUX_SWITCHDEV_H_ */
>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>index a5d0f8e..9ca37b3 100644
>--- a/net/switchdev/switchdev.c
>+++ b/net/switchdev/switchdev.c
>@@ -16,6 +16,8 @@
> #include <linux/notifier.h>
> #include <linux/netdevice.h>
> #include <linux/if_bridge.h>
>+#include <linux/hashtable.h>
>+#include <linux/crc32.h>
> #include <net/ip_fib.h>
> #include <net/switchdev.h>
> 
>@@ -924,3 +926,73 @@ void switchdev_fib_ipv4_abort(struct fib_info *fi)
> 	fi->fib_net->ipv4.fib_offload_disabled = true;
> }
> EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort);
>+
>+#define SWITCHDEV_MARK_HT_BITS	5
>+static DEFINE_HASHTABLE(switchdev_mark_ht, SWITCHDEV_MARK_HT_BITS);
>+static DEFINE_SPINLOCK(switchdev_mark_lock);
>+static u32 switchdev_mark_next = 1;
>+
>+/**
>+ *	switchdev_mark_get - Generate a unique mark for key
>+ *
>+ *	@key: key used to generate mark
>+ *	@key_len: length of key in bytes
>+ *
>+ *	Returns unqiue 32-bit mark for given key, or 0 if error.
                ^^^^^^
		typo.


>+ *	A small global hash table stores the marks for each key.
>+ *	The length of the key and key contents are arbitrary.
>+ *	The marks can be used, for example, to skb->fwd_mark a pkt
>+ *	to associate the skb with a key.
>+ */
>+u32 switchdev_mark_get(void *key, size_t key_len)
>+{
>+	struct switchdev_mark_ht_entry {
>+		struct hlist_node entry;
>+		void *key;
>+		size_t key_len;
>+		u32 key_crc32;
>+		u32 mark;
>+	} *entry;
>+	u32 key_crc32 = crc32(~0, key, key_len);
>+	u32 mark = 0;
>+	unsigned long flags;
>+
>+	spin_lock_irqsave(&switchdev_mark_lock, flags);

I fail to see why _irqsave variant is needed here.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 3/4] rocker: add fwd_mark support
  2015-06-13 18:04 ` [RFC PATCH net-next 3/4] rocker: add fwd_mark support sfeldma
@ 2015-06-14  7:02   ` Jiri Pirko
  2015-06-14 18:00     ` Scott Feldman
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2015-06-14  7:02 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, simon.horman, roopa, ronen.arad, john.r.fastabend,
	andrew, f.fainelli, linux, davidch, stephen

Sat, Jun 13, 2015 at 08:04:29PM CEST, sfeldma@gmail.com wrote:
>From: Scott Feldman <sfeldma@gmail.com>
>
>If device flags ingress packet as "fwd offload", mark the skb->fwd_mark
>using the ingress port's dev->fwd_mark.  This will be the hint to the
>kernel that this packet has already been forwarded by device to egress
>ports matching skb->fwd_mark.
>
>For rocker, derive port dev->fwd_mark based on device switch ID.  If port
>is bridged, include the bridge's ifindex in the key for deriving
>dev->fwd_mark.
>
>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>---
> drivers/net/ethernet/rocker/rocker.c |   24 ++++++++++++++++++++++++
> drivers/net/ethernet/rocker/rocker.h |    1 +
> 2 files changed, 25 insertions(+)
>
>diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>index a06b93d..81407d8 100644
>--- a/drivers/net/ethernet/rocker/rocker.c
>+++ b/drivers/net/ethernet/rocker/rocker.c
>@@ -4701,6 +4701,7 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
> 	const struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
> 	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
> 	size_t rx_len;
>+	u16 rx_flags = 0;
> 
> 	if (!skb)
> 		return -ENOENT;
>@@ -4708,6 +4709,8 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
> 	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
> 	if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
> 		return -EINVAL;
>+	if (attrs[ROCKER_TLV_RX_FLAGS])
>+		rx_flags = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FLAGS]);
> 
> 	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
> 
>@@ -4715,6 +4718,9 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
> 	skb_put(skb, rx_len);
> 	skb->protocol = eth_type_trans(skb, rocker_port->dev);
> 
>+	if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD)

Nasty :) I'm not sure that this can be easily done for real hw. But
anyway, that is a driver problem.


>+		skb->fwd_mark = rocker_port->dev->fwd_mark;
>+
> 	rocker_port->dev->stats.rx_packets++;
> 	rocker_port->dev->stats.rx_bytes += skb->len;
> 
>@@ -4814,6 +4820,21 @@ static void rocker_port_dev_addr_init(struct rocker_port *rocker_port)
> 	}
> }
> 
>+static void rocker_port_fwd_mark_set(struct rocker_port *rocker_port)
>+{
>+	struct rocker *rocker = rocker_port->rocker;
>+	struct {
>+		u64 hw_id;
>+		int ifindex;
>+	} key = {
>+		.hw_id = rocker->hw.id,
>+		.ifindex = rocker_port_is_bridged(rocker_port) ?
>+			   rocker_port->bridge_dev->ifindex : 0,
>+	};
>+
>+	rocker_port->dev->fwd_mark = switchdev_mark_get(&key, sizeof(key));
>+}
>+
> static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
> {
> 	const struct pci_dev *pdev = rocker->pdev;
>@@ -4832,6 +4853,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
> 	rocker_port->pport = port_number + 1;
> 	rocker_port->brport_flags = BR_LEARNING | BR_LEARNING_SYNC;
> 	INIT_LIST_HEAD(&rocker_port->trans_mem);
>+	rocker_port_fwd_mark_set(rocker_port);
> 
> 	rocker_port_dev_addr_init(rocker_port);
> 	dev->netdev_ops = &rocker_port_netdev_ops;
>@@ -5131,6 +5153,7 @@ static int rocker_port_bridge_join(struct rocker_port *rocker_port,
> 		rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
> 
> 	rocker_port->bridge_dev = bridge;
>+	rocker_port_fwd_mark_set(rocker_port);
> 
> 	return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
> 				    untagged_vid, 0);
>@@ -5152,6 +5175,7 @@ static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
> 						 rocker_port->dev->ifindex);
> 
> 	rocker_port->bridge_dev = NULL;
>+	rocker_port_fwd_mark_set(rocker_port);


Hmm, wouldn't it make sense to move fwd_mark setting from drivers to
switchdev core? This will look the same for all drivers, so I think it
should be in common swithdev core, always generated according to "switch id".

So all what driver needs to do is to call some helper like:
static inline void switchdev_skb_fwd_mark(struct sk_buff *skb,
					  struct net_device *dev) {
	skb->fwd_mark = dev->fwd_mark;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-14  6:56   ` Jiri Pirko
@ 2015-06-14 17:50     ` Scott Feldman
  2015-06-15  5:46       ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Scott Feldman @ 2015-06-14 17:50 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, simon.horman, Roopa Prabhu, Arad, Ronen, Fastabend,
	John R, andrew, Florian Fainelli, Guenter Roeck, davidch,
	stephen

On Sat, Jun 13, 2015 at 11:56 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Sat, Jun 13, 2015 at 08:04:28PM CEST, sfeldma@gmail.com wrote:
>>From: Scott Feldman <sfeldma@gmail.com>
>>
>>skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
>>and maybe even unique for a sub-set of ports within device, so add
>>switchdev helper function to generate unique marks based on driver-supplied
>>key.  Typically, the driver would use device switch ID for key, and maybe
>>additional fields in key for grouped ports such as bridge ifindex.  The key
>>can be of arbitrary length.
>>
>>The generator uses a global hash table to store fwd_marks hashed by key.
>>
>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>

<snip>

>>+u32 switchdev_mark_get(void *key, size_t key_len)
>>+{
>>+      struct switchdev_mark_ht_entry {
>>+              struct hlist_node entry;
>>+              void *key;
>>+              size_t key_len;
>>+              u32 key_crc32;
>>+              u32 mark;
>>+      } *entry;
>>+      u32 key_crc32 = crc32(~0, key, key_len);
>>+      u32 mark = 0;
>>+      unsigned long flags;
>>+
>>+      spin_lock_irqsave(&switchdev_mark_lock, flags);
>
> I fail to see why _irqsave variant is needed here.

I don't know what context caller is in, so using most conservative
spinlock.  Is there a better way?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 3/4] rocker: add fwd_mark support
  2015-06-14  7:02   ` Jiri Pirko
@ 2015-06-14 18:00     ` Scott Feldman
  2015-06-15  5:49       ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Scott Feldman @ 2015-06-14 18:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, simon.horman, Roopa Prabhu, Arad, Ronen, Fastabend,
	John R, andrew, Florian Fainelli, Guenter Roeck, davidch,
	stephen

On Sun, Jun 14, 2015 at 12:02 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Sat, Jun 13, 2015 at 08:04:29PM CEST, sfeldma@gmail.com wrote:
>>From: Scott Feldman <sfeldma@gmail.com>
>>
>>If device flags ingress packet as "fwd offload", mark the skb->fwd_mark
>>using the ingress port's dev->fwd_mark.  This will be the hint to the
>>kernel that this packet has already been forwarded by device to egress
>>ports matching skb->fwd_mark.
>>
>>For rocker, derive port dev->fwd_mark based on device switch ID.  If port
>>is bridged, include the bridge's ifindex in the key for deriving
>>dev->fwd_mark.
>>
>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>---
>> drivers/net/ethernet/rocker/rocker.c |   24 ++++++++++++++++++++++++
>> drivers/net/ethernet/rocker/rocker.h |    1 +
>> 2 files changed, 25 insertions(+)
>>
>>diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>>index a06b93d..81407d8 100644
>>--- a/drivers/net/ethernet/rocker/rocker.c
>>+++ b/drivers/net/ethernet/rocker/rocker.c
>>@@ -4701,6 +4701,7 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
>>       const struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
>>       struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
>>       size_t rx_len;
>>+      u16 rx_flags = 0;
>>
>>       if (!skb)
>>               return -ENOENT;
>>@@ -4708,6 +4709,8 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
>>       rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
>>       if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
>>               return -EINVAL;
>>+      if (attrs[ROCKER_TLV_RX_FLAGS])
>>+              rx_flags = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FLAGS]);
>>
>>       rocker_dma_rx_ring_skb_unmap(rocker, attrs);
>>
>>@@ -4715,6 +4718,9 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
>>       skb_put(skb, rx_len);
>>       skb->protocol = eth_type_trans(skb, rocker_port->dev);
>>
>>+      if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD)
>
> Nasty :) I'm not sure that this can be easily done for real hw. But
> anyway, that is a driver problem.

Why is this nasty?

Some real HW can do it, some can't.  Some that can go into detail of
the nature of the fwd.  Rocker can do it because we know when we copy
a pkt to the CPU port that's also been forwarded.

>
>>+              skb->fwd_mark = rocker_port->dev->fwd_mark;
>>+
>>       rocker_port->dev->stats.rx_packets++;
>>       rocker_port->dev->stats.rx_bytes += skb->len;
>>
>>@@ -4814,6 +4820,21 @@ static void rocker_port_dev_addr_init(struct rocker_port *rocker_port)
>>       }
>> }
>>
>>+static void rocker_port_fwd_mark_set(struct rocker_port *rocker_port)
>>+{
>>+      struct rocker *rocker = rocker_port->rocker;
>>+      struct {
>>+              u64 hw_id;
>>+              int ifindex;
>>+      } key = {
>>+              .hw_id = rocker->hw.id,
>>+              .ifindex = rocker_port_is_bridged(rocker_port) ?
>>+                         rocker_port->bridge_dev->ifindex : 0,
>>+      };
>>+
>>+      rocker_port->dev->fwd_mark = switchdev_mark_get(&key, sizeof(key));
>>+}
>>+
>> static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
>> {
>>       const struct pci_dev *pdev = rocker->pdev;
>>@@ -4832,6 +4853,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
>>       rocker_port->pport = port_number + 1;
>>       rocker_port->brport_flags = BR_LEARNING | BR_LEARNING_SYNC;
>>       INIT_LIST_HEAD(&rocker_port->trans_mem);
>>+      rocker_port_fwd_mark_set(rocker_port);
>>
>>       rocker_port_dev_addr_init(rocker_port);
>>       dev->netdev_ops = &rocker_port_netdev_ops;
>>@@ -5131,6 +5153,7 @@ static int rocker_port_bridge_join(struct rocker_port *rocker_port,
>>               rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
>>
>>       rocker_port->bridge_dev = bridge;
>>+      rocker_port_fwd_mark_set(rocker_port);
>>
>>       return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
>>                                   untagged_vid, 0);
>>@@ -5152,6 +5175,7 @@ static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
>>                                                rocker_port->dev->ifindex);
>>
>>       rocker_port->bridge_dev = NULL;
>>+      rocker_port_fwd_mark_set(rocker_port);
>
>
> Hmm, wouldn't it make sense to move fwd_mark setting from drivers to
> switchdev core? This will look the same for all drivers, so I think it
> should be in common swithdev core, always generated according to "switch id".

This is setting the fwd_mark on the port during init, not the skb
during run-time.

> So all what driver needs to do is to call some helper like:
> static inline void switchdev_skb_fwd_mark(struct sk_buff *skb,
>                                           struct net_device *dev) {
>         skb->fwd_mark = dev->fwd_mark;
> }
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-14 17:50     ` Scott Feldman
@ 2015-06-15  5:46       ` Jiri Pirko
  2015-06-15 13:52         ` Scott Feldman
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2015-06-15  5:46 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, simon.horman, Roopa Prabhu, Arad, Ronen, Fastabend,
	John R, andrew, Florian Fainelli, Guenter Roeck, davidch,
	stephen

Sun, Jun 14, 2015 at 07:50:13PM CEST, sfeldma@gmail.com wrote:
>On Sat, Jun 13, 2015 at 11:56 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Sat, Jun 13, 2015 at 08:04:28PM CEST, sfeldma@gmail.com wrote:
>>>From: Scott Feldman <sfeldma@gmail.com>
>>>
>>>skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
>>>and maybe even unique for a sub-set of ports within device, so add
>>>switchdev helper function to generate unique marks based on driver-supplied
>>>key.  Typically, the driver would use device switch ID for key, and maybe
>>>additional fields in key for grouped ports such as bridge ifindex.  The key
>>>can be of arbitrary length.
>>>
>>>The generator uses a global hash table to store fwd_marks hashed by key.
>>>
>>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>
><snip>
>
>>>+u32 switchdev_mark_get(void *key, size_t key_len)
>>>+{
>>>+      struct switchdev_mark_ht_entry {
>>>+              struct hlist_node entry;
>>>+              void *key;
>>>+              size_t key_len;
>>>+              u32 key_crc32;
>>>+              u32 mark;
>>>+      } *entry;
>>>+      u32 key_crc32 = crc32(~0, key, key_len);
>>>+      u32 mark = 0;
>>>+      unsigned long flags;
>>>+
>>>+      spin_lock_irqsave(&switchdev_mark_lock, flags);
>>
>> I fail to see why _irqsave variant is needed here.
>
>I don't know what context caller is in, so using most conservative
>spinlock.  Is there a better way?

I don't see why would someone call this from irq.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 3/4] rocker: add fwd_mark support
  2015-06-14 18:00     ` Scott Feldman
@ 2015-06-15  5:49       ` Jiri Pirko
  0 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2015-06-15  5:49 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, simon.horman, Roopa Prabhu, Arad, Ronen, Fastabend,
	John R, andrew, Florian Fainelli, Guenter Roeck, davidch,
	stephen

Sun, Jun 14, 2015 at 08:00:11PM CEST, sfeldma@gmail.com wrote:
>On Sun, Jun 14, 2015 at 12:02 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Sat, Jun 13, 2015 at 08:04:29PM CEST, sfeldma@gmail.com wrote:
>>>From: Scott Feldman <sfeldma@gmail.com>
>>>
>>>If device flags ingress packet as "fwd offload", mark the skb->fwd_mark
>>>using the ingress port's dev->fwd_mark.  This will be the hint to the
>>>kernel that this packet has already been forwarded by device to egress
>>>ports matching skb->fwd_mark.
>>>
>>>For rocker, derive port dev->fwd_mark based on device switch ID.  If port
>>>is bridged, include the bridge's ifindex in the key for deriving
>>>dev->fwd_mark.
>>>
>>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>>---
>>> drivers/net/ethernet/rocker/rocker.c |   24 ++++++++++++++++++++++++
>>> drivers/net/ethernet/rocker/rocker.h |    1 +
>>> 2 files changed, 25 insertions(+)
>>>
>>>diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>>>index a06b93d..81407d8 100644
>>>--- a/drivers/net/ethernet/rocker/rocker.c
>>>+++ b/drivers/net/ethernet/rocker/rocker.c
>>>@@ -4701,6 +4701,7 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
>>>       const struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
>>>       struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
>>>       size_t rx_len;
>>>+      u16 rx_flags = 0;
>>>
>>>       if (!skb)
>>>               return -ENOENT;
>>>@@ -4708,6 +4709,8 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
>>>       rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
>>>       if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
>>>               return -EINVAL;
>>>+      if (attrs[ROCKER_TLV_RX_FLAGS])
>>>+              rx_flags = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FLAGS]);
>>>
>>>       rocker_dma_rx_ring_skb_unmap(rocker, attrs);
>>>
>>>@@ -4715,6 +4718,9 @@ static int rocker_port_rx_proc(const struct rocker *rocker,
>>>       skb_put(skb, rx_len);
>>>       skb->protocol = eth_type_trans(skb, rocker_port->dev);
>>>
>>>+      if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD)
>>
>> Nasty :) I'm not sure that this can be easily done for real hw. But
>> anyway, that is a driver problem.
>
>Why is this nasty?
>
>Some real HW can do it, some can't.  Some that can go into detail of
>the nature of the fwd.  Rocker can do it because we know when we copy
>a pkt to the CPU port that's also been forwarded.

Yep, rocker has advantage in this.


>
>>
>>>+              skb->fwd_mark = rocker_port->dev->fwd_mark;
>>>+
>>>       rocker_port->dev->stats.rx_packets++;
>>>       rocker_port->dev->stats.rx_bytes += skb->len;
>>>
>>>@@ -4814,6 +4820,21 @@ static void rocker_port_dev_addr_init(struct rocker_port *rocker_port)
>>>       }
>>> }
>>>
>>>+static void rocker_port_fwd_mark_set(struct rocker_port *rocker_port)
>>>+{
>>>+      struct rocker *rocker = rocker_port->rocker;
>>>+      struct {
>>>+              u64 hw_id;
>>>+              int ifindex;
>>>+      } key = {
>>>+              .hw_id = rocker->hw.id,
>>>+              .ifindex = rocker_port_is_bridged(rocker_port) ?
>>>+                         rocker_port->bridge_dev->ifindex : 0,
>>>+      };
>>>+
>>>+      rocker_port->dev->fwd_mark = switchdev_mark_get(&key, sizeof(key));
>>>+}
>>>+
>>> static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
>>> {
>>>       const struct pci_dev *pdev = rocker->pdev;
>>>@@ -4832,6 +4853,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
>>>       rocker_port->pport = port_number + 1;
>>>       rocker_port->brport_flags = BR_LEARNING | BR_LEARNING_SYNC;
>>>       INIT_LIST_HEAD(&rocker_port->trans_mem);
>>>+      rocker_port_fwd_mark_set(rocker_port);
>>>
>>>       rocker_port_dev_addr_init(rocker_port);
>>>       dev->netdev_ops = &rocker_port_netdev_ops;
>>>@@ -5131,6 +5153,7 @@ static int rocker_port_bridge_join(struct rocker_port *rocker_port,
>>>               rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex);
>>>
>>>       rocker_port->bridge_dev = bridge;
>>>+      rocker_port_fwd_mark_set(rocker_port);
>>>
>>>       return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE,
>>>                                   untagged_vid, 0);
>>>@@ -5152,6 +5175,7 @@ static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
>>>                                                rocker_port->dev->ifindex);
>>>
>>>       rocker_port->bridge_dev = NULL;
>>>+      rocker_port_fwd_mark_set(rocker_port);
>>
>>
>> Hmm, wouldn't it make sense to move fwd_mark setting from drivers to
>> switchdev core? This will look the same for all drivers, so I think it
>> should be in common swithdev core, always generated according to "switch id".
>
>This is setting the fwd_mark on the port during init, not the skb
>during run-time.


I know this particular code is called during initialization.

But what do you say about my idea to move common code sreating fwd_mark and
actual skb marking into swtichdev code?


>
>> So all what driver needs to do is to call some helper like:
>> static inline void switchdev_skb_fwd_mark(struct sk_buff *skb,
>>                                           struct net_device *dev) {
>>         skb->fwd_mark = dev->fwd_mark;
>> }
>>
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-15  5:46       ` Jiri Pirko
@ 2015-06-15 13:52         ` Scott Feldman
  2015-06-15 14:09           ` Sergei Shtylyov
  0 siblings, 1 reply; 26+ messages in thread
From: Scott Feldman @ 2015-06-15 13:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, simon.horman, Roopa Prabhu, Arad, Ronen, Fastabend,
	John R, andrew, Florian Fainelli, Guenter Roeck, davidch,
	stephen

On Sun, Jun 14, 2015 at 10:46 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Sun, Jun 14, 2015 at 07:50:13PM CEST, sfeldma@gmail.com wrote:
>>On Sat, Jun 13, 2015 at 11:56 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Sat, Jun 13, 2015 at 08:04:28PM CEST, sfeldma@gmail.com wrote:
>>>>From: Scott Feldman <sfeldma@gmail.com>
>>>>
>>>>skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
>>>>and maybe even unique for a sub-set of ports within device, so add
>>>>switchdev helper function to generate unique marks based on driver-supplied
>>>>key.  Typically, the driver would use device switch ID for key, and maybe
>>>>additional fields in key for grouped ports such as bridge ifindex.  The key
>>>>can be of arbitrary length.
>>>>
>>>>The generator uses a global hash table to store fwd_marks hashed by key.
>>>>
>>>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>
>><snip>
>>
>>>>+u32 switchdev_mark_get(void *key, size_t key_len)
>>>>+{
>>>>+      struct switchdev_mark_ht_entry {
>>>>+              struct hlist_node entry;
>>>>+              void *key;
>>>>+              size_t key_len;
>>>>+              u32 key_crc32;
>>>>+              u32 mark;
>>>>+      } *entry;
>>>>+      u32 key_crc32 = crc32(~0, key, key_len);
>>>>+      u32 mark = 0;
>>>>+      unsigned long flags;
>>>>+
>>>>+      spin_lock_irqsave(&switchdev_mark_lock, flags);
>>>
>>> I fail to see why _irqsave variant is needed here.
>>
>>I don't know what context caller is in, so using most conservative
>>spinlock.  Is there a better way?
>
> I don't see why would someone call this from irq.

Ok, good point, I'll adjust to spin_lock.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (3 preceding siblings ...)
  2015-06-13 18:04 ` [RFC PATCH net-next 4/4] switchdev: update documentation for fwd_mark sfeldma
@ 2015-06-15 13:54 ` roopa
  2015-06-15 14:23 ` roopa
  2015-06-15 23:25 ` David Miller
  6 siblings, 0 replies; 26+ messages in thread
From: roopa @ 2015-06-15 13:54 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, jiri, simon.horman, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

On 6/13/15, 11:04 AM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> (RFC because we're at rc7+ now)
>
> With switchdev support for offloading L2/L3 forwarding data path to a
> switch device, we have a general problem where both the device and the
> kernel may forward the packet, resulting in duplicate packets on the wire.
> Anytime a packet is forwarded by the device and a copy is sent to the CPU,
> there is potential for duplicate forwarding, as the kernel may also do a
> forwarding lookup and send the packet on the wire.
>
> The specific problem this patch series is interested in solving is avoiding
> duplicate packets on bridged ports.  There was a previous RFC from Roopa
> (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
> problem, but didn't solve the problem of mixed ports in the bridge from
> different devices; there was no way to exclude some ports from forwarding
> and include others.  This RFC solves that problem by tagging the ingressing
> packet with a unique mark, and then comparing the packet mark with the
> egress port mark, and skip forwarding when there is a match.  For the mixed
> ports bridge case, only those ports with matching marks are skipped.
>
> The switchdev port driver must do two things:
>
> 1) Generate a fwd_mark for each switch port, using some unique key of the
>     switch device (and optionally port).  This is a one-time operation done
>     when port's netdev is setup.
>
> 2) On packet ingress from port, mark the skb with the ingress port's
>     fwd_mark.  If the device supports it, it's useful to only mark skbs
>     which were already forwarded by the device.  If the device does not
>     support such indication, all skbs can be marked, even if they're
>     local dst.
>
> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
> tried using skb->mark for this purpose, but ebtables can overwrite the
> skb->mark before the bridge gets it, so that will not work.
>
> In general, this fwd_mark can be used for any case where a packet is
> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
> re-forwarding the packet.  sFlow is another use-case that comes to mind,
> but I haven't explored the details.
>
>
scott, nicely done!. I like the patch series. Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-15 13:52         ` Scott Feldman
@ 2015-06-15 14:09           ` Sergei Shtylyov
  0 siblings, 0 replies; 26+ messages in thread
From: Sergei Shtylyov @ 2015-06-15 14:09 UTC (permalink / raw)
  To: Scott Feldman, Jiri Pirko
  Cc: Netdev, simon.horman, Roopa Prabhu, Arad, Ronen, Fastabend,
	John R, andrew, Florian Fainelli, Guenter Roeck, davidch,
	stephen

Hello.

On 6/15/2015 4:52 PM, Scott Feldman wrote:

>>>>> From: Scott Feldman <sfeldma@gmail.com>

>>>>> skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
>>>>> and maybe even unique for a sub-set of ports within device, so add
>>>>> switchdev helper function to generate unique marks based on driver-supplied
>>>>> key.  Typically, the driver would use device switch ID for key, and maybe
>>>>> additional fields in key for grouped ports such as bridge ifindex.  The key
>>>>> can be of arbitrary length.

>>>>> The generator uses a global hash table to store fwd_marks hashed by key.

>>>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>

>>> <snip>

>>>>> +u32 switchdev_mark_get(void *key, size_t key_len)
>>>>> +{
>>>>> +      struct switchdev_mark_ht_entry {
>>>>> +              struct hlist_node entry;
>>>>> +              void *key;
>>>>> +              size_t key_len;
>>>>> +              u32 key_crc32;
>>>>> +              u32 mark;
>>>>> +      } *entry;
>>>>> +      u32 key_crc32 = crc32(~0, key, key_len);
>>>>> +      u32 mark = 0;
>>>>> +      unsigned long flags;
>>>>> +
>>>>> +      spin_lock_irqsave(&switchdev_mark_lock, flags);

>>>> I fail to see why _irqsave variant is needed here.

>>> I don't know what context caller is in, so using most conservative
>>> spinlock.  Is there a better way?

>> I don't see why would someone call this from irq.

> Ok, good point, I'll adjust to spin_lock.

     I guess spi_lock_irq() is what you meant. Disabling IRQs when called from 
the hardirq context made no sense since hardirq handlrs are executed with IRQs 
disabled anyway.

WBR, Sergei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device
  2015-06-13 18:04 ` [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device sfeldma
  2015-06-14  6:51   ` Jiri Pirko
@ 2015-06-15 14:21   ` roopa
  1 sibling, 0 replies; 26+ messages in thread
From: roopa @ 2015-06-15 14:21 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, jiri, simon.horman, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

On 6/13/15, 11:04 AM, sfeldma@gmail.com wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6f5f71f..181b08f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1444,6 +1444,8 @@ enum netdev_priv_flags {
>    *
>    *	@xps_maps:	XXX: need comments on this one
>    *
> + *	@fwd_mark:		Offload device fwding mark
> + *
>    *	@trans_start:		Time (in jiffies) of last Tx
>    *	@watchdog_timeo:	Represents the timeout that is used by
>    *				the watchdog ( see dev_watchdog() )
> @@ -1681,6 +1683,10 @@ struct net_device {
>   	struct xps_dev_maps __rcu *xps_maps;
>   #endif
>   
> +#ifdef CONFIG_NET_SWITCHDEV
> +	u32			fwd_mark;
> +#endif
> +
>   
scott, Just calling out a few other use-cases to make sure this flag can 
be used in a generic way in the future:
- avoid updating acl counters in the kernel when hardware acl counters 
have already accounted for this packet
- ability to conditionally clear the mark  by other protocols  in the 
kernel before xmit: one use case is the bridge driver allowing software 
forwarding of some protocol packets when hardware forwarding of these is 
turned off eg igmp (As I understand, This is possible with your patch 
series if we need to in the future)

PS: Maybe the name should be generic enough to suite other use-cases in 
the future (i like the name i had hw_offloaded better ;).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (4 preceding siblings ...)
  2015-06-15 13:54 ` [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding roopa
@ 2015-06-15 14:23 ` roopa
  2015-06-15 23:25 ` David Miller
  6 siblings, 0 replies; 26+ messages in thread
From: roopa @ 2015-06-15 14:23 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, jiri, simon.horman, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

On 6/13/15, 11:04 AM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> (RFC because we're at rc7+ now)
>
> With switchdev support for offloading L2/L3 forwarding data path to a
> switch device, we have a general problem where both the device and the
> kernel may forward the packet, resulting in duplicate packets on the wire.
> Anytime a packet is forwarded by the device and a copy is sent to the CPU,
> there is potential for duplicate forwarding, as the kernel may also do a
> forwarding lookup and send the packet on the wire.
>
> The specific problem this patch series is interested in solving is avoiding
> duplicate packets on bridged ports.  There was a previous RFC from Roopa
> (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
> problem, but didn't solve the problem of mixed ports in the bridge from
> different devices; there was no way to exclude some ports from forwarding
> and include others.  This RFC solves that problem by tagging the ingressing
> packet with a unique mark, and then comparing the packet mark with the
> egress port mark, and skip forwarding when there is a match.  For the mixed
> ports bridge case, only those ports with matching marks are skipped.
>
> The switchdev port driver must do two things:
>
> 1) Generate a fwd_mark for each switch port, using some unique key of the
>     switch device (and optionally port).  This is a one-time operation done
>     when port's netdev is setup.
>
> 2) On packet ingress from port, mark the skb with the ingress port's
>     fwd_mark.  If the device supports it, it's useful to only mark skbs
>     which were already forwarded by the device.  If the device does not
>     support such indication, all skbs can be marked, even if they're
>     local dst.
>
> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
> tried using skb->mark for this purpose, but ebtables can overwrite the
> skb->mark before the bridge gets it, so that will not work.
>
> In general, this fwd_mark can be used for any case where a packet is
> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
> re-forwarding the packet.  sFlow is another use-case that comes to mind,
> but I haven't explored the details.
>
>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper
  2015-06-13 18:04 ` [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper sfeldma
  2015-06-14  6:56   ` Jiri Pirko
@ 2015-06-15 15:17   ` roopa
  1 sibling, 0 replies; 26+ messages in thread
From: roopa @ 2015-06-15 15:17 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, jiri, simon.horman, ronen.arad, john.r.fastabend, andrew,
	f.fainelli, linux, davidch, stephen

On 6/13/15, 11:04 AM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> skb->fwd_mark and dev->fwd_mark are 32-bit and should be unique for device
> and maybe even unique for a sub-set of ports within device, so add
> switchdev helper function to generate unique marks based on driver-supplied
> key.  Typically, the driver would use device switch ID for key, and maybe
> additional fields in key for grouped ports such as bridge ifindex.  The key
> can be of arbitrary length.
>
> The generator uses a global hash table to store fwd_marks hashed by key.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
                   ` (5 preceding siblings ...)
  2015-06-15 14:23 ` roopa
@ 2015-06-15 23:25 ` David Miller
  2015-06-16  6:04   ` Jiri Pirko
  6 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2015-06-15 23:25 UTC (permalink / raw)
  To: sfeldma
  Cc: netdev, jiri, simon.horman, roopa, ronen.arad, john.r.fastabend,
	andrew, f.fainelli, linux, davidch, stephen

From: sfeldma@gmail.com
Date: Sat, 13 Jun 2015 11:04:26 -0700

> The switchdev port driver must do two things:
> 
> 1) Generate a fwd_mark for each switch port, using some unique key of the
>    switch device (and optionally port).  This is a one-time operation done
>    when port's netdev is setup.
> 
> 2) On packet ingress from port, mark the skb with the ingress port's
>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>    which were already forwarded by the device.  If the device does not
>    support such indication, all skbs can be marked, even if they're
>    local dst.
> 
> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
> tried using skb->mark for this purpose, but ebtables can overwrite the
> skb->mark before the bridge gets it, so that will not work.
> 
> In general, this fwd_mark can be used for any case where a packet is
> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
> re-forwarding the packet.  sFlow is another use-case that comes to mind,
> but I haven't explored the details.

Generally I'm against adding new fields fo sk_buff but I'm trying to be
open minded. :-)

About the per-device fwd_mark, if the key attribute is uniqueness,
let's just do it right and use something like lib/idr.c to generate
truly unique indices at probe time for all devices using this
facility.  I like that better than having them be unique by a happy
accident.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-15 23:25 ` David Miller
@ 2015-06-16  6:04   ` Jiri Pirko
  2015-06-16 16:47     ` Scott Feldman
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2015-06-16  6:04 UTC (permalink / raw)
  To: David Miller
  Cc: sfeldma, netdev, simon.horman, roopa, ronen.arad,
	john.r.fastabend, andrew, f.fainelli, linux, davidch, stephen

Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@davemloft.net wrote:
>From: sfeldma@gmail.com
>Date: Sat, 13 Jun 2015 11:04:26 -0700
>
>> The switchdev port driver must do two things:
>> 
>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>    switch device (and optionally port).  This is a one-time operation done
>>    when port's netdev is setup.
>> 
>> 2) On packet ingress from port, mark the skb with the ingress port's
>>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>>    which were already forwarded by the device.  If the device does not
>>    support such indication, all skbs can be marked, even if they're
>>    local dst.
>> 
>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>> tried using skb->mark for this purpose, but ebtables can overwrite the
>> skb->mark before the bridge gets it, so that will not work.
>> 
>> In general, this fwd_mark can be used for any case where a packet is
>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>> re-forwarding the packet.  sFlow is another use-case that comes to mind,
>> but I haven't explored the details.
>
>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>open minded. :-)
>
>About the per-device fwd_mark, if the key attribute is uniqueness,
>let's just do it right and use something like lib/idr.c to generate
>truly unique indices at probe time for all devices using this
>facility.  I like that better than having them be unique by a happy
>accident.

We already have per-device uniqueue key. dev->ifindex.
That should be good for fwd_mark purposes I believe.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-16  6:04   ` Jiri Pirko
@ 2015-06-16 16:47     ` Scott Feldman
  2015-06-16 21:11       ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Scott Feldman @ 2015-06-16 16:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, Netdev, simon.horman, Roopa Prabhu, Arad, Ronen,
	Fastabend, John R, andrew, Florian Fainelli, Guenter Roeck,
	davidch, stephen

On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@davemloft.net wrote:
>>From: sfeldma@gmail.com
>>Date: Sat, 13 Jun 2015 11:04:26 -0700
>>
>>> The switchdev port driver must do two things:
>>>
>>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>>    switch device (and optionally port).  This is a one-time operation done
>>>    when port's netdev is setup.
>>>
>>> 2) On packet ingress from port, mark the skb with the ingress port's
>>>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>>>    which were already forwarded by the device.  If the device does not
>>>    support such indication, all skbs can be marked, even if they're
>>>    local dst.
>>>
>>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>>> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>>> tried using skb->mark for this purpose, but ebtables can overwrite the
>>> skb->mark before the bridge gets it, so that will not work.
>>>
>>> In general, this fwd_mark can be used for any case where a packet is
>>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>>> re-forwarding the packet.  sFlow is another use-case that comes to mind,
>>> but I haven't explored the details.
>>
>>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>>open minded. :-)
>>
>>About the per-device fwd_mark, if the key attribute is uniqueness,
>>let's just do it right and use something like lib/idr.c to generate
>>truly unique indices at probe time for all devices using this
>>facility.  I like that better than having them be unique by a happy
>>accident.
>
> We already have per-device uniqueue key. dev->ifindex.
> That should be good for fwd_mark purposes I believe.

It would be great if we could use dev->index, but fwd_mark is really
to mark device ports that belong to a group.  In the case of a bridge,
the device ports in the bridge should all have the same mark.  And
another device's ports in the same bridge would have a different mark
(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
marked with the ingress port's mark.  If the skb is to be forwarded
out an egress port, the skb mark is compared with egress port's mark.
If marks compare, then the device has already forwarded the pkt so the
kernel can consume_skb to avoid duplicate pkts on the wire.

So what we need is a unique mark for device ports within a fwding
group, such as a bridge.

I'm investigating Dave's suggestion to use IDR.  I think this will work...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-16 16:47     ` Scott Feldman
@ 2015-06-16 21:11       ` Jiri Pirko
  2015-06-16 23:53         ` Scott Feldman
  2015-06-17 10:23         ` Jamal Hadi Salim
  0 siblings, 2 replies; 26+ messages in thread
From: Jiri Pirko @ 2015-06-16 21:11 UTC (permalink / raw)
  To: Scott Feldman
  Cc: David Miller, Netdev, simon.horman, Roopa Prabhu, Arad, Ronen,
	Fastabend, John R, andrew, Florian Fainelli, Guenter Roeck,
	davidch, stephen

Tue, Jun 16, 2015 at 06:47:47PM CEST, sfeldma@gmail.com wrote:
>On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@davemloft.net wrote:
>>>From: sfeldma@gmail.com
>>>Date: Sat, 13 Jun 2015 11:04:26 -0700
>>>
>>>> The switchdev port driver must do two things:
>>>>
>>>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>>>    switch device (and optionally port).  This is a one-time operation done
>>>>    when port's netdev is setup.
>>>>
>>>> 2) On packet ingress from port, mark the skb with the ingress port's
>>>>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>>>>    which were already forwarded by the device.  If the device does not
>>>>    support such indication, all skbs can be marked, even if they're
>>>>    local dst.
>>>>
>>>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>>>> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>>>> tried using skb->mark for this purpose, but ebtables can overwrite the
>>>> skb->mark before the bridge gets it, so that will not work.
>>>>
>>>> In general, this fwd_mark can be used for any case where a packet is
>>>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>>>> re-forwarding the packet.  sFlow is another use-case that comes to mind,
>>>> but I haven't explored the details.
>>>
>>>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>>>open minded. :-)
>>>
>>>About the per-device fwd_mark, if the key attribute is uniqueness,
>>>let's just do it right and use something like lib/idr.c to generate
>>>truly unique indices at probe time for all devices using this
>>>facility.  I like that better than having them be unique by a happy
>>>accident.
>>
>> We already have per-device uniqueue key. dev->ifindex.
>> That should be good for fwd_mark purposes I believe.
>
>It would be great if we could use dev->index, but fwd_mark is really
>to mark device ports that belong to a group.  In the case of a bridge,
>the device ports in the bridge should all have the same mark.  And
>another device's ports in the same bridge would have a different mark
>(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
>marked with the ingress port's mark.  If the skb is to be forwarded
>out an egress port, the skb mark is compared with egress port's mark.
>If marks compare, then the device has already forwarded the pkt so the
>kernel can consume_skb to avoid duplicate pkts on the wire.
>
>So what we need is a unique mark for device ports within a fwding
>group, such as a bridge.

Yep, have a group of netdevs, pick one of them and use it's ifindex for
the whole group.


>
>I'm investigating Dave's suggestion to use IDR.  I think this will work...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-16 21:11       ` Jiri Pirko
@ 2015-06-16 23:53         ` Scott Feldman
  2015-06-17  6:30           ` Jiri Pirko
  2015-06-17 10:23         ` Jamal Hadi Salim
  1 sibling, 1 reply; 26+ messages in thread
From: Scott Feldman @ 2015-06-16 23:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, Netdev, simon.horman, Roopa Prabhu, Arad, Ronen,
	Fastabend, John R, andrew, Florian Fainelli, Guenter Roeck,
	davidch, stephen

On Tue, Jun 16, 2015 at 2:11 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Jun 16, 2015 at 06:47:47PM CEST, sfeldma@gmail.com wrote:
>>On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@davemloft.net wrote:
>>>>From: sfeldma@gmail.com
>>>>Date: Sat, 13 Jun 2015 11:04:26 -0700
>>>>
>>>>> The switchdev port driver must do two things:
>>>>>
>>>>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>>>>    switch device (and optionally port).  This is a one-time operation done
>>>>>    when port's netdev is setup.
>>>>>
>>>>> 2) On packet ingress from port, mark the skb with the ingress port's
>>>>>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>>>>>    which were already forwarded by the device.  If the device does not
>>>>>    support such indication, all skbs can be marked, even if they're
>>>>>    local dst.
>>>>>
>>>>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>>>>> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>>>>> tried using skb->mark for this purpose, but ebtables can overwrite the
>>>>> skb->mark before the bridge gets it, so that will not work.
>>>>>
>>>>> In general, this fwd_mark can be used for any case where a packet is
>>>>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>>>>> re-forwarding the packet.  sFlow is another use-case that comes to mind,
>>>>> but I haven't explored the details.
>>>>
>>>>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>>>>open minded. :-)
>>>>
>>>>About the per-device fwd_mark, if the key attribute is uniqueness,
>>>>let's just do it right and use something like lib/idr.c to generate
>>>>truly unique indices at probe time for all devices using this
>>>>facility.  I like that better than having them be unique by a happy
>>>>accident.
>>>
>>> We already have per-device uniqueue key. dev->ifindex.
>>> That should be good for fwd_mark purposes I believe.
>>
>>It would be great if we could use dev->index, but fwd_mark is really
>>to mark device ports that belong to a group.  In the case of a bridge,
>>the device ports in the bridge should all have the same mark.  And
>>another device's ports in the same bridge would have a different mark
>>(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
>>marked with the ingress port's mark.  If the skb is to be forwarded
>>out an egress port, the skb mark is compared with egress port's mark.
>>If marks compare, then the device has already forwarded the pkt so the
>>kernel can consume_skb to avoid duplicate pkts on the wire.
>>
>>So what we need is a unique mark for device ports within a fwding
>>group, such as a bridge.
>
> Yep, have a group of netdevs, pick one of them and use it's ifindex for
> the whole group.

That will not work because ports from two switches in the same bridge
need different marks...that's how the bridge knows which ports to fwd
and which ones to skip.

Example:

br0
   sw1p1 (mark=3)
   sw1p2 (mark=3)
   sw2p2 (mark=7)
   sw2p2 (mark=7)

Two switches, sw1 and sw2, in bridge br0.  Let's say sw1 receives an
unknown unicast pkt. It'll flood the pkt to its other switch ports
(sw1p2, in this case) and send a copy to the CPU (the bridge), with
skb->mark=3.  The bridge will flood pkt to sw1p2, sw2p1, and sw2p2,
but our little check in dev.c will drop the pkt to sw1p2 just before
egress onto wire.  Pkt goes out on sw2p1 and sw2p2.  This is why we
can't use just the br0 ifindex to generate the mark.  We need
something unique about the switch port and the bridge ifindex to give
us a mark for a port.

I'm going to send v2 which uses switch port ppid + some group ifindex
to generate port mark.  Group ifindex could be bond ifindex or bridge
ifindex or even zero if ports are L3 router ports and we want to prune
any L3 forwarding by the kernel.  There is some flexibility here,
depending on what we're trying to do.  We have the switch port ppid,
so we might as well use it.

-scott

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-16 23:53         ` Scott Feldman
@ 2015-06-17  6:30           ` Jiri Pirko
  2015-06-17  7:02             ` Scott Feldman
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2015-06-17  6:30 UTC (permalink / raw)
  To: Scott Feldman
  Cc: David Miller, Netdev, simon.horman, Roopa Prabhu, Arad, Ronen,
	Fastabend, John R, andrew, Florian Fainelli, Guenter Roeck,
	davidch, stephen

Wed, Jun 17, 2015 at 01:53:56AM CEST, sfeldma@gmail.com wrote:
>On Tue, Jun 16, 2015 at 2:11 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Jun 16, 2015 at 06:47:47PM CEST, sfeldma@gmail.com wrote:
>>>On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Tue, Jun 16, 2015 at 01:25:51AM CEST, davem@davemloft.net wrote:
>>>>>From: sfeldma@gmail.com
>>>>>Date: Sat, 13 Jun 2015 11:04:26 -0700
>>>>>
>>>>>> The switchdev port driver must do two things:
>>>>>>
>>>>>> 1) Generate a fwd_mark for each switch port, using some unique key of the
>>>>>>    switch device (and optionally port).  This is a one-time operation done
>>>>>>    when port's netdev is setup.
>>>>>>
>>>>>> 2) On packet ingress from port, mark the skb with the ingress port's
>>>>>>    fwd_mark.  If the device supports it, it's useful to only mark skbs
>>>>>>    which were already forwarded by the device.  If the device does not
>>>>>>    support such indication, all skbs can be marked, even if they're
>>>>>>    local dst.
>>>>>>
>>>>>> Two new 32-bit fields are added to struct sk_buff and struct netdevice to
>>>>>> hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
>>>>>> tried using skb->mark for this purpose, but ebtables can overwrite the
>>>>>> skb->mark before the bridge gets it, so that will not work.
>>>>>>
>>>>>> In general, this fwd_mark can be used for any case where a packet is
>>>>>> forwarded by the device and a copy is sent to the CPU, to avoid the kernel
>>>>>> re-forwarding the packet.  sFlow is another use-case that comes to mind,
>>>>>> but I haven't explored the details.
>>>>>
>>>>>Generally I'm against adding new fields fo sk_buff but I'm trying to be
>>>>>open minded. :-)
>>>>>
>>>>>About the per-device fwd_mark, if the key attribute is uniqueness,
>>>>>let's just do it right and use something like lib/idr.c to generate
>>>>>truly unique indices at probe time for all devices using this
>>>>>facility.  I like that better than having them be unique by a happy
>>>>>accident.
>>>>
>>>> We already have per-device uniqueue key. dev->ifindex.
>>>> That should be good for fwd_mark purposes I believe.
>>>
>>>It would be great if we could use dev->index, but fwd_mark is really
>>>to mark device ports that belong to a group.  In the case of a bridge,
>>>the device ports in the bridge should all have the same mark.  And
>>>another device's ports in the same bridge would have a different mark
>>>(so we can't use the bridge's dev->ifindex).  On ingress, the skb is
>>>marked with the ingress port's mark.  If the skb is to be forwarded
>>>out an egress port, the skb mark is compared with egress port's mark.
>>>If marks compare, then the device has already forwarded the pkt so the
>>>kernel can consume_skb to avoid duplicate pkts on the wire.
>>>
>>>So what we need is a unique mark for device ports within a fwding
>>>group, such as a bridge.
>>
>> Yep, have a group of netdevs, pick one of them and use it's ifindex for
>> the whole group.
>
>That will not work because ports from two switches in the same bridge
>need different marks...that's how the bridge knows which ports to fwd
>and which ones to skip.
>
>Example:
>
>br0
>   sw1p1 (mark=3)
>   sw1p2 (mark=3)
>   sw2p2 (mark=7)
>   sw2p2 (mark=7)
>
>Two switches, sw1 and sw2, in bridge br0.  Let's say sw1 receives an
>unknown unicast pkt. It'll flood the pkt to its other switch ports
>(sw1p2, in this case) and send a copy to the CPU (the bridge), with
>skb->mark=3.  The bridge will flood pkt to sw1p2, sw2p1, and sw2p2,
>but our little check in dev.c will drop the pkt to sw1p2 just before
>egress onto wire.  Pkt goes out on sw2p1 and sw2p2.  This is why we
>can't use just the br0 ifindex to generate the mark.  We need
>something unique about the switch port and the bridge ifindex to give
>us a mark for a port.


Scott, again, I'm not talking about br0 ifindex! I'm talking about
ifindexes of the port netdevs.

br0
  sw1p1  ifindex=1
  sw1p2  ifindex=2
  sw2p1  ifindex=3
  sw2p2  ifindex=4


now we have 2 groups, one in each switch. So for sw1 group (sw1p1 and
sw1p2) we select mark one of the ifindexes, so either "1" or "2"
For second group, from sw2 we select "3" or "4". So for example:
br0
  sw1p1  mark=1
  sw1p2  mark=1
  sw2p1  mark=4
  sw2p2  mark=4

This is generic, usable for every group.


>
>I'm going to send v2 which uses switch port ppid + some group ifindex
>to generate port mark.  Group ifindex could be bond ifindex or bridge
>ifindex or even zero if ports are L3 router ports and we want to prune
>any L3 forwarding by the kernel.  There is some flexibility here,
>depending on what we're trying to do.  We have the switch port ppid,
>so we might as well use it.
>
>-scott

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-17  6:30           ` Jiri Pirko
@ 2015-06-17  7:02             ` Scott Feldman
  0 siblings, 0 replies; 26+ messages in thread
From: Scott Feldman @ 2015-06-17  7:02 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, Netdev, simon.horman, Roopa Prabhu, Arad, Ronen,
	Fastabend, John R, andrew, Florian Fainelli, Guenter Roeck,
	davidch, stephen

On Tue, Jun 16, 2015 at 11:30 PM, Jiri Pirko <jiri@resnulli.us> wrote:

> Scott, again, I'm not talking about br0 ifindex! I'm talking about
> ifindexes of the port netdevs.
>
> br0
>   sw1p1  ifindex=1
>   sw1p2  ifindex=2
>   sw2p1  ifindex=3
>   sw2p2  ifindex=4
>
>
> now we have 2 groups, one in each switch. So for sw1 group (sw1p1 and
> sw1p2) we select mark one of the ifindexes, so either "1" or "2"
> For second group, from sw2 we select "3" or "4". So for example:
> br0
>   sw1p1  mark=1
>   sw1p2  mark=1
>   sw2p1  mark=4
>   sw2p2  mark=4
>
> This is generic, usable for every group.

Oooh, that's good.  I like it.  If the port is not grouped, set
mark=ifindex.  Or mark=0 to have a dflt group for ungrouped ports.

Do you see a way to code this up?  The only thing that comes to mind
is for the swX driver to keep track of mark picked for each bridge
brY, but I was hoping you have an idea how to do this without any
storage?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding
  2015-06-16 21:11       ` Jiri Pirko
  2015-06-16 23:53         ` Scott Feldman
@ 2015-06-17 10:23         ` Jamal Hadi Salim
  1 sibling, 0 replies; 26+ messages in thread
From: Jamal Hadi Salim @ 2015-06-17 10:23 UTC (permalink / raw)
  To: Jiri Pirko, Scott Feldman
  Cc: David Miller, Netdev, simon.horman, Roopa Prabhu, Arad, Ronen,
	Fastabend, John R, andrew, Florian Fainelli, Guenter Roeck,
	davidch, stephen

On 06/16/15 17:11, Jiri Pirko wrote:
> Tue, Jun 16, 2015 at 06:47:47PM CEST, sfeldma@gmail.com wrote:
>> On Mon, Jun 15, 2015 at 11:04 PM, Jiri Pirko <jiri@resnulli.us> wrote:

>> So what we need is a unique mark for device ports within a fwding
>> group, such as a bridge.
>
> Yep, have a group of netdevs, pick one of them and use it's ifindex for
> the whole group.
>

Have you look at the netdev group attribute?
Typically user space sets the value and then you can perform operations
per group of netdevs. You could argue that the field is for user space
to dictate in some policy therefore may be slightly unsuited
for what you are after maybe?

cheers,
jamal

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-06-17 10:23 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-13 18:04 [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding sfeldma
2015-06-13 18:04 ` [RFC PATCH net-next 1/4] net: don't reforward packets already forwarded by offload device sfeldma
2015-06-14  6:51   ` Jiri Pirko
2015-06-15 14:21   ` roopa
2015-06-13 18:04 ` [RFC PATCH net-next 2/4] switchdev: add fwd_mark generator helper sfeldma
2015-06-14  6:56   ` Jiri Pirko
2015-06-14 17:50     ` Scott Feldman
2015-06-15  5:46       ` Jiri Pirko
2015-06-15 13:52         ` Scott Feldman
2015-06-15 14:09           ` Sergei Shtylyov
2015-06-15 15:17   ` roopa
2015-06-13 18:04 ` [RFC PATCH net-next 3/4] rocker: add fwd_mark support sfeldma
2015-06-14  7:02   ` Jiri Pirko
2015-06-14 18:00     ` Scott Feldman
2015-06-15  5:49       ` Jiri Pirko
2015-06-13 18:04 ` [RFC PATCH net-next 4/4] switchdev: update documentation for fwd_mark sfeldma
2015-06-15 13:54 ` [RFC PATCH net-next 0/4] switchdev: avoid duplicate packet forwarding roopa
2015-06-15 14:23 ` roopa
2015-06-15 23:25 ` David Miller
2015-06-16  6:04   ` Jiri Pirko
2015-06-16 16:47     ` Scott Feldman
2015-06-16 21:11       ` Jiri Pirko
2015-06-16 23:53         ` Scott Feldman
2015-06-17  6:30           ` Jiri Pirko
2015-06-17  7:02             ` Scott Feldman
2015-06-17 10:23         ` Jamal Hadi Salim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.