* [patch net-next 0/8] Add support for offloading packet-sampling
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Jiri Pirko <jiri@mellanox.com>

Add the sample tc action, which allows sampling packets that match a
classifier. The sample action randomly picks packets, duplicates them,
truncates them and adds informative metadata to them, for example the
input interface and the original packet length. The sampled packets are
marked so that they can be matched and redirected to a specific
collector device.

The sampled packets' metadata is packed using ife encapsulation. To do
that, this patch set extracts the ife logic from the tc ife action
(act_ife) into an independent ife module and uses that functionality to
pack the metadata. To carry all the needed metadata, this patch set
introduces several new IFE_META TLV types.
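For illustration, a collector receiving these samples sees the layout that
ife_encode() produces: the outer Ethernet header, a 2-byte big-endian total
metadata length (which counts the length field itself), the 4-byte-aligned
TLVs, and then the original frame. The userspace sketch below walks that
layout. It assumes a plain 14-byte Ethernet outer header, and the names
ife_parse_frame and ife_tlv_cb are hypothetical; they are not part of libife
or of this patch set.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the outer header is a plain Ethernet header, i.e. the
 * sending device's hard_header_len is 14. */
#define OUTER_HLEN 14
#define IFE_METAHDRLEN 2			/* as in include/uapi/linux/ife.h */
#define TLV_HDRLEN 4				/* 2-byte type + 2-byte len */
#define TLV_ALIGN(len) (((len) + 3u) & ~3u)	/* same rounding as NLA_ALIGN */

/* Hypothetical callback, invoked once per metadatum. */
typedef void (*ife_tlv_cb)(uint16_t type, const uint8_t *data,
			   uint16_t dlen, void *ctx);

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walk the ife metadata of one received frame. Returns the offset of the
 * original (inner) frame within buf, or -1 if the frame is malformed. */
static long ife_parse_frame(const uint8_t *buf, size_t len,
			    ife_tlv_cb cb, void *ctx)
{
	size_t off = OUTER_HLEN;
	size_t end;
	uint16_t metalen;

	if (len < off + IFE_METAHDRLEN)
		return -1;

	/* Total metadata length; it includes the 2-byte field itself. */
	metalen = get_be16(buf + off);
	if (metalen < IFE_METAHDRLEN || off + metalen > len)
		return -1;
	end = off + metalen;
	off += IFE_METAHDRLEN;

	while (off < end) {
		uint16_t type, tlen;

		if (end - off < TLV_HDRLEN)
			return -1;
		type = get_be16(buf + off);
		tlen = get_be16(buf + off + 2);	/* header + unpadded data */
		if (tlen < TLV_HDRLEN || TLV_ALIGN(tlen) > end - off)
			return -1;
		if (cb)
			cb(type, buf + off + TLV_HDRLEN,
			   (uint16_t)(tlen - TLV_HDRLEN), ctx);
		off += TLV_ALIGN(tlen);
	}

	return (long)end;
}
```

A real collector would also check the outer eth_type configured on the
sample action before calling such a parser.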

In addition, add support for offloading the matchall-sample tc command
in the Mellanox mlxsw driver, for ingress qdiscs.

Userspace examples for this code can be found in:
 - https://github.com/yotamgi/host-sflow: an sFlow client that uses the
   sample action in combination with a tap device to sample packets and
   send them to a centralized sFlow collector.
 - https://github.com/yotamgi/libife: a library for manipulating ife
   packets in userspace. The host-sflow program uses it to parse the
   sampled packets.

---
rfc1->v1:
 - Replace the sampled packets' ifindex metadatum with in_ifindex and
   out_ifindex
 - Add a sequence number metadatum to sampled packets
 - Add a sampler_id metadatum to sampled packets
 - Make the user/kernel interface extensible
 - Move the sampling helper function to the ife module
 - Fix the ife header to be safe when CONFIG_NET_IFE is not set
 - Make the sampled packets' eth_type field mandatory rather than optional
 - Change the IFE_META* fields to an enum
 - Remove the ife_packet_info struct and pass the parameters directly to
   the ife_packet_info_pack function
 - Fix a couple more style and cosmetic issues

Yotam Gigi (8):
  Introduce ife encapsulation module
  act_ife: Change to use ife module
  net: ife: Introduce new metadata tlv types
  net: ife: Introduce packet info packing method
  Introduce sample tc action
  tc: sample: Add sequence number and sampler_id fields
  mlxsw: reg: add the Monitoring Packet Sampling Configuration Register
  mlxsw: packet sample: Add packet sample offloading support

 MAINTAINERS                                    |   7 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h      |  38 ++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 120 +++++++++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  12 +
 drivers/net/ethernet/mellanox/mlxsw/trap.h     |   1 +
 include/net/ife.h                              |  63 ++++++
 include/net/tc_act/tc_ife.h                    |   3 -
 include/net/tc_act/tc_sample.h                 |  74 +++++++
 include/uapi/linux/Kbuild                      |   1 +
 include/uapi/linux/ife.h                       |  24 ++
 include/uapi/linux/tc_act/Kbuild               |   1 +
 include/uapi/linux/tc_act/tc_ife.h             |  10 +-
 include/uapi/linux/tc_act/tc_sample.h          |  29 +++
 net/Kconfig                                    |   1 +
 net/Makefile                                   |   1 +
 net/ife/Kconfig                                |  16 ++
 net/ife/Makefile                               |   5 +
 net/ife/ife.c                                  | 199 +++++++++++++++++
 net/sched/Kconfig                              |  14 ++
 net/sched/Makefile                             |   1 +
 net/sched/act_ife.c                            | 109 +++------
 net/sched/act_sample.c                         | 291 +++++++++++++++++++++++++
 22 files changed, 923 insertions(+), 97 deletions(-)
 create mode 100644 include/net/ife.h
 create mode 100644 include/net/tc_act/tc_sample.h
 create mode 100644 include/uapi/linux/ife.h
 create mode 100644 include/uapi/linux/tc_act/tc_sample.h
 create mode 100644 net/ife/Kconfig
 create mode 100644 net/ife/Makefile
 create mode 100644 net/ife/ife.c
 create mode 100644 net/sched/act_sample.c

-- 
2.7.4

* [patch net-next 1/8] Introduce ife encapsulation module
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

This module implements the encode/decode logic of the ife encapsulation
protocol. It provides:
 - ife_encode: encode the skb and reserve space for the ife metadata
   header
 - ife_decode: decode the skb and extract the metadata header size
 - ife_tlv_meta_encode: encode one TLV entry into the reserved ife header
   space
 - ife_tlv_meta_decode: decode one TLV entry from the packet
 - ife_tlv_meta_next: advance to the next TLV
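As a rough userspace sketch of the TLV format these helpers handle,
mirroring ife_tlv_meta_encode(), ife_tlv_meta_decode() and
ife_tlv_meta_next() in net/ife/ife.c below: each entry is a 2-byte type and
a 2-byte length, both big-endian, where the length covers the 4-byte header
plus the unpadded data, and the data is zero-padded to a 4-byte boundary
like a netlink attribute. The tlv_* names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TLV_HDRLEN 4				/* NLA_HDRLEN */
#define TLV_ALIGN(len) (((len) + 3u) & ~3u)	/* NLA_ALIGN */

/* Like ife_tlv_meta_encode(): write type and len (dlen + header) in
 * network byte order, copy the value and zero-pad to 4 bytes.
 * Returns the number of bytes consumed (header + padded data). */
static int tlv_encode(uint8_t *buf, uint16_t type, uint16_t dlen,
		      const void *dval)
{
	uint16_t len = (uint16_t)(dlen + TLV_HDRLEN);
	unsigned int total = TLV_ALIGN(len);

	buf[0] = (uint8_t)(type >> 8);
	buf[1] = (uint8_t)(type & 0xff);
	buf[2] = (uint8_t)(len >> 8);
	buf[3] = (uint8_t)(len & 0xff);
	memset(buf + TLV_HDRLEN, 0, total - TLV_HDRLEN);
	memcpy(buf + TLV_HDRLEN, dval, dlen);
	return (int)total;
}

/* Like ife_tlv_meta_decode(): report type and data length, return a
 * pointer to the data. */
static const uint8_t *tlv_decode(const uint8_t *buf, uint16_t *type,
				 uint16_t *dlen)
{
	*type = (uint16_t)(buf[0] << 8 | buf[1]);
	*dlen = (uint16_t)((buf[2] << 8 | buf[3]) - TLV_HDRLEN);
	return buf + TLV_HDRLEN;
}

/* Like ife_tlv_meta_next(): skip the header and the padded data. */
static const uint8_t *tlv_next(const uint8_t *buf)
{
	uint16_t len = (uint16_t)(buf[2] << 8 | buf[3]);

	return buf + TLV_ALIGN(len);
}
```

As in the kernel helpers, the caller is responsible for presenting the value
in network byte order before encoding it.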

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 MAINTAINERS               |   7 +++
 include/net/ife.h         |  52 +++++++++++++++++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/ife.h  |  18 ++++++
 net/Kconfig               |   1 +
 net/Makefile              |   1 +
 net/ife/Kconfig           |  16 ++++++
 net/ife/Makefile          |   5 ++
 net/ife/ife.c             | 144 ++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 245 insertions(+)
 create mode 100644 include/net/ife.h
 create mode 100644 include/uapi/linux/ife.h
 create mode 100644 net/ife/Kconfig
 create mode 100644 net/ife/Makefile
 create mode 100644 net/ife/ife.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e5c17a9..cc3e640 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6114,6 +6114,13 @@ F:	include/net/cfg802154.h
 F:	include/net/ieee802154_netdev.h
 F:	Documentation/networking/ieee802154.txt
 
+IFE PROTOCOL
+M:	Yotam Gigi <yotamg@mellanox.com>
+M:	Jamal Hadi Salim <jhs@mojatatu.com>
+F:	net/ife
+F:	include/net/ife.h
+F:	include/uapi/linux/ife.h
+
 IGORPLUG-USB IR RECEIVER
 M:	Sean Young <sean@mess.org>
 L:	linux-media@vger.kernel.org
diff --git a/include/net/ife.h b/include/net/ife.h
new file mode 100644
index 0000000..a75d4e0
--- /dev/null
+++ b/include/net/ife.h
@@ -0,0 +1,52 @@
+#ifndef __NET_IFE_H
+#define __NET_IFE_H
+
+#include <uapi/linux/ife.h>
+#include <linux/etherdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <uapi/linux/ife.h>
+
+#if IS_ENABLED(CONFIG_NET_IFE)
+
+void *ife_encode(struct sk_buff *skb, u16 metalen);
+void *ife_decode(struct sk_buff *skb, u16 *metalen);
+
+void *ife_tlv_meta_decode(void *skbdata, u16 *attrtype, u16 *dlen, u16 *totlen);
+int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
+			const void *dval);
+
+void *ife_tlv_meta_next(void *skbdata);
+
+#else
+
+static inline void *ife_encode(struct sk_buff *skb, u16 metalen)
+{
+	return NULL;
+}
+
+static inline void *ife_decode(struct sk_buff *skb, u16 *metalen)
+{
+	return NULL;
+}
+
+static inline void *ife_tlv_meta_decode(void *skbdata, u16 *attrtype, u16 *dlen,
+					u16 *totlen)
+{
+	return NULL;
+}
+
+static inline int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
+			const void *dval)
+{
+	return 0;
+}
+
+static inline void *ife_tlv_meta_next(void *skbdata)
+{
+	return NULL;
+}
+
+#endif
+
+#endif /* __NET_IFE_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index cd2be1c..eabf838 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -191,6 +191,7 @@ header-y += if_tun.h
 header-y += if_tunnel.h
 header-y += if_vlan.h
 header-y += if_x25.h
+header-y += ife.h
 header-y += igmp.h
 header-y += ila.h
 header-y += in6.h
diff --git a/include/uapi/linux/ife.h b/include/uapi/linux/ife.h
new file mode 100644
index 0000000..2954da3
--- /dev/null
+++ b/include/uapi/linux/ife.h
@@ -0,0 +1,18 @@
+#ifndef __UAPI_IFE_H
+#define __UAPI_IFE_H
+
+#define IFE_METAHDRLEN 2
+
+enum {
+	IFE_META_SKBMARK = 1,
+	IFE_META_HASHID,
+	IFE_META_PRIO,
+	IFE_META_QMAP,
+	IFE_META_TCINDEX,
+	__IFE_META_MAX
+};
+
+/*Can be overridden at runtime by module option*/
+#define IFE_META_MAX (__IFE_META_MAX - 1)
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 7b6cd34..3cf29b1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -393,6 +393,7 @@ source "net/9p/Kconfig"
 source "net/caif/Kconfig"
 source "net/ceph/Kconfig"
 source "net/nfc/Kconfig"
+source "net/ife/Kconfig"
 
 config LWTUNNEL
 	bool "Network light weight tunnels"
diff --git a/net/Makefile b/net/Makefile
index 4cafaa2..4ddc67e 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_DNS_RESOLVER)	+= dns_resolver/
 obj-$(CONFIG_CEPH_LIB)		+= ceph/
 obj-$(CONFIG_BATMAN_ADV)	+= batman-adv/
 obj-$(CONFIG_NFC)		+= nfc/
+obj-$(CONFIG_NET_IFE)		+= ife/
 obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_MPLS)		+= mpls/
diff --git a/net/ife/Kconfig b/net/ife/Kconfig
new file mode 100644
index 0000000..31e48b6
--- /dev/null
+++ b/net/ife/Kconfig
@@ -0,0 +1,16 @@
+#
+# IFE subsystem configuration
+#
+
+menuconfig NET_IFE
+	depends on NET
+        tristate "Inter-FE based on IETF ForCES InterFE LFB"
+	default n
+	help
+	  Say Y here to add support of IFE encapsulation protocol
+	  For details refer to netdev01 paper:
+	  "Distributing Linux Traffic Control Classifier-Action Subsystem"
+	   Authors: Jamal Hadi Salim and Damascene M. Joachimpillai
+
+	  To compile this support as a module, choose M here: the module will
+	  be called ife.
diff --git a/net/ife/Makefile b/net/ife/Makefile
new file mode 100644
index 0000000..2a90d97
--- /dev/null
+++ b/net/ife/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the IFE encapsulation protocol
+#
+
+obj-$(CONFIG_NET_IFE) += ife.o
diff --git a/net/ife/ife.c b/net/ife/ife.c
new file mode 100644
index 0000000..ee3221a
--- /dev/null
+++ b/net/ife/ife.c
@@ -0,0 +1,144 @@
+/*
+ * net/ife/ife.c - Inter-FE protocol based on ForCES WG InterFE LFB
+ * Copyright (c) 2015 Jamal Hadi Salim <jhs@mojatatu.com>
+ * Copyright (c) 2016 Yotam Gigi <yotamg@mellanox.com>
+ *
+ * Refer to: draft-ietf-forces-interfelfb-03 and netdev01 paper:
+ * "Distributing Linux Traffic Control Classifier-Action Subsystem"
+ * Authors: Jamal Hadi Salim and Damascene M. Joachimpillai
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <net/net_namespace.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <linux/etherdevice.h>
+#include <net/ife.h>
+
+void *ife_encode(struct sk_buff *skb, u16 metalen)
+{
+	/* OUTERHDR:TOTMETALEN:{TLVHDR:Metadatum:TLVHDR..}:ORIGDATA
+	 * where ORIGDATA = original ethernet header ...
+	 */
+	int hdrm = metalen + IFE_METAHDRLEN;
+	int total_push = hdrm + skb->dev->hard_header_len;
+	struct ethhdr *iethh;	/* inner ether header */
+	int skboff = 0;
+	int err;
+
+	err = skb_cow_head(skb, total_push);
+	if (unlikely(err))
+		return NULL;
+
+	iethh = (struct ethhdr *) skb->data;
+
+	__skb_push(skb, total_push);
+	memcpy(skb->data, iethh, skb->dev->hard_header_len);
+	skb_reset_mac_header(skb);
+	skboff += skb->dev->hard_header_len;
+
+	/* total metadata length */
+	metalen += IFE_METAHDRLEN;
+	metalen = htons(metalen);
+	memcpy((skb->data + skboff), &metalen, IFE_METAHDRLEN);
+	skboff += IFE_METAHDRLEN;
+
+	return skb->data + skboff;
+}
+EXPORT_SYMBOL_GPL(ife_encode);
+
+struct ifeheadr {
+	__be16 metalen;
+	u8 tlv_data[];
+};
+
+void *ife_decode(struct sk_buff *skb, u16 *metalen)
+{
+	struct ifeheadr *ifehdr;
+	int total_pull;
+	u16 ifehdrln;
+
+	ifehdr = (struct ifeheadr *) (skb->data + skb->dev->hard_header_len);
+	ifehdrln = ifehdr->metalen;
+	ifehdrln = ntohs(ifehdrln);
+	total_pull = skb->dev->hard_header_len + ifehdrln;
+
+	if (unlikely(ifehdrln < 2))
+		return NULL;
+
+	if (unlikely(!pskb_may_pull(skb, total_pull)))
+		return NULL;
+
+	skb_set_mac_header(skb, total_pull);
+	__skb_pull(skb, total_pull);
+	*metalen = ifehdrln - IFE_METAHDRLEN;
+
+	return &ifehdr->tlv_data;
+}
+EXPORT_SYMBOL_GPL(ife_decode);
+
+struct meta_tlvhdr {
+	__be16 type;
+	__be16 len;
+};
+
+/* Caller takes care of presenting data in network order
+ */
+void *ife_tlv_meta_decode(void *skbdata, u16 *attrtype, u16 *dlen, u16 *totlen)
+{
+	struct meta_tlvhdr *tlv = (struct meta_tlvhdr *) skbdata;
+
+	*dlen = ntohs(tlv->len) - NLA_HDRLEN;
+	*attrtype = ntohs(tlv->type);
+
+	if (totlen)
+		*totlen = nla_total_size(*dlen);
+
+	return skbdata + sizeof(struct meta_tlvhdr);
+}
+EXPORT_SYMBOL_GPL(ife_tlv_meta_decode);
+
+void *ife_tlv_meta_next(void *skbdata)
+{
+	struct meta_tlvhdr *tlv = (struct meta_tlvhdr *) skbdata;
+	u16 tlvlen = tlv->len;
+
+	tlvlen = ntohs(tlvlen);
+	tlvlen = NLA_ALIGN(tlvlen);
+
+	return skbdata + tlvlen;
+}
+EXPORT_SYMBOL_GPL(ife_tlv_meta_next);
+
+/* Caller takes care of presenting data in network order
+ */
+int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
+{
+	u32 *tlv = (u32 *) (skbdata);
+	u16 totlen = nla_total_size(dlen);	/*alignment + hdr */
+	char *dptr = (char *) tlv + NLA_HDRLEN;
+	u32 htlv = attrtype << 16 | (dlen + NLA_HDRLEN);
+
+	*tlv = htonl(htlv);
+	memset(dptr, 0, totlen - NLA_HDRLEN);
+	memcpy(dptr, dval, dlen);
+
+	return totlen;
+}
+EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
+
+MODULE_AUTHOR("Jamal Hadi Salim <jhs@mojatatu.com>");
+MODULE_AUTHOR("Yotam Gigi <yotamg@mellanox.com>");
+MODULE_DESCRIPTION("Inter-FE LFB action");
+MODULE_LICENSE("GPL");
-- 
2.7.4

* [patch net-next 2/8] act_ife: Change to use ife module
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

Use the encode/decode functionality from the ife module instead of the
implementation inside act_ife.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_ife.h        |   3 -
 include/uapi/linux/tc_act/tc_ife.h |  10 +---
 net/sched/Kconfig                  |   1 +
 net/sched/act_ife.c                | 109 +++++++++++--------------------------
 4 files changed, 34 insertions(+), 89 deletions(-)

diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
index 9fd2bea0..30ba459 100644
--- a/include/net/tc_act/tc_ife.h
+++ b/include/net/tc_act/tc_ife.h
@@ -6,7 +6,6 @@
 #include <linux/rtnetlink.h>
 #include <linux/module.h>
 
-#define IFE_METAHDRLEN 2
 struct tcf_ife_info {
 	struct tc_action common;
 	u8 eth_dst[ETH_ALEN];
@@ -45,8 +44,6 @@ struct tcf_meta_ops {
 
 int ife_get_meta_u32(struct sk_buff *skb, struct tcf_meta_info *mi);
 int ife_get_meta_u16(struct sk_buff *skb, struct tcf_meta_info *mi);
-int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
-			const void *dval);
 int ife_alloc_meta_u32(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
 int ife_alloc_meta_u16(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
 int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi);
diff --git a/include/uapi/linux/tc_act/tc_ife.h b/include/uapi/linux/tc_act/tc_ife.h
index cd18360..7c28178 100644
--- a/include/uapi/linux/tc_act/tc_ife.h
+++ b/include/uapi/linux/tc_act/tc_ife.h
@@ -3,6 +3,7 @@
 
 #include <linux/types.h>
 #include <linux/pkt_cls.h>
+#include <linux/ife.h>
 
 #define TCA_ACT_IFE 25
 /* Flag bits for now just encoding/decoding; mutually exclusive */
@@ -28,13 +29,4 @@ enum {
 };
 #define TCA_IFE_MAX (__TCA_IFE_MAX - 1)
 
-#define IFE_META_SKBMARK 1
-#define IFE_META_HASHID 2
-#define	IFE_META_PRIO 3
-#define	IFE_META_QMAP 4
-#define	IFE_META_TCINDEX 5
-/*Can be overridden at runtime by module option*/
-#define	__IFE_META_MAX 6
-#define IFE_META_MAX (__IFE_META_MAX - 1)
-
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 87956a7..24f7cac 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -763,6 +763,7 @@ config NET_ACT_SKBMOD
 config NET_ACT_IFE
         tristate "Inter-FE action based on IETF ForCES InterFE LFB"
         depends on NET_CLS_ACT
+        select NET_IFE
         ---help---
 	  Say Y here to allow for sourcing and terminating metadata
 	  For details refer to netdev01 paper:
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 95c463c..5c2478a 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -32,6 +32,7 @@
 #include <uapi/linux/tc_act/tc_ife.h>
 #include <net/tc_act/tc_ife.h>
 #include <linux/etherdevice.h>
+#include <net/ife.h>
 
 #define IFE_TAB_MASK 15
 
@@ -46,23 +47,6 @@ static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
 	[TCA_IFE_TYPE] = { .type = NLA_U16},
 };
 
-/* Caller takes care of presenting data in network order
-*/
-int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
-{
-	u32 *tlv = (u32 *)(skbdata);
-	u16 totlen = nla_total_size(dlen);	/*alignment + hdr */
-	char *dptr = (char *)tlv + NLA_HDRLEN;
-	u32 htlv = attrtype << 16 | (dlen + NLA_HDRLEN);
-
-	*tlv = htonl(htlv);
-	memset(dptr, 0, totlen - NLA_HDRLEN);
-	memcpy(dptr, dval, dlen);
-
-	return totlen;
-}
-EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
-
 int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi)
 {
 	u16 edata = 0;
@@ -637,69 +621,60 @@ int find_decode_metaid(struct sk_buff *skb, struct tcf_ife_info *ife,
 	return 0;
 }
 
-struct ifeheadr {
-	__be16 metalen;
-	u8 tlv_data[];
-};
-
-struct meta_tlvhdr {
-	__be16 type;
-	__be16 len;
-};
-
 static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 			  struct tcf_result *res)
 {
 	struct tcf_ife_info *ife = to_ife(a);
+	u32 at = G_TC_AT(skb->tc_verd);
 	int action = ife->tcf_action;
-	struct ifeheadr *ifehdr = (struct ifeheadr *)skb->data;
-	int ifehdrln = (int)ifehdr->metalen;
-	struct meta_tlvhdr *tlv = (struct meta_tlvhdr *)(ifehdr->tlv_data);
+	u8 *ifehdr_end;
+	u8 *tlv_data;
+	u16 metalen;
 
 	spin_lock(&ife->tcf_lock);
 	bstats_update(&ife->tcf_bstats, skb);
 	tcf_lastuse_update(&ife->tcf_tm);
 	spin_unlock(&ife->tcf_lock);
 
-	ifehdrln = ntohs(ifehdrln);
-	if (unlikely(!pskb_may_pull(skb, ifehdrln))) {
+	if (!(at & AT_EGRESS))
+		skb_push(skb, skb->dev->hard_header_len);
+
+	tlv_data = ife_decode(skb, &metalen);
+	if (unlikely(!tlv_data)) {
 		spin_lock(&ife->tcf_lock);
 		ife->tcf_qstats.drops++;
 		spin_unlock(&ife->tcf_lock);
 		return TC_ACT_SHOT;
 	}
 
-	skb_set_mac_header(skb, ifehdrln);
-	__skb_pull(skb, ifehdrln);
-	skb->protocol = eth_type_trans(skb, skb->dev);
-	ifehdrln -= IFE_METAHDRLEN;
-
-	while (ifehdrln > 0) {
-		u8 *tlvdata = (u8 *)tlv;
-		u16 mtype = tlv->type;
-		u16 mlen = tlv->len;
-		u16 alen;
+	ifehdr_end = tlv_data + metalen;
+	for (; tlv_data < ifehdr_end; tlv_data = ife_tlv_meta_next(tlv_data)) {
+		u8 *curr_data;
+		u16 mtype;
+		u16 dlen;
 
-		mtype = ntohs(mtype);
-		mlen = ntohs(mlen);
-		alen = NLA_ALIGN(mlen);
+		curr_data = ife_tlv_meta_decode(tlv_data, &mtype, &dlen, NULL);
 
-		if (find_decode_metaid(skb, ife, mtype, (mlen - NLA_HDRLEN),
-				       (void *)(tlvdata + NLA_HDRLEN))) {
+		if (find_decode_metaid(skb, ife, mtype, dlen, curr_data)) {
 			/* abuse overlimits to count when we receive metadata
 			 * but dont have an ops for it
 			 */
-			pr_info_ratelimited("Unknown metaid %d alnlen %d\n",
-					    mtype, mlen);
+			pr_info_ratelimited("Unknown metaid %d dlen %d\n",
+					    mtype, dlen);
 			ife->tcf_qstats.overlimits++;
 		}
+	}
 
-		tlvdata += alen;
-		ifehdrln -= alen;
-		tlv = (struct meta_tlvhdr *)tlvdata;
+	if (WARN_ON(tlv_data != ifehdr_end)) {
+		spin_lock(&ife->tcf_lock);
+		ife->tcf_qstats.drops++;
+		spin_unlock(&ife->tcf_lock);
+		return TC_ACT_SHOT;
 	}
 
+	skb->protocol = eth_type_trans(skb, skb->dev);
 	skb_reset_network_header(skb);
+
 	return action;
 }
 
@@ -727,7 +702,6 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_ife_info *ife = to_ife(a);
 	int action = ife->tcf_action;
 	struct ethhdr *oethh;	/* outer ether header */
-	struct ethhdr *iethh;	/* inner eth header */
 	struct tcf_meta_info *e;
 	/*
 	   OUTERHDR:TOTMETALEN:{TLVHDR:Metadatum:TLVHDR..}:ORIGDATA
@@ -735,10 +709,11 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 	 */
 	u16 metalen = ife_get_sz(skb, ife);
 	int hdrm = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
-	unsigned int skboff = skb->dev->hard_header_len;
+	unsigned int skboff = 0;
 	u32 at = G_TC_AT(skb->tc_verd);
 	int new_len = skb->len + hdrm;
 	bool exceed_mtu = false;
+	void *ife_meta;
 	int err;
 
 	if (at & AT_EGRESS) {
@@ -766,27 +741,10 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 		return TC_ACT_SHOT;
 	}
 
-	err = skb_cow_head(skb, hdrm);
-	if (unlikely(err)) {
-		ife->tcf_qstats.drops++;
-		spin_unlock(&ife->tcf_lock);
-		return TC_ACT_SHOT;
-	}
-
 	if (!(at & AT_EGRESS))
 		skb_push(skb, skb->dev->hard_header_len);
 
-	iethh = (struct ethhdr *)skb->data;
-	__skb_push(skb, hdrm);
-	memcpy(skb->data, iethh, skb->mac_len);
-	skb_reset_mac_header(skb);
-	oethh = eth_hdr(skb);
-
-	/*total metadata length */
-	metalen += IFE_METAHDRLEN;
-	metalen = htons(metalen);
-	memcpy((skb->data + skboff), &metalen, IFE_METAHDRLEN);
-	skboff += IFE_METAHDRLEN;
+	ife_meta = ife_encode(skb, metalen);
 
 	/* XXX: we dont have a clever way of telling encode to
 	 * not repeat some of the computations that are done by
@@ -794,7 +752,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 	 */
 	list_for_each_entry(e, &ife->metalist, metalist) {
 		if (e->ops->encode) {
-			err = e->ops->encode(skb, (void *)(skb->data + skboff),
+			err = e->ops->encode(skb, (void *)(ife_meta + skboff),
 					     e);
 		}
 		if (err < 0) {
@@ -805,15 +763,12 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 		}
 		skboff += err;
 	}
+	oethh = (struct ethhdr *)skb->data;
 
 	if (!is_zero_ether_addr(ife->eth_src))
 		ether_addr_copy(oethh->h_source, ife->eth_src);
-	else
-		ether_addr_copy(oethh->h_source, iethh->h_source);
 	if (!is_zero_ether_addr(ife->eth_dst))
 		ether_addr_copy(oethh->h_dest, ife->eth_dst);
-	else
-		ether_addr_copy(oethh->h_dest, iethh->h_dest);
 	oethh->h_proto = htons(ife->eth_type);
 
 	if (!(at & AT_EGRESS))
-- 
2.7.4

* [patch net-next 3/8] net: ife: Introduce new metadata tlv types
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

IFE_META_IN_IFINDEX: pass the input ifindex as part of the ife metadata
IFE_META_OUT_IFINDEX: pass the output ifindex as part of the ife
metadata
IFE_META_ORIGSIZE: pass the original packet size as part of the ife
metadata; useful when the packet has been truncated
IFE_META_SIZE: pass the size of the encapsulated packet as part of the
ife metadata
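A collector could map these TLVs onto a per-sample record roughly as below.
This is a sketch under two assumptions: the numeric values follow from the
enum order after this patch (IFE_META_IN_IFINDEX is 6), and each value is a
32-bit big-endian integer, as packed by ife_packet_info_pack() in the next
patch. struct sample_info and sample_info_apply() are hypothetical names.

```c
#include <stdint.h>

/* Assumed values, following the enum order in include/uapi/linux/ife.h
 * after this patch. */
enum {
	IFE_META_IN_IFINDEX = 6,
	IFE_META_OUT_IFINDEX = 7,
	IFE_META_ORIGSIZE = 8,
	IFE_META_SIZE = 9,
};

/* Hypothetical per-sample record kept by a collector. */
struct sample_info {
	uint32_t in_ifindex;
	uint32_t out_ifindex;
	uint32_t orig_size;	/* packet length before truncation */
	uint32_t size;		/* length of the encapsulated (possibly truncated) packet */
};

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store one decoded TLV; returns 0 if recognised, -1 otherwise. */
static int sample_info_apply(struct sample_info *si, uint16_t type,
			     const uint8_t *data, uint16_t dlen)
{
	uint32_t val;

	if (dlen != sizeof(uint32_t))
		return -1;
	val = get_be32(data);

	switch (type) {
	case IFE_META_IN_IFINDEX:
		si->in_ifindex = val;
		break;
	case IFE_META_OUT_IFINDEX:
		si->out_ifindex = val;
		break;
	case IFE_META_ORIGSIZE:
		si->orig_size = val;
		break;
	case IFE_META_SIZE:
		si->size = val;
		break;
	default:
		return -1;
	}
	return 0;
}
```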

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/ife.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/ife.h b/include/uapi/linux/ife.h
index 2954da3..ebdd785 100644
--- a/include/uapi/linux/ife.h
+++ b/include/uapi/linux/ife.h
@@ -9,6 +9,10 @@ enum {
 	IFE_META_PRIO,
 	IFE_META_QMAP,
 	IFE_META_TCINDEX,
+	IFE_META_IN_IFINDEX,
+	IFE_META_OUT_IFINDEX,
+	IFE_META_ORIGSIZE,
+	IFE_META_SIZE,
 	__IFE_META_MAX
 };
 
-- 
2.7.4

* [patch net-next 4/8] net: ife: Introduce packet info packing method
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

Add the ife_packet_info_pack function, which encodes informative metadata
into a packet using ife encapsulation. The metadata consists of:
 - The packet's input/output ifindices (each only if non-zero)
 - The packet's original size, in case it was truncated
 - The packet's current size

This is useful for anyone who wants to pass informative metadata along
with the packet itself; packet sampling is one such use case.
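The space ife_packet_info_pack() adds in front of the packet follows from its
metalen computation: each 32-bit metadatum takes nla_total_size(4) = 8 bytes,
the two size TLVs are always present, and each ifindex TLV is added only when
that ifindex is non-zero; ife_encode() then pushes the device hard_header_len
plus IFE_METAHDRLEN on top. The arithmetic can be sketched in userspace as
follows, assuming sizeof(int) == 4 and a 14-byte Ethernet header;
info_metalen() and info_headroom() are hypothetical helpers.

```c
#include <stdbool.h>

#define TLV_HDRLEN 4					/* NLA_HDRLEN */
#define TLV_ALIGN(len) (((len) + 3) & ~3)		/* NLA_ALIGN */
#define TLV_TOTAL_SIZE(payload) TLV_ALIGN(TLV_HDRLEN + (payload)) /* nla_total_size */
#define IFE_METAHDRLEN 2
#define META_U32 4	/* assumption: sizeof(int) == 4 */

/* metalen as computed in ife_packet_info_pack(): both size TLVs are
 * always present, each ifindex TLV only when that ifindex is non-zero. */
static int info_metalen(bool has_in_ifindex, bool has_out_ifindex)
{
	int metalen = 2 * TLV_TOTAL_SIZE(META_U32);

	if (has_in_ifindex)
		metalen += TLV_TOTAL_SIZE(META_U32);
	if (has_out_ifindex)
		metalen += TLV_TOTAL_SIZE(META_U32);
	return metalen;
}

/* Bytes that ife_encode() pushes in front of the original frame. */
static int info_headroom(int hard_header_len, bool has_in_ifindex,
			 bool has_out_ifindex)
{
	return hard_header_len + IFE_METAHDRLEN +
	       info_metalen(has_in_ifindex, has_out_ifindex);
}
```

For a sample carrying both ifindices on an Ethernet device this gives 32
bytes of TLVs and 48 bytes of added headroom in total.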

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ife.h | 10 ++++++++++
 net/ife/ife.c     | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/net/ife.h b/include/net/ife.h
index a75d4e0..5fbcfb3 100644
--- a/include/net/ife.h
+++ b/include/net/ife.h
@@ -18,6 +18,9 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
 
 void *ife_tlv_meta_next(void *skbdata);
 
+struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
+				    int in_ifindex, int out_ifindex);
+
 #else
 
 static inline void *ife_encode(struct sk_buff *skb, u16 metalen)
@@ -47,6 +50,13 @@ static inline void *ife_tlv_meta_next(void *skbdata)
 	return NULL;
 }
 
+static inline struct ethhdr *
+ife_packet_info_pack(struct sk_buff *skb, int orig_size, int in_ifindex,
+		     int out_ifindex)
+{
+	return NULL;
+}
+
 #endif
 
 #endif /* __NET_IFE_H */
diff --git a/net/ife/ife.c b/net/ife/ife.c
index ee3221a..756af46 100644
--- a/net/ife/ife.c
+++ b/net/ife/ife.c
@@ -138,6 +138,51 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
 }
 EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
 
+struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
+				    int in_ifindex, int out_ifindex)
+{
+	int curr_size;
+	void *ifetlv;
+	u16 metalen;
+
+	curr_size = skb->len;
+
+	metalen = nla_total_size(sizeof(orig_size)) +
+		  nla_total_size(sizeof(curr_size));
+
+	if (in_ifindex)
+		metalen += nla_total_size(sizeof(in_ifindex));
+	if (out_ifindex)
+		metalen += nla_total_size(sizeof(out_ifindex));
+
+	in_ifindex = htonl(in_ifindex);
+	out_ifindex = htonl(out_ifindex);
+	orig_size = htonl(orig_size);
+	curr_size = htonl(curr_size);
+
+	ifetlv = ife_encode(skb, metalen);
+	if (!ifetlv)
+		return NULL;
+
+	if (in_ifindex)
+		ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_IN_IFINDEX,
+					      sizeof(in_ifindex), &in_ifindex);
+
+	if (out_ifindex)
+		ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_OUT_IFINDEX,
+					      sizeof(out_ifindex),
+					      &out_ifindex);
+
+	ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_ORIGSIZE,
+				      sizeof(orig_size), &orig_size);
+
+	ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_SIZE,
+				      sizeof(curr_size), &curr_size);
+
+	return (struct ethhdr *) skb->data;
+}
+EXPORT_SYMBOL(ife_packet_info_pack);
+
 MODULE_AUTHOR("Jamal Hadi Salim <jhs@mojatatu.com>");
 MODULE_AUTHOR("Yotam Gigi <yotamg@mellanox.com>");
 MODULE_DESCRIPTION("Inter-FE LFB action");
-- 
2.7.4

* [patch net-next 5/8] Introduce sample tc action
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

This action allows the user to sample traffic matched by a tc classifier.
Sampling consists of randomly choosing packets, truncating them, adding
informative metadata about the interface and the original packet size,
and marking them with a specific mark so that further tc rules can match
and process them. The marked sampled packets are then injected into the
device's ingress qdisc using netif_receive_skb.

The packets' metadata is packed using the ife encapsulation protocol. The
outer packet's Ethernet destination, source and eth_type, along with the
rate, the mark and the optional truncation size, can be configured from
userspace.
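The per-packet fast path of act_sample.c is not included in this excerpt, so
the following is only an assumed sketch of the two decisions described
above: picking on average one packet per rate from a uniformly random 32-bit
value (in the kernel such a value could come from prandom_u32()), and
computing the length of the truncated copy. sample_pick() and
sample_copy_len() are hypothetical names, not functions from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: sample when a uniformly random value falls into one of
 * rate buckets, i.e. roughly one packet in rate. A rate of 0 disables
 * sampling. */
static bool sample_pick(uint32_t rnd, uint32_t rate)
{
	return rate != 0 && rnd % rate == 0;
}

/* Hypothetical: length of the copy that gets ife-encapsulated. It is
 * clamped to trunc_size only when truncation is enabled. */
static uint32_t sample_copy_len(uint32_t pkt_len, bool truncate,
				uint32_t trunc_size)
{
	if (truncate && trunc_size < pkt_len)
		return trunc_size;
	return pkt_len;
}
```

The original length would still be reported to the collector through the
IFE_META_ORIGSIZE metadatum, so truncation loses payload bytes but not the
packet size.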

Example:
To sample ingress traffic from interface eth1 and redirect the sampled
packets to interface dummy0, one may use the following commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
	   matchall action sample rate 12 mark 17

tc filter add parent ffff: dev eth1 protocol all \
	   u32 match mark 17 0xff \
	   action mirred egress redirect dev dummy0

The first command adds an ingress qdisc, and the second starts sampling
one in every 12 packets on dev eth1 and marks the sampled packets with
17. The third command catches the sampled packets, which are marked with
17, and redirects them to dev dummy0.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_sample.h        |  65 ++++++++
 include/uapi/linux/tc_act/Kbuild      |   1 +
 include/uapi/linux/tc_act/tc_sample.h |  29 ++++
 net/sched/Kconfig                     |  13 ++
 net/sched/Makefile                    |   1 +
 net/sched/act_sample.c                | 283 ++++++++++++++++++++++++++++++++++
 6 files changed, 392 insertions(+)
 create mode 100644 include/net/tc_act/tc_sample.h
 create mode 100644 include/uapi/linux/tc_act/tc_sample.h
 create mode 100644 net/sched/act_sample.c

diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h
new file mode 100644
index 0000000..73154b3
--- /dev/null
+++ b/include/net/tc_act/tc_sample.h
@@ -0,0 +1,65 @@
+#ifndef __NET_TC_SAMPLE_H
+#define __NET_TC_SAMPLE_H
+
+#include <net/act_api.h>
+#include <linux/tc_act/tc_sample.h>
+
+struct tcf_sample {
+	struct tc_action	common;
+	u32			rate;
+	u32			mark;
+	bool			truncate;
+	u32			trunc_size;
+	u32			packet_counter;
+	u8			eth_dst[ETH_ALEN];
+	u8			eth_src[ETH_ALEN];
+	u16			eth_type;
+	struct list_head	tcfm_list;
+};
+#define to_sample(a) ((struct tcf_sample *)a)
+
+static inline bool is_tcf_sample(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	return a->ops && a->ops->type == TCA_ACT_SAMPLE;
+#else
+	return false;
+#endif
+}
+
+static inline __u32 tcf_sample_mark(const struct tc_action *a)
+{
+	return to_sample(a)->mark;
+}
+
+static inline __u32 tcf_sample_rate(const struct tc_action *a)
+{
+	return to_sample(a)->rate;
+}
+
+static inline bool tcf_sample_truncate(const struct tc_action *a)
+{
+	return to_sample(a)->truncate;
+}
+
+static inline int tcf_sample_trunc_size(const struct tc_action *a)
+{
+	return to_sample(a)->trunc_size;
+}
+
+static inline u16 tcf_sample_eth_type(const struct tc_action *a)
+{
+	return to_sample(a)->eth_type;
+}
+
+static inline void tcf_sample_eth_dst_addr(const struct tc_action *a, u8 *dst)
+{
+	ether_addr_copy(dst, to_sample(a)->eth_dst);
+}
+
+static inline void tcf_sample_eth_src_addr(const struct tc_action *a, u8 *src)
+{
+	ether_addr_copy(src, to_sample(a)->eth_src);
+}
+
+#endif /* __NET_TC_SAMPLE_H */
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index e3969bd..6c6b8d6 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -4,6 +4,7 @@ header-y += tc_defact.h
 header-y += tc_gact.h
 header-y += tc_ipt.h
 header-y += tc_mirred.h
+header-y += tc_sample.h
 header-y += tc_nat.h
 header-y += tc_pedit.h
 header-y += tc_skbedit.h
diff --git a/include/uapi/linux/tc_act/tc_sample.h b/include/uapi/linux/tc_act/tc_sample.h
new file mode 100644
index 0000000..44ee9d0
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_sample.h
@@ -0,0 +1,29 @@
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+
+#define TCA_ACT_SAMPLE 26
+
+struct tc_sample {
+	tc_gen;
+};
+
+enum {
+	TCA_SAMPLE_UNSPEC,
+	TCA_SAMPLE_PARMS,
+	TCA_SAMPLE_TM,
+	TCA_SAMPLE_RATE,
+	TCA_SAMPLE_MARK,
+	TCA_SAMPLE_TRUNC_SIZE,
+	TCA_SAMPLE_ETH_DST,
+	TCA_SAMPLE_ETH_SRC,
+	TCA_SAMPLE_ETH_TYPE,
+	TCA_SAMPLE_PAD,
+	__TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 24f7cac..c54ea6b 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -650,6 +650,19 @@ config NET_ACT_MIRRED
 	  To compile this code as a module, choose M here: the
 	  module will be called act_mirred.
 
+config NET_ACT_SAMPLE
+        tristate "Traffic Sampling"
+        depends on NET_CLS_ACT
+        select NET_IFE
+        ---help---
+	  Say Y here to allow packet sampling tc action. The packet sample
+	  action consists of statistically duplicating packets, truncating them
+	  and adding descriptive metadata to them. The duplicated packets are
+	  then marked to allow further processing using tc.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_sample.
+
 config NET_ACT_IPT
         tristate "IPtables targets"
         depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 4bdda36..7b915d2 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_NET_CLS_ACT)	+= act_api.o
 obj-$(CONFIG_NET_ACT_POLICE)	+= act_police.o
 obj-$(CONFIG_NET_ACT_GACT)	+= act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)	+= act_mirred.o
+obj-$(CONFIG_NET_ACT_SAMPLE)	+= act_sample.o
 obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
new file mode 100644
index 0000000..6596f4b
--- /dev/null
+++ b/net/sched/act_sample.c
@@ -0,0 +1,283 @@
+/*
+ * net/sched/act_sample.c - Packet sampling tc action
+ * Copyright (c) 2016 Yotam Gigi <yotamg@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <net/net_namespace.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <linux/tc_act/tc_sample.h>
+#include <net/tc_act/tc_sample.h>
+#include <net/ife.h>
+
+#include <linux/if_arp.h>
+
+#define SAMPLE_TAB_MASK     7
+static int sample_net_id;
+static struct tc_action_ops act_sample_ops;
+
+static const struct nla_policy sample_policy[TCA_SAMPLE_MAX + 1] = {
+	[TCA_SAMPLE_PARMS]	= { .len = sizeof(struct tc_sample) },
+};
+
+static int tcf_sample_init(struct net *net, struct nlattr *nla,
+			   struct nlattr *est, struct tc_action **a, int ovr,
+			   int bind)
+{
+	struct tc_action_net *tn = net_generic(net, sample_net_id);
+	struct nlattr *tb[TCA_SAMPLE_MAX + 1];
+	struct tc_sample *parm;
+	struct tcf_sample *s;
+	int ret;
+	bool exists = false;
+
+	if (!nla)
+		return -EINVAL;
+	ret = nla_parse_nested(tb, TCA_SAMPLE_MAX, nla, sample_policy);
+	if (ret < 0)
+		return ret;
+	if (!tb[TCA_SAMPLE_PARMS] || !tb[TCA_SAMPLE_RATE] ||
+	    !tb[TCA_SAMPLE_MARK] || !tb[TCA_SAMPLE_ETH_TYPE])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_SAMPLE_PARMS]);
+
+	exists = tcf_hash_check(tn, parm->index, a, bind);
+	if (exists && bind)
+		return 0;
+
+	if (!exists) {
+		ret = tcf_hash_create(tn, parm->index, est, a,
+				      &act_sample_ops, bind, false);
+		if (ret)
+			return ret;
+		ret = ACT_P_CREATED;
+	} else {
+		tcf_hash_release(*a, bind);
+		if (!ovr)
+			return -EEXIST;
+	}
+	s = to_sample(*a);
+
+	ASSERT_RTNL();
+	s->tcf_action = parm->action;
+	s->rate = nla_get_u32(tb[TCA_SAMPLE_RATE]);
+	s->mark = nla_get_u32(tb[TCA_SAMPLE_MARK]);
+	s->eth_type = nla_get_u16(tb[TCA_SAMPLE_ETH_TYPE]);
+
+	s->truncate = tb[TCA_SAMPLE_TRUNC_SIZE];
+	if (tb[TCA_SAMPLE_TRUNC_SIZE])
+		s->trunc_size = nla_get_u32(tb[TCA_SAMPLE_TRUNC_SIZE]);
+
+	s->packet_counter = 0;
+
+	if (tb[TCA_SAMPLE_ETH_SRC])
+		ether_addr_copy(s->eth_src, nla_data(tb[TCA_SAMPLE_ETH_SRC]));
+	else
+		eth_zero_addr(s->eth_src);
+	if (tb[TCA_SAMPLE_ETH_DST])
+		ether_addr_copy(s->eth_dst, nla_data(tb[TCA_SAMPLE_ETH_DST]));
+	else
+		eth_zero_addr(s->eth_dst);
+
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(tn, *a);
+	return ret;
+}
+
+static bool dev_ok_push(struct net_device *dev)
+{
+	switch (dev->type) {
+	case ARPHRD_TUNNEL:
+	case ARPHRD_TUNNEL6:
+	case ARPHRD_SIT:
+	case ARPHRD_IPGRE:
+	case ARPHRD_VOID:
+	case ARPHRD_NONE:
+		return false;
+	default:
+		return true;
+	}
+}
+
+static int tcf_sample(struct sk_buff *skb, const struct tc_action *a,
+		      struct tcf_result *res)
+{
+	struct tcf_sample *s = to_sample(a);
+	static struct ethhdr *ethhdr;
+	struct sk_buff *skb2;
+	int orig_size;
+	int retval;
+	u32 at;
+
+	tcf_lastuse_update(&s->tcf_tm);
+	bstats_cpu_update(this_cpu_ptr(s->common.cpu_bstats), skb);
+
+	rcu_read_lock();
+	retval = READ_ONCE(s->tcf_action);
+
+	if (++s->packet_counter % s->rate == 0) {
+		skb2 = skb_copy(skb, GFP_ATOMIC);
+		if (!skb2)
+			goto out;
+
+		if (s->truncate)
+			skb_trim(skb2, s->trunc_size);
+
+		at = G_TC_AT(skb->tc_verd);
+		skb2->mac_len = skb->mac_len;
+
+		/* on ingress, the mac header gets popped, so push it back */
+		if (!(at & AT_EGRESS) && dev_ok_push(skb->dev))
+			skb_push(skb2, skb2->mac_len);
+
+		orig_size = skb->len + skb->dev->hard_header_len;
+		ethhdr = ife_packet_info_pack(skb2, orig_size,
+					      skb->dev->ifindex, 0);
+		if (!ethhdr)
+			goto out;
+
+		ethhdr->h_proto = htons(s->eth_type);
+		if (!is_zero_ether_addr(s->eth_src))
+			ether_addr_copy(ethhdr->h_source, s->eth_src);
+		if (!is_zero_ether_addr(s->eth_dst))
+			ether_addr_copy(ethhdr->h_dest, s->eth_dst);
+
+		skb2->mark = s->mark;
+		netif_receive_skb(skb2);
+
+		/* mirror is always swallowed */
+		skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at);
+	}
+out:
+	rcu_read_unlock();
+
+	return retval;
+}
+
+static int tcf_sample_dump(struct sk_buff *skb, struct tc_action *a,
+			   int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_sample *s = to_sample(a);
+	struct tc_sample opt = {
+		.index      = s->tcf_index,
+		.action     = s->tcf_action,
+		.refcnt     = s->tcf_refcnt - ref,
+		.bindcnt    = s->tcf_bindcnt - bind,
+	};
+	struct tcf_t t;
+
+	if (nla_put(skb, TCA_SAMPLE_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &s->tcf_tm);
+	if (nla_put_64bit(skb, TCA_SAMPLE_TM, sizeof(t), &t, TCA_SAMPLE_PAD))
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_SAMPLE_RATE, s->rate))
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_SAMPLE_MARK, s->mark))
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_SAMPLE_ETH_TYPE, s->eth_type))
+		goto nla_put_failure;
+
+	if (s->truncate)
+		if (nla_put_u32(skb, TCA_SAMPLE_TRUNC_SIZE, s->trunc_size))
+			goto nla_put_failure;
+
+	if (!is_zero_ether_addr(s->eth_src))
+		if (nla_put(skb, TCA_SAMPLE_ETH_SRC, ETH_ALEN, s->eth_src))
+			goto nla_put_failure;
+
+	if (!is_zero_ether_addr(s->eth_dst))
+		if (nla_put(skb, TCA_SAMPLE_ETH_DST, ETH_ALEN, s->eth_dst))
+			goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_sample_walker(struct net *net, struct sk_buff *skb,
+			     struct netlink_callback *cb, int type,
+			     const struct tc_action_ops *ops)
+{
+	struct tc_action_net *tn = net_generic(net, sample_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops);
+}
+
+static int tcf_sample_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, sample_net_id);
+
+	return tcf_hash_search(tn, a, index);
+}
+
+static struct tc_action_ops act_sample_ops = {
+	.kind	= "sample",
+	.type	= TCA_ACT_SAMPLE,
+	.owner	= THIS_MODULE,
+	.act	= tcf_sample,
+	.dump	= tcf_sample_dump,
+	.init	= tcf_sample_init,
+	.walk	= tcf_sample_walker,
+	.lookup	= tcf_sample_search,
+	.size	= sizeof(struct tcf_sample),
+};
+
+static __net_init int sample_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, sample_net_id);
+
+	return tc_action_net_init(tn, &act_sample_ops, SAMPLE_TAB_MASK);
+}
+
+static void __net_exit sample_exit_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, sample_net_id);
+
+	tc_action_net_exit(tn);
+}
+
+static struct pernet_operations sample_net_ops = {
+	.init = sample_init_net,
+	.exit = sample_exit_net,
+	.id   = &sample_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init sample_init_module(void)
+{
+	return tcf_register_action(&act_sample_ops, &sample_net_ops);
+}
+
+static void __exit sample_cleanup_module(void)
+{
+	tcf_unregister_action(&act_sample_ops, &sample_net_ops);
+}
+
+module_init(sample_init_module);
+module_exit(sample_cleanup_module);
+
+MODULE_AUTHOR("Yotam Gigi <yotamg@mellanox.com>");
+MODULE_DESCRIPTION("Packet sampling action");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

* [patch net-next 6/8] tc: sample: Add sequence number and sampler_id fields
  2016-11-10 11:23 [patch net-next 0/8] Add support for offloading packet-sampling Jiri Pirko
                   ` (4 preceding siblings ...)
  2016-11-10 11:23 ` [patch net-next 5/8] Introduce sample tc action Jiri Pirko
@ 2016-11-10 11:23 ` Jiri Pirko
  2016-11-10 11:23 ` [patch net-next 7/8] mlxsw: reg: add the Monitoring Packet Sampling Configuration Register Jiri Pirko
  2016-11-10 11:23 ` [patch net-next 8/8] mlxsw: packet sample: Add packet sample offloading support Jiri Pirko
  7 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

Add support for sequence numbering of the sampled packets. The sequence
number counts the sampled packets and makes it possible to check whether
sampled packets have been dropped. The sequence state is kept per sample
action instance.

Add a unique id (sampler_id) to each sample action instance, and add that
sampler_id to the packet metadata. This way, a user application can detect
sample drops by tracking the sequence numbers of each sampler_id.
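
A collector could detect lost samples roughly as sketched here. This is a hypothetical userspace helper, not part of the patch or of libife. It assumes that sampler_id and seq have already been extracted from the IFE TLVs and converted to host byte order, that samples of one sampler arrive in order, and that sampler ids fit a fixed-size table (MAX_SAMPLERS is an arbitrary bound chosen for the sketch).

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SAMPLERS 16	/* hypothetical bound for this sketch */

struct seq_tracker {
	bool     seen[MAX_SAMPLERS];
	uint32_t next_seq[MAX_SAMPLERS];	/* sequence number expected next */
	uint64_t lost[MAX_SAMPLERS];		/* total samples missed so far */
};

/* Feed one received sample; returns how many samples were lost before it. */
static uint32_t seq_tracker_update(struct seq_tracker *t, uint32_t sampler_id,
				   uint32_t seq)
{
	uint32_t gap = 0;

	if (sampler_id >= MAX_SAMPLERS)
		return 0;
	if (t->seen[sampler_id])
		gap = seq - t->next_seq[sampler_id];	/* u32 math handles wrap-around */
	t->seen[sampler_id] = true;
	t->next_seq[sampler_id] = seq + 1;
	t->lost[sampler_id] += gap;
	return gap;
}
```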

Because hw and sw sampling can be triggered by the same tc action (if the
user specifies neither skip_sw nor skip_hw), each needs a different
sampler_id, as each generates its own sequence numbering. To allow that,
add the macros TC_SAMPLE_HW_ID and TC_SAMPLE_SW_ID which, given a
sampler_id, calculate the hw and sw sampler_id respectively.
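
The encoding reserves the lowest bit of the reported id for the hw/sw distinction. Below is a sketch of the two macros, with the argument parenthesized so that an expression such as id + 1 expands correctly, plus two hypothetical decoding helpers a collector might use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as this patch's macros, with the argument parenthesized. */
#define TC_SAMPLE_SW_ID(id) ((((uint32_t)(id)) << 1) + 0)
#define TC_SAMPLE_HW_ID(id) ((((uint32_t)(id)) << 1) + 1)

/* Hypothetical collector-side helpers that undo the encoding. */
static uint32_t sample_id_to_action(uint32_t wire_id)
{
	return wire_id >> 1;		/* drop the hw/sw bit */
}

static bool sample_id_is_hw(uint32_t wire_id)
{
	return (wire_id & 1) != 0;	/* lowest bit set: hardware sampler */
}
```

Because the action id is shifted left by one, an id of 2^31 or above loses its top bit in a u32 and would collide with a smaller id.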

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ife.h              |  5 +++--
 include/net/tc_act/tc_sample.h |  9 +++++++++
 include/uapi/linux/ife.h       |  2 ++
 net/ife/ife.c                  | 14 ++++++++++++--
 net/sched/act_sample.c         | 10 +++++++++-
 5 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/include/net/ife.h b/include/net/ife.h
index 5fbcfb3..3ae4f0d 100644
--- a/include/net/ife.h
+++ b/include/net/ife.h
@@ -19,7 +19,8 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
 void *ife_tlv_meta_next(void *skbdata);
 
 struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
-				    int in_ifindex, int out_ifindex);
+				    int in_ifindex, int out_ifindex,
+				    u32 sampler_id, u32 seq);
 
 #else
 
@@ -52,7 +53,7 @@ static inline void *ife_tlv_meta_next(void *skbdata)
 
 static inline struct ethhdr *
 ife_packet_info_pack(struct sk_buff *skb, int orig_size, int in_ifindex,
-		     int out_ifindex)
+		     int out_ifindex, u32 sampler_id, u32 seq)
 {
 	return NULL;
 }
diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h
index 73154b3..98eafda 100644
--- a/include/net/tc_act/tc_sample.h
+++ b/include/net/tc_act/tc_sample.h
@@ -14,9 +14,13 @@ struct tcf_sample {
 	u8			eth_dst[ETH_ALEN];
 	u8			eth_src[ETH_ALEN];
 	u16			eth_type;
+	u32			sampler_id;
+	u32			seq;
 	struct list_head	tcfm_list;
 };
 #define to_sample(a) ((struct tcf_sample *)a)
+#define TC_SAMPLE_SW_ID(id) ((id << 1) + 0)
+#define TC_SAMPLE_HW_ID(id) ((id << 1) + 1)
 
 static inline bool is_tcf_sample(const struct tc_action *a)
 {
@@ -62,4 +66,9 @@ static inline void tcf_sample_eth_src_addr(const struct tc_action *a, u8 *src)
 	ether_addr_copy(src, to_sample(a)->eth_src);
 }
 
+static inline u32 tcf_sample_sampler_id(const struct tc_action *a)
+{
+	return to_sample(a)->sampler_id;
+}
+
 #endif /* __NET_TC_SAMPLE_H */
diff --git a/include/uapi/linux/ife.h b/include/uapi/linux/ife.h
index ebdd785..2007818 100644
--- a/include/uapi/linux/ife.h
+++ b/include/uapi/linux/ife.h
@@ -13,6 +13,8 @@ enum {
 	IFE_META_OUT_IFINDEX,
 	IFE_META_ORIGSIZE,
 	IFE_META_SIZE,
+	IFE_META_SAMPLER_ID,
+	IFE_META_SEQ,
 	__IFE_META_MAX
 };
 
diff --git a/net/ife/ife.c b/net/ife/ife.c
index 756af46..3ebd6ab 100644
--- a/net/ife/ife.c
+++ b/net/ife/ife.c
@@ -139,7 +139,8 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
 EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
 
 struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
-				    int in_ifindex, int out_ifindex)
+				    int in_ifindex, int out_ifindex,
+				    u32 sampler_id, u32 seq)
 {
 	int curr_size;
 	void *ifetlv;
@@ -148,7 +149,9 @@ struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
 	curr_size = skb->len;
 
 	metalen = nla_total_size(sizeof(orig_size)) +
-		  nla_total_size(sizeof(curr_size));
+		  nla_total_size(sizeof(curr_size)) +
+		  nla_total_size(sizeof(sampler_id)) +
+		  nla_total_size(sizeof(seq));
 
 	if (in_ifindex)
 		metalen += nla_total_size(sizeof(in_ifindex));
@@ -159,6 +162,8 @@ struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
 	out_ifindex = htonl(out_ifindex);
 	orig_size = htonl(orig_size);
 	curr_size = htonl(curr_size);
+	sampler_id = htonl(sampler_id);
+	seq = htonl(seq);
 
 	ifetlv = ife_encode(skb, metalen);
 	if (!ifetlv)
@@ -179,6 +184,11 @@ struct ethhdr *ife_packet_info_pack(struct sk_buff *skb, int orig_size,
 	ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_SIZE,
 				      sizeof(curr_size), &curr_size);
 
+	ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_SAMPLER_ID,
+				      sizeof(sampler_id), &sampler_id);
+
+	ifetlv += ife_tlv_meta_encode(ifetlv, IFE_META_SEQ, sizeof(seq), &seq);
+
 	return (struct ethhdr *) skb->data;
 }
 EXPORT_SYMBOL(ife_packet_info_pack);
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 6596f4b..3530401 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -27,6 +27,7 @@
 
 #define SAMPLE_TAB_MASK     7
 static int sample_net_id;
+static u32 samplers;
 static struct tc_action_ops act_sample_ops;
 
 static const struct nla_policy sample_policy[TCA_SAMPLE_MAX + 1] = {
@@ -77,6 +78,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
 	s->rate = nla_get_u32(tb[TCA_SAMPLE_RATE]);
 	s->mark = nla_get_u32(tb[TCA_SAMPLE_MARK]);
 	s->eth_type = nla_get_u16(tb[TCA_SAMPLE_ETH_TYPE]);
+	s->sampler_id = samplers++;
 
 	s->truncate = tb[TCA_SAMPLE_TRUNC_SIZE];
 	if (tb[TCA_SAMPLE_TRUNC_SIZE])
@@ -119,6 +121,7 @@ static int tcf_sample(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_sample *s = to_sample(a);
 	static struct ethhdr *ethhdr;
 	struct sk_buff *skb2;
+	u32 sampler_id;
 	int orig_size;
 	int retval;
 	u32 at;
@@ -145,8 +148,12 @@ static int tcf_sample(struct sk_buff *skb, const struct tc_action *a,
 			skb_push(skb2, skb2->mac_len);
 
 		orig_size = skb->len + skb->dev->hard_header_len;
+		sampler_id = TC_SAMPLE_SW_ID(s->sampler_id);
+
 		ethhdr = ife_packet_info_pack(skb2, orig_size,
-					      skb->dev->ifindex, 0);
+					      skb->dev->ifindex, 0,
+					      sampler_id, s->seq++);
+
 		if (!ethhdr)
 			goto out;
 
@@ -267,6 +274,7 @@ static struct pernet_operations sample_net_ops = {
 
 static int __init sample_init_module(void)
 {
+	samplers = 0;
 	return tcf_register_action(&act_sample_ops, &sample_net_ops);
 }
 
-- 
2.7.4

* [patch net-next 7/8] mlxsw: reg: add the Monitoring Packet Sampling Configuration Register
  2016-11-10 11:23 [patch net-next 0/8] Add support for offloading packet-sampling Jiri Pirko
                   ` (5 preceding siblings ...)
  2016-11-10 11:23 ` [patch net-next 6/8] tc: sample: Add sequence number and sampler_id fields Jiri Pirko
@ 2016-11-10 11:23 ` Jiri Pirko
  2016-11-10 11:23 ` [patch net-next 8/8] mlxsw: packet sample: Add packet sample offloading support Jiri Pirko
  7 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

The MPSC register allows configuring ingress packet sampling on a specific
port of the mlxsw device. The sampled packets are then trapped via the
PKT_SAMPLE trap.
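
The register layout can be illustrated with a small userspace sketch of the pack function. item32_set() is a hypothetical stand-in for the accessors generated by MLXSW_ITEM32. It assumes, as those helpers do, that each field lives in a big-endian 32-bit word at the given byte offset. MPSC_LEN and mpsc_pack() are likewise illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MPSC_LEN 0x14

/* Set `size` bits at bit `shift` of the big-endian u32 at byte `offset`. */
static void item32_set(uint8_t *buf, unsigned int offset, unsigned int shift,
		       unsigned int size, uint32_t val)
{
	uint32_t mask = size >= 32 ? 0xffffffffu : ((1u << size) - 1u) << shift;
	uint32_t word = ((uint32_t)buf[offset] << 24) |
			((uint32_t)buf[offset + 1] << 16) |
			((uint32_t)buf[offset + 2] << 8) |
			(uint32_t)buf[offset + 3];

	word = (word & ~mask) | ((val << shift) & mask);
	buf[offset]     = (uint8_t)(word >> 24);
	buf[offset + 1] = (uint8_t)(word >> 16);
	buf[offset + 2] = (uint8_t)(word >> 8);
	buf[offset + 3] = (uint8_t)word;
}

/* Offsets, shifts and widths taken from the MPSC item definitions. */
static void mpsc_pack(uint8_t *payload, uint8_t local_port, bool e, uint32_t rate)
{
	memset(payload, 0, MPSC_LEN);
	item32_set(payload, 0x00, 16, 8, local_port);	/* reg_mpsc_local_port */
	item32_set(payload, 0x04, 30, 1, e);		/* reg_mpsc_e */
	item32_set(payload, 0x08, 0, 32, rate);		/* reg_mpsc_rate */
}
```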

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 38 +++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index a61ce34..0936b4a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4771,6 +4771,44 @@ static inline void mlxsw_reg_mlcr_pack(char *payload, u8 local_port,
 					   MLXSW_REG_MLCR_DURATION_MAX : 0);
 }
 
+/* MPSC - Monitoring Packet Sampling Configuration Register
+ * --------------------------------------------------------
+ * MPSC Register is used to configure the Packet Sampling mechanism.
+ */
+#define MLXSW_REG_MPSC_ID 0x9080
+#define MLXSW_REG_MPSC_LEN 0x14
+
+MLXSW_REG_DEFINE(mpsc, MLXSW_REG_MPSC_ID, MLXSW_REG_MPSC_LEN);
+
+/* reg_mpsc_local_port
+ * Local port number
+ * Not supported for CPU port
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mpsc, local_port, 0x00, 16, 8);
+
+/* reg_mpsc_e
+ * Enable sampling on port local_port
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mpsc, e, 0x04, 30, 1);
+
+/* reg_mpsc_rate
+ * Sampling rate = 1 out of rate packets (with randomization around
+ * the point). Valid values are: 1 to 3.5*10^9
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mpsc, rate, 0x08, 0, 32);
+
+static inline void mlxsw_reg_mpsc_pack(char *payload, u8 local_port, bool e,
+				       u32 rate)
+{
+	MLXSW_REG_ZERO(mpsc, payload);
+	mlxsw_reg_mpsc_local_port_set(payload, local_port);
+	mlxsw_reg_mpsc_e_set(payload, e);
+	mlxsw_reg_mpsc_rate_set(payload, rate);
+}
+
 /* SBPR - Shared Buffer Pools Register
  * -----------------------------------
  * The SBPR configures and retrieves the shared buffer pools and configuration.
-- 
2.7.4

* [patch net-next 8/8] mlxsw: packet sample: Add packet sample offloading support
  2016-11-10 11:23 [patch net-next 0/8] Add support for offloading packet-sampling Jiri Pirko
                   ` (6 preceding siblings ...)
  2016-11-10 11:23 ` [patch net-next 7/8] mlxsw: reg: add the Monitoring Packet Sampling Configuration Register Jiri Pirko
@ 2016-11-10 11:23 ` Jiri Pirko
  7 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2016-11-10 11:23 UTC
  To: netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Yotam Gigi <yotamg@mellanox.com>

Using the MPSC register, add the functions that configure per-port packet
sampling in hardware and the necessary datatypes in the mlxsw_sp_port
struct. In addition, add the necessary trap for sampled packets and
integrate with matchall offloading to allow offloading of the sample tc
action.

The current offload support is for the tc command:

tc filter add dev <DEV> parent ffff:   \
	  matchall   \
	  action sample rate <RATE> mark <MARK> [trunc <SIZE>]   \
	  		[src <SADDR>] [dst <DADDR>] [type <TYPE>]

Only ingress qdiscs are supported, and only the combination of a matchall
classifier and a sample action activates hardware packet sampling.
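
The decision made in mlxsw_sp_port_add_cls_matchall() can be summarized as a small pure function. This is a sketch with hypothetical enums standing in for is_tcf_mirred_egress_mirror() and is_tcf_sample(). Like the driver, it only looks at the first action of the filter and requires protocol all. It returns -EOPNOTSUPP where the driver returns the kernel-internal -ENOTSUPP, which userspace errno.h does not define.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of the filter's first action. */
enum act_kind { ACT_MIRROR, ACT_SAMPLE, ACT_OTHER };
enum offload_kind { OFFLOAD_MIRROR, OFFLOAD_SAMPLE };

/* Returns 0 and sets *out, or a negative errno if offload is refused. */
static int matchall_offload_kind(enum act_kind first, bool protocol_all,
				 bool sample_active, enum offload_kind *out)
{
	if (!protocol_all)
		return -EOPNOTSUPP;
	switch (first) {
	case ACT_MIRROR:
		*out = OFFLOAD_MIRROR;
		return 0;
	case ACT_SAMPLE:
		if (sample_active)
			return -EEXIST;	/* "Sample already active" on this port */
		*out = OFFLOAD_SAMPLE;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```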

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 120 +++++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  12 +++
 drivers/net/ethernet/mellanox/mlxsw/trap.h     |   1 +
 3 files changed, 125 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index a5433e4..6cbf67c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -57,6 +57,8 @@
 #include <net/pkt_cls.h>
 #include <net/tc_act/tc_mirred.h>
 #include <net/netevent.h>
+#include <net/tc_act/tc_sample.h>
+#include <net/ife.h>
 
 #include "spectrum.h"
 #include "pci.h"
@@ -467,6 +469,16 @@ static void mlxsw_sp_span_mirror_remove(struct mlxsw_sp_port *from,
 	mlxsw_sp_span_inspected_port_unbind(from, span_entry, type);
 }
 
+static int mlxsw_sp_port_sample_set(struct mlxsw_sp_port *mlxsw_sp_port,
+				    bool enable, u32 rate)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	char mpsc_pl[MLXSW_REG_MPSC_LEN];
+
+	mlxsw_reg_mpsc_pack(mpsc_pl, mlxsw_sp_port->local_port, enable, rate);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mpsc), mpsc_pl);
+}
+
 static int mlxsw_sp_port_admin_status_set(struct mlxsw_sp_port *mlxsw_sp_port,
 					  bool is_up)
 {
@@ -1221,6 +1233,49 @@ mlxsw_sp_port_add_cls_matchall_mirror(struct mlxsw_sp_port *mlxsw_sp_port,
 	return err;
 }
 
+static int
+mlxsw_sp_port_add_cls_matchall_sample(struct mlxsw_sp_port *mlxsw_sp_port,
+				      struct tc_cls_matchall_offload *cls,
+				      const struct tc_action *a,
+				      bool ingress)
+{
+	struct mlxsw_sp_port_mall_tc_entry *mall_tc_entry;
+	int sampler_id;
+	int err;
+
+	if (mlxsw_sp_port->sample.enable) {
+		netdev_err(mlxsw_sp_port->dev, "Sample already active\n");
+		return -EEXIST;
+	}
+
+	err = mlxsw_sp_port_sample_set(mlxsw_sp_port, true, tcf_sample_rate(a));
+	if (err)
+		return err;
+
+	sampler_id = tcf_sample_sampler_id(a);
+
+	mlxsw_sp_port->sample.enable = true;
+	mlxsw_sp_port->sample.mark = tcf_sample_mark(a);
+	mlxsw_sp_port->sample.truncate = tcf_sample_truncate(a);
+	mlxsw_sp_port->sample.trunc_size = tcf_sample_trunc_size(a);
+	mlxsw_sp_port->sample.eth_type = tcf_sample_eth_type(a);
+	mlxsw_sp_port->sample.sampler_id = TC_SAMPLE_HW_ID(sampler_id);
+	tcf_sample_eth_dst_addr(a, mlxsw_sp_port->sample.eth_dst);
+	tcf_sample_eth_src_addr(a, mlxsw_sp_port->sample.eth_src);
+
+	netdev_dbg(mlxsw_sp_port->dev, "Activate hardware sample\n");
+
+	mall_tc_entry = kzalloc(sizeof(*mall_tc_entry), GFP_KERNEL);
+	if (!mall_tc_entry)
+		return -ENOMEM;
+
+	mall_tc_entry->cookie = cls->cookie;
+	mall_tc_entry->type = MLXSW_SP_PORT_MALL_SAMPLE;
+	list_add_tail(&mall_tc_entry->list, &mlxsw_sp_port->mall_tc_list);
+
+	return 0;
+}
+
 static int mlxsw_sp_port_add_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 					  __be16 protocol,
 					  struct tc_cls_matchall_offload *cls,
@@ -1236,17 +1291,19 @@ static int mlxsw_sp_port_add_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 	}
 
 	tcf_exts_to_list(cls->exts, &actions);
-	list_for_each_entry(a, &actions, list) {
-		if (!is_tcf_mirred_egress_mirror(a) ||
-		    protocol != htons(ETH_P_ALL)) {
-			return -ENOTSUPP;
-		}
+	a = list_first_entry(&actions, struct tc_action, list);
 
+	if (is_tcf_mirred_egress_mirror(a) && protocol == htons(ETH_P_ALL))
 		err = mlxsw_sp_port_add_cls_matchall_mirror(mlxsw_sp_port, cls,
 							    a, ingress);
-		if (err)
-			return err;
-	}
+	else if (is_tcf_sample(a) && protocol == htons(ETH_P_ALL))
+		err = mlxsw_sp_port_add_cls_matchall_sample(mlxsw_sp_port, cls,
+							    a, ingress);
+	else
+		return -ENOTSUPP;
+
+	if (err)
+		return err;
 
 	return 0;
 }
@@ -1274,6 +1331,10 @@ static void mlxsw_sp_port_del_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 
 		mlxsw_sp_span_mirror_remove(mlxsw_sp_port, to_port, span_type);
 		break;
+	case MLXSW_SP_PORT_MALL_SAMPLE:
+		mlxsw_sp_port->sample.enable = false;
+		mlxsw_sp_port_sample_set(mlxsw_sp_port, false, 1);
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -2774,6 +2835,46 @@ static void mlxsw_sp_rx_listener_mark_func(struct sk_buff *skb, u8 local_port,
 	return mlxsw_sp_rx_listener_func(skb, local_port, priv);
 }
 
+static void mlxsw_sp_rx_listener_sample_func(struct sk_buff *skb, u8 local_port,
+					     void *priv)
+{
+	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[local_port];
+	static struct ethhdr *ethhdr;
+	int orig_size;
+
+	if (unlikely(!mlxsw_sp_port)) {
+		dev_warn_ratelimited(mlxsw_sp->bus_info->dev, "Port %d: sample skb received for non-existent port\n",
+				     local_port);
+		return;
+	}
+
+	skb->dev = mlxsw_sp_port->dev;
+	skb->mac_len = skb->dev->hard_header_len;
+	skb->mark = mlxsw_sp_port->sample.mark;
+
+	orig_size = skb->len;
+
+	if (mlxsw_sp_port->sample.truncate)
+		skb_trim(skb, mlxsw_sp_port->sample.trunc_size);
+
+	ethhdr = ife_packet_info_pack(skb, orig_size, skb->dev->ifindex, 0,
+				      mlxsw_sp_port->sample.sampler_id,
+				      mlxsw_sp_port->sample.seq++);
+	if (!ethhdr)
+		return;
+
+	if (!is_zero_ether_addr(mlxsw_sp_port->sample.eth_src))
+		ether_addr_copy(ethhdr->h_source,
+				mlxsw_sp_port->sample.eth_src);
+	if (!is_zero_ether_addr(mlxsw_sp_port->sample.eth_dst))
+		ether_addr_copy(ethhdr->h_dest, mlxsw_sp_port->sample.eth_dst);
+	ethhdr->h_proto = htons(mlxsw_sp_port->sample.eth_type);
+
+	skb->protocol = eth_type_trans(skb, skb->dev);
+	netif_receive_skb(skb);
+}
+
 #define MLXSW_SP_RXL(_func, _trap_id, _action)			\
 	{							\
 		.func = _func,					\
@@ -2784,6 +2885,9 @@ static void mlxsw_sp_rx_listener_mark_func(struct sk_buff *skb, u8 local_port,
 
 static const struct mlxsw_rx_listener mlxsw_sp_rx_listener[] = {
 	MLXSW_SP_RXL(mlxsw_sp_rx_listener_func, FDB_MC, TRAP_TO_CPU),
+	MLXSW_SP_RXL(mlxsw_sp_rx_listener_sample_func, PKT_SAMPLE,
+		     MIRROR_TO_CPU),
+
 	/* Traps for specific L2 packet types, not trapped as FDB MC */
 	MLXSW_SP_RXL(mlxsw_sp_rx_listener_func, STP, TRAP_TO_CPU),
 	MLXSW_SP_RXL(mlxsw_sp_rx_listener_func, LACP, TRAP_TO_CPU),
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 04a2bc7..7f2e6fc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -229,6 +229,7 @@ struct mlxsw_sp_span_entry {
 
 enum mlxsw_sp_port_mall_action_type {
 	MLXSW_SP_PORT_MALL_MIRROR,
+	MLXSW_SP_PORT_MALL_SAMPLE,
 };
 
 struct mlxsw_sp_port_mall_mirror_tc_entry {
@@ -361,6 +362,17 @@ struct mlxsw_sp_port {
 		struct rtnl_link_stats64 *cache;
 		struct delayed_work update_dw;
 	} hw_stats;
+	struct {
+		bool enable;
+		u32 mark;
+		bool truncate;
+		u32 trunc_size;
+		u8 eth_src[ETH_ALEN];
+		u8 eth_dst[ETH_ALEN];
+		u16 eth_type;
+		u32 sampler_id;
+		u32 seq;
+	} sample;
 };
 
 struct mlxsw_sp_port *mlxsw_sp_port_lower_dev_hold(struct net_device *dev);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/trap.h b/drivers/net/ethernet/mellanox/mlxsw/trap.h
index ed8e301..eebd135 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/trap.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/trap.h
@@ -54,6 +54,7 @@ enum {
 	MLXSW_TRAP_ID_IGMP_V2_REPORT = 0x32,
 	MLXSW_TRAP_ID_IGMP_V2_LEAVE = 0x33,
 	MLXSW_TRAP_ID_IGMP_V3_REPORT = 0x34,
+	MLXSW_TRAP_ID_PKT_SAMPLE = 0x38,
 	MLXSW_TRAP_ID_ARPBC = 0x50,
 	MLXSW_TRAP_ID_ARPUC = 0x51,
 	MLXSW_TRAP_ID_MTUERROR = 0x52,
-- 
2.7.4

* Re: [patch net-next 1/8] Introduce ife encapsulation module
  2016-11-10 11:23 ` [patch net-next 1/8] Introduce ife encapsulation module Jiri Pirko
@ 2016-11-10 19:17   ` David Miller
  2016-11-10 19:52     ` Yotam Gigi
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2016-11-10 19:17 UTC
  To: jiri
  Cc: netdev, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 10 Nov 2016 12:23:01 +0100

> +void *ife_encode(struct sk_buff *skb, u16 metalen)

metalen is u16

> +	metalen = htons(metalen);

htons() returns be16.

> +int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
> +{
> +	u32 *tlv = (u32 *) (skbdata);
 ...
> +	*tlv = htonl(htlv);

Similar situation here.
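
The following userspace C sketches the pattern the review asks for. It keeps the host-order value in its own variable and stores the result of htons()/htonl() in a variable of a big-endian type instead of assigning it back to the u16/u32. The be16/be32 typedefs are stand-ins. In the kernel, __be16 and __be32 are __bitwise types, so sparse (make C=1) reports such mixing, whereas a plain typedef only documents intent. put_metalen() and put_tlv_hdr() are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Stand-ins for the kernel's __be16/__be32; no sparse checking here. */
typedef uint16_t be16;
typedef uint32_t be32;

/* Host-order length in, network-order bytes out; the u16 is never reused. */
static void put_metalen(uint8_t *hdr, uint16_t metalen)
{
	be16 wire = htons(metalen);

	memcpy(hdr, &wire, sizeof(wire));
}

/* Same idea for a 32-bit TLV header word; memcpy is alignment-safe. */
static void put_tlv_hdr(uint8_t *tlv, uint32_t htlv)
{
	be32 wire = htonl(htlv);

	memcpy(tlv, &wire, sizeof(wire));
}
```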

* Re: [patch net-next 5/8] Introduce sample tc action
  2016-11-10 11:23 ` [patch net-next 5/8] Introduce sample tc action Jiri Pirko
@ 2016-11-10 19:35   ` John Fastabend
  2016-11-10 19:38     ` John Fastabend
  0 siblings, 1 reply; 20+ messages in thread
From: John Fastabend @ 2016-11-10 19:35 UTC
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

On 16-11-10 03:23 AM, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> This action allows the user to sample traffic matched by a tc classifier.
> The sampling consists of choosing packets randomly, truncating them,
> adding some informative metadata regarding the interface and the original
> packet size, and marking them with a specific mark to allow further tc
> rules to match and process them. The marked sampled packets are then
> injected into the device's ingress qdisc using netif_receive_skb.
> 
> The packets' metadata is packed using the ife encapsulation protocol, and
> the outer packet's ethernet dest, source and eth_type, along with the
> rate, mark and the optional truncation size can be configured from
> userspace.
> 
> Example:
> To sample ingress traffic from interface eth1 and redirect the sampled
> packets to interface dummy0, one may use the commands:
> 
> tc qdisc add dev eth1 handle ffff: ingress
> 
> tc filter add dev eth1 parent ffff: \
> 	   matchall action sample rate 12 mark 17
> 
> tc filter add parent ffff: dev eth1 protocol all \
> 	   u32 match mark 17 0xff \
> 	   action mirred egress redirect dev dummy0
> 
> Here, the first command adds an ingress qdisc and the second starts
> sampling every 12th packet on dev eth1 and marks the sampled packets with
> 17. The third command catches the sampled packets, which are marked with
> 17, and redirects them to dev dummy0.

The sampling algorithm was not randomized based on the above commit
log? It really needs to be for all the reasons Roopa mentioned earlier.
Did I miss some email on why it didn't get implemented?

Also there was an indication the already is actually implemented
correctly so don't we need the hw/sw to behave the same. The whole
argument about sw/hw parity, etc.

> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
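The discrepancy John points out is between "choosing packets randomly" in the description and "sampling every 12th packet" in the example, which describes a deterministic counter. A minimal sketch of both 1-in-N policies follows, plus the mask test that "u32 match mark 17 0xff" applies to the packet mark. The function names are hypothetical, and rand() only stands in for whatever PRNG a real implementation would use:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>  /* rand(), srand() */

/* Deterministic 1-in-N: samples exactly every rate'th packet, e.g. with
 * rate = 12 the 12th, 24th, 36th... packets. Periodic traffic can alias
 * with this fixed period and be over- or under-represented. */
static bool sample_every_nth(uint32_t *counter, uint32_t rate)
{
    if (++*counter < rate)
        return false;
    *counter = 0;
    return true;
}

/* Random 1-in-N: each packet is sampled independently with probability
 * of roughly 1/rate (rand() % rate is slightly biased; illustration only),
 * so the long-run rate matches but there is no fixed period. */
static bool sample_random(uint32_t rate)
{
    return rate != 0 && rand() % rate == 0;
}

/* "match mark VALUE MASK": true when the masked packet mark equals VALUE. */
static bool mark_matches(uint32_t pkt_mark, uint32_t value, uint32_t mask)
{
    return (pkt_mark & mask) == value;
}
```

With rate 12 both policies sample about 1 in 12 packets on average; only the random one avoids locking onto a fixed period, which matters when the traffic itself is periodic.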

* Re: [patch net-next 5/8] Introduce sample tc action
  2016-11-10 19:35   ` John Fastabend
@ 2016-11-10 19:38     ` John Fastabend
  2016-11-10 19:58       ` Yotam Gigi
  2016-11-11  8:28       ` Yotam Gigi
  0 siblings, 2 replies; 20+ messages in thread
From: John Fastabend @ 2016-11-10 19:38 UTC
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, eladr, nogahf, ogerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

On 16-11-10 11:35 AM, John Fastabend wrote:
> On 16-11-10 03:23 AM, Jiri Pirko wrote:
>> ...
> 
> The sampling algorithm was not randomized based on the above commit
> log? It really needs to be for all the reasons Roopa mentioned earlier.
> Did I miss some email on why it didn't get implemented?
> 
> Also there was an indication the already is actually implemented
> correctly so don't we need the hw/sw to behave the same. The whole
> argument about sw/hw parity, etc.

sorry bit of a typo there corrected 2nd paragraph here...

Also, there was an indication that the hardware already implements this
correctly, so don't we need the hw and sw to behave the same? The argument
about sw/hw parity, etc.

> 
>>
>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
> 

* RE: [patch net-next 1/8] Introduce ife encapsulation module
  2016-11-10 19:17   ` David Miller
@ 2016-11-10 19:52     ` Yotam Gigi
  0 siblings, 0 replies; 20+ messages in thread
From: Yotam Gigi @ 2016-11-10 19:52 UTC
  To: David Miller, jiri
  Cc: netdev, Ido Schimmel, Elad Raz, Nogah Frankel, Or Gerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Thursday, November 10, 2016 9:18 PM
>To: jiri@resnulli.us
>Cc: netdev@vger.kernel.org; Yotam Gigi <yotamg@mellanox.com>; Ido Schimmel
><idosch@mellanox.com>; Elad Raz <eladr@mellanox.com>; Nogah Frankel
><nogahf@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
>jhs@mojatatu.com; geert+renesas@glider.be; stephen@networkplumber.org;
>xiyou.wangcong@gmail.com; linux@roeck-us.net; roopa@cumulusnetworks.com
>Subject: Re: [patch net-next 1/8] Introduce ife encapsulation module
>
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Thu, 10 Nov 2016 12:23:01 +0100
>
>> +void *ife_encode(struct sk_buff *skb, u16 metalen)
>
>metalen is u16
>
>> +	metalen = htons(metalen);
>
>htons() returns be16.
>
>> +int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
>> +{
>> +	u32 *tlv = (u32 *) (skbdata);
> ...
>> +	*tlv = htonl(htlv);
>
>Similar situation here.

I will fix those and we will send a v2.

* RE: [patch net-next 5/8] Introduce sample tc action
  2016-11-10 19:38     ` John Fastabend
@ 2016-11-10 19:58       ` Yotam Gigi
  2016-11-10 20:16         ` John Fastabend
  2016-11-11  8:28       ` Yotam Gigi
  1 sibling, 1 reply; 20+ messages in thread
From: Yotam Gigi @ 2016-11-10 19:58 UTC
  To: John Fastabend, Jiri Pirko, netdev
  Cc: davem, Ido Schimmel, Elad Raz, Nogah Frankel, Or Gerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa



>-----Original Message-----
>From: John Fastabend [mailto:john.fastabend@gmail.com]
>Sent: Thursday, November 10, 2016 9:38 PM
>To: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org
>Cc: davem@davemloft.net; Yotam Gigi <yotamg@mellanox.com>; Ido Schimmel
><idosch@mellanox.com>; Elad Raz <eladr@mellanox.com>; Nogah Frankel
><nogahf@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
>jhs@mojatatu.com; geert+renesas@glider.be; stephen@networkplumber.org;
>xiyou.wangcong@gmail.com; linux@roeck-us.net; roopa@cumulusnetworks.com
>Subject: Re: [patch net-next 5/8] Introduce sample tc action
>
>On 16-11-10 11:35 AM, John Fastabend wrote:
>> ...
>
>sorry bit of a typo there corrected 2nd paragraph here...
>
>Also, there was an indication that the hardware already implements this
>correctly, so don't we need the hw and sw to behave the same? The argument
>about sw/hw parity, etc.

Our hardware currently does not support sampling with random behavior, so
we implemented the same deterministic behavior in software too.

However, the API is extensible, and it is possible to add a random keyword
to the tc action to allow random sampling. In that case, the keyword will be
implemented in software only, and our driver will fail to offload it.

>
>>
>>>
>>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>
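What Yotam describes matches the usual tc offload pattern: the driver inspects the action's parameters and refuses anything it cannot program, typically by returning -EOPNOTSUPP, while the software datapath still implements the action (unless the filter was added with skip_sw, in which case insertion fails). A sketch under those assumptions; struct sample_params, struct hw_sample_caps, max_rate and sample_offload_check are all hypothetical and not mlxsw's actual code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software-side action parameters. */
struct sample_params {
    uint32_t rate;   /* sample 1 in `rate` packets */
    bool random;     /* the proposed `random` keyword */
};

/* Hypothetical description of what the hardware can do. */
struct hw_sample_caps {
    bool supports_random;
    uint32_t max_rate;
};

/* Returns 0 if the configuration can be programmed into hardware,
 * -EINVAL for a meaningless rate, or -EOPNOTSUPP for a valid request the
 * hardware cannot honor, so the rule stays software-only. */
static int sample_offload_check(const struct sample_params *p,
                                const struct hw_sample_caps *caps)
{
    if (p->rate == 0)
        return -EINVAL;
    if (p->rate > caps->max_rate)
        return -EOPNOTSUPP;
    if (p->random && !caps->supports_random)
        return -EOPNOTSUPP;
    return 0;
}
```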

* Re: [patch net-next 5/8] Introduce sample tc action
  2016-11-10 19:58       ` Yotam Gigi
@ 2016-11-10 20:16         ` John Fastabend
  0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2016-11-10 20:16 UTC
  To: Yotam Gigi, Jiri Pirko, netdev
  Cc: davem, Ido Schimmel, Elad Raz, Nogah Frankel, Or Gerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

On 16-11-10 11:58 AM, Yotam Gigi wrote:
> 
> 
>> -----Original Message-----
>> From: John Fastabend [mailto:john.fastabend@gmail.com]
>> Sent: Thursday, November 10, 2016 9:38 PM
>> To: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org
>> Cc: davem@davemloft.net; Yotam Gigi <yotamg@mellanox.com>; Ido Schimmel
>> <idosch@mellanox.com>; Elad Raz <eladr@mellanox.com>; Nogah Frankel
>> <nogahf@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
>> jhs@mojatatu.com; geert+renesas@glider.be; stephen@networkplumber.org;
>> xiyou.wangcong@gmail.com; linux@roeck-us.net; roopa@cumulusnetworks.com
>> Subject: Re: [patch net-next 5/8] Introduce sample tc action
>>
>> On 16-11-10 11:35 AM, John Fastabend wrote:
>>> ...
>>
>> sorry bit of a typo there corrected 2nd paragraph here...
>>
>> Also, there was an indication that the hardware already implements this
>> correctly, so don't we need the hw and sw to behave the same? The argument
>> about sw/hw parity, etc.
> 
> Our hardware currently does not support sampling with random behavior, so
> we implemented the same deterministic behavior in software too.
> 
> However, the API is extensible, and it is possible to add a random keyword
> to the tc action to allow random sampling. In that case, the keyword will be
> implemented in software only, and our driver will fail to offload it.
> 

For many use cases this will be limiting, but OK, maybe this is good
enough for some uses, and we can add a flag/attribute to support random
sampling. Works for me.

* RE: [patch net-next 5/8] Introduce sample tc action
  2016-11-10 19:38     ` John Fastabend
  2016-11-10 19:58       ` Yotam Gigi
@ 2016-11-11  8:28       ` Yotam Gigi
  2016-11-11 12:43         ` Simon Horman
  1 sibling, 1 reply; 20+ messages in thread
From: Yotam Gigi @ 2016-11-11  8:28 UTC
  To: John Fastabend, Jiri Pirko, netdev
  Cc: davem, Ido Schimmel, Elad Raz, Nogah Frankel, Or Gerlitz, jhs,
	geert+renesas, stephen, xiyou.wangcong, linux, roopa

>-----Original Message-----
>From: Yotam Gigi
>Sent: Thursday, November 10, 2016 9:59 PM
>To: 'John Fastabend' <john.fastabend@gmail.com>; Jiri Pirko <jiri@resnulli.us>;
>netdev@vger.kernel.org
>Cc: davem@davemloft.net; Ido Schimmel <idosch@mellanox.com>; Elad Raz
><eladr@mellanox.com>; Nogah Frankel <nogahf@mellanox.com>; Or Gerlitz
><ogerlitz@mellanox.com>; jhs@mojatatu.com; geert+renesas@glider.be;
>stephen@networkplumber.org; xiyou.wangcong@gmail.com; linux@roeck-us.net;
>roopa@cumulusnetworks.com
>Subject: RE: [patch net-next 5/8] Introduce sample tc action
>
>
>
>>-----Original Message-----
>>From: John Fastabend [mailto:john.fastabend@gmail.com]
>>Sent: Thursday, November 10, 2016 9:38 PM
>>To: Jiri Pirko <jiri@resnulli.us>; netdev@vger.kernel.org
>>Cc: davem@davemloft.net; Yotam Gigi <yotamg@mellanox.com>; Ido Schimmel
>><idosch@mellanox.com>; Elad Raz <eladr@mellanox.com>; Nogah Frankel
>><nogahf@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
>>jhs@mojatatu.com; geert+renesas@glider.be; stephen@networkplumber.org;
>>xiyou.wangcong@gmail.com; linux@roeck-us.net; roopa@cumulusnetworks.com
>>Subject: Re: [patch net-next 5/8] Introduce sample tc action
>>
>>On 16-11-10 11:35 AM, John Fastabend wrote:
>>> ...
>>
>>sorry bit of a typo there corrected 2nd paragraph here...
>>
>>Also, there was an indication that the hardware already implements this
>>correctly, so don't we need the hw and sw to behave the same? The argument
>>about sw/hw parity, etc.
>
>Our hardware currently does not support sampling with random behavior, so
>we implemented the same deterministic behavior in software too.
>
>However, the API is extensible, and it is possible to add a random keyword
>to the tc action to allow random sampling. In that case, the keyword will be
>implemented in software only, and our driver will fail to offload it.
>

John, as a result of your question I realized that our hardware does
support randomized sampling, which I was not aware of. I will use the
extensibility of the API and implement a random keyword that will be
offloaded to our hardware. Those changes will be sent in v2.

In the end, your question was very relevant :) Thanks!

>>
>>>
>>>>
>>>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>

* Re: [patch net-next 5/8] Introduce sample tc action
  2016-11-11  8:28       ` Yotam Gigi
@ 2016-11-11 12:43         ` Simon Horman
  2016-11-11 14:52           ` John Fastabend
  2016-11-11 16:34           ` Yotam Gigi
  0 siblings, 2 replies; 20+ messages in thread
From: Simon Horman @ 2016-11-11 12:43 UTC
  To: Yotam Gigi
  Cc: John Fastabend, Jiri Pirko, netdev, davem, Ido Schimmel,
	Elad Raz, Nogah Frankel, Or Gerlitz, jhs, geert+renesas, stephen,
	xiyou.wangcong, linux, roopa

On Fri, Nov 11, 2016 at 08:28:50AM +0000, Yotam Gigi wrote:

...

> John, as a result of your question I realized that our hardware does
> support randomized sampling, which I was not aware of. I will use the
> extensibility of the API and implement a random keyword that will be
> offloaded to our hardware. Those changes will be sent in v2.
>
> In the end, your question was very relevant :) Thanks!

Perhaps I am missing the point but why not just make random the default and
implement the inverse as an extension if it turns out to be needed in
future?
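Simon's suggestion amounts to picking the default so the common case needs no extra attribute: if a deterministic mode is ever added, it arrives as an optional flag, and rules created without it (including by older userspace) keep random semantics. A hypothetical sketch; SAMPLE_F_DETERMINISTIC, enum sample_mode and the flags_present parameter (standing in for "the attribute was supplied") are assumptions, not real uapi:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opt-in flag bit; not a real uapi constant. */
#define SAMPLE_F_DETERMINISTIC (1U << 0)

enum sample_mode { SAMPLE_MODE_RANDOM, SAMPLE_MODE_DETERMINISTIC };

/* Random is the default: an absent attribute, or a flags value without
 * the opt-in bit, selects random sampling. Only an explicit request
 * selects the periodic every-Nth mode. */
static enum sample_mode sample_mode_from_flags(bool flags_present, uint32_t flags)
{
    if (flags_present && (flags & SAMPLE_F_DETERMINISTIC))
        return SAMPLE_MODE_DETERMINISTIC;
    return SAMPLE_MODE_RANDOM;
}
```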

* Re: [patch net-next 5/8] Introduce sample tc action
  2016-11-11 12:43         ` Simon Horman
@ 2016-11-11 14:52           ` John Fastabend
  2016-11-11 17:47             ` David Miller
  2016-11-11 16:34           ` Yotam Gigi
  1 sibling, 1 reply; 20+ messages in thread
From: John Fastabend @ 2016-11-11 14:52 UTC
  To: Simon Horman, Yotam Gigi
  Cc: Jiri Pirko, netdev, davem, Ido Schimmel, Elad Raz, Nogah Frankel,
	Or Gerlitz, jhs, geert+renesas, stephen, xiyou.wangcong, linux,
	roopa

On 16-11-11 04:43 AM, Simon Horman wrote:
> On Fri, Nov 11, 2016 at 08:28:50AM +0000, Yotam Gigi wrote:
> 
> ...
> 
>> John, as a result of your question I realized that our hardware does
>> support randomized sampling, which I was not aware of. I will use the
>> extensibility of the API and implement a random keyword that will be
>> offloaded to our hardware. Those changes will be sent in v2.
>>
>> In the end, your question was very relevant :) Thanks!
> 
> Perhaps I am missing the point but why not just make random the default and
> implement the inverse as an extension if it turns out to be needed in
> future?
> 

+1 just implement the random one.

.John

* RE: [patch net-next 5/8] Introduce sample tc action
  2016-11-11 12:43         ` Simon Horman
  2016-11-11 14:52           ` John Fastabend
@ 2016-11-11 16:34           ` Yotam Gigi
  1 sibling, 0 replies; 20+ messages in thread
From: Yotam Gigi @ 2016-11-11 16:34 UTC
  To: Simon Horman
  Cc: John Fastabend, Jiri Pirko, netdev, davem, Ido Schimmel,
	Elad Raz, Nogah Frankel, Or Gerlitz, jhs, geert+renesas, stephen,
	xiyou.wangcong, linux, roopa

>-----Original Message-----
>From: Simon Horman [mailto:simon.horman@netronome.com]
>Sent: Friday, November 11, 2016 2:44 PM
>To: Yotam Gigi <yotamg@mellanox.com>
>Cc: John Fastabend <john.fastabend@gmail.com>; Jiri Pirko <jiri@resnulli.us>;
>netdev@vger.kernel.org; davem@davemloft.net; Ido Schimmel
><idosch@mellanox.com>; Elad Raz <eladr@mellanox.com>; Nogah Frankel
><nogahf@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>;
>jhs@mojatatu.com; geert+renesas@glider.be; stephen@networkplumber.org;
>xiyou.wangcong@gmail.com; linux@roeck-us.net; roopa@cumulusnetworks.com
>Subject: Re: [patch net-next 5/8] Introduce sample tc action
>
>On Fri, Nov 11, 2016 at 08:28:50AM +0000, Yotam Gigi wrote:
>
>...
>
>> John, as a result of your question I realized that our hardware does
>> support randomized sampling, which I was not aware of. I will use the
>> extensibility of the API and implement a random keyword that will be
>> offloaded to our hardware. Those changes will be sent in v2.
>>
>> In the end, your question was very relevant :) Thanks!
>
>Perhaps I am missing the point but why not just make random the default and
>implement the inverse as an extension if it turns out to be needed in
>future?

It makes sense. It does seem to me that the average user would prefer
random sampling over deterministic sampling.

We will consider that. Thanks for the comment!

* Re: [patch net-next 5/8] Introduce sample tc action
  2016-11-11 14:52           ` John Fastabend
@ 2016-11-11 17:47             ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2016-11-11 17:47 UTC
  To: john.fastabend
  Cc: simon.horman, yotamg, jiri, netdev, idosch, eladr, nogahf,
	ogerlitz, jhs, geert+renesas, stephen, xiyou.wangcong, linux,
	roopa

From: John Fastabend <john.fastabend@gmail.com>
Date: Fri, 11 Nov 2016 06:52:31 -0800

> On 16-11-11 04:43 AM, Simon Horman wrote:
>> On Fri, Nov 11, 2016 at 08:28:50AM +0000, Yotam Gigi wrote:
>> 
>> ...
>> 
>>> John, as a result of your question I realized that our hardware does
>>> support randomized sampling, which I was not aware of. I will use the
>>> extensibility of the API and implement a random keyword that will be
>>> offloaded to our hardware. Those changes will be sent in v2.
>>>
>>> In the end, your question was very relevant :) Thanks!
>> 
>> Perhaps I am missing the point but why not just make random the default and
>> implement the inverse as an extension if it turns out to be needed in
>> future?
>> 
> 
> +1 just implement the random one.

Agreed.

end of thread, other threads:[~2016-11-11 17:47 UTC | newest]

Thread overview: 20+ messages
2016-11-10 11:23 [patch net-next 0/8] Add support for offloading packet-sampling Jiri Pirko
2016-11-10 11:23 ` [patch net-next 1/8] Introduce ife encapsulation module Jiri Pirko
2016-11-10 19:17   ` David Miller
2016-11-10 19:52     ` Yotam Gigi
2016-11-10 11:23 ` [patch net-next 2/8] act_ife: Change to use ife module Jiri Pirko
2016-11-10 11:23 ` [patch net-next 3/8] net: ife: Introduce new metadata tlv types Jiri Pirko
2016-11-10 11:23 ` [patch net-next 4/8] net: ife: Introduce packet info packing method Jiri Pirko
2016-11-10 11:23 ` [patch net-next 5/8] Introduce sample tc action Jiri Pirko
2016-11-10 19:35   ` John Fastabend
2016-11-10 19:38     ` John Fastabend
2016-11-10 19:58       ` Yotam Gigi
2016-11-10 20:16         ` John Fastabend
2016-11-11  8:28       ` Yotam Gigi
2016-11-11 12:43         ` Simon Horman
2016-11-11 14:52           ` John Fastabend
2016-11-11 17:47             ` David Miller
2016-11-11 16:34           ` Yotam Gigi
2016-11-10 11:23 ` [patch net-next 6/8] tc: sample: Add sequence number and sampler_id fields Jiri Pirko
2016-11-10 11:23 ` [patch net-next 7/8] mlxsw: reg: add the Monitoring Packet Sampling Configuration Register Jiri Pirko
2016-11-10 11:23 ` [patch net-next 8/8] mlxsw: packet sample: Add packet sample offloading support Jiri Pirko
