linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] net: openvswitch: Add sample multicasting.
@ 2024-04-24 13:50 Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 1/8] net: netlink: export genl private pointer getters Adrian Moreno
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno, dev,
	linux-kernel, linux-kselftest

** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic share netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
yielding this solution unfit for highly loaded production systems.

Users are left with little options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidated would be
to use psample. However, it's current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
xtended to forward the action's cookie to psample.

Finally, OVS sample action is extended with a couple of attributes
(OVS_SAMPLE_ATTR_PSAMPLE_{GROUP,COOKIE}) that contain a 32 group_id
and a variable length cookie. When provided, OVS sends the packet
to psample for observability.

In order to make it easier for users to receive samples coming from
a specific source, group_id filtering is added to psample as well
as a tracepoint for troubleshooting.

--
rfc_v2 -> v1:
- Accomodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.

rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.

Adrian Moreno (8):
  net: netlink: export genl private pointer getters
  net: psample: add multicast filtering on group_id
  net: psample: add user cookie
  net: psample: add tracepoint
  net: sched: act_sample: add action cookie to sample
  net:openvswitch: add psample support
  selftests: openvswitch: add sample action.
  selftests: openvswitch: add psample test

 Documentation/netlink/specs/ovs_flow.yaml     |   6 +
 include/net/psample.h                         |   2 +
 include/uapi/linux/openvswitch.h              |  49 ++++-
 include/uapi/linux/psample.h                  |   2 +
 net/netlink/genetlink.c                       |   2 +
 net/openvswitch/actions.c                     |  51 ++++-
 net/openvswitch/flow_netlink.c                |  80 +++++--
 net/psample/psample.c                         | 131 ++++++++++-
 net/psample/trace.h                           |  62 ++++++
 net/sched/act_sample.c                        |  12 +
 .../selftests/net/openvswitch/openvswitch.sh  |  97 +++++++-
 .../selftests/net/openvswitch/ovs-dpctl.py    | 207 +++++++++++++++++-
 12 files changed, 655 insertions(+), 46 deletions(-)
 create mode 100644 net/psample/trace.h

-- 
2.44.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next 1/8] net: netlink: export genl private pointer getters
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 2/8] net: psample: add multicast filtering on group_id Adrian Moreno
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Jiri Pirko, Jacob Keller, Johannes Berg, Ido Schimmel,
	Stanislaw Gruszka, linux-kernel

Both "__genl_sk_priv_get" and "genl_sk_priv_get" are useful functions to
handle private pointers attached to genetlink sockets.
Export them so they can be used in modules.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 net/netlink/genetlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index feb54c63a116..beafa415a9f5 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -726,6 +726,7 @@ void *__genl_sk_priv_get(struct genl_family *family, struct sock *sk)
 		return ERR_PTR(-EINVAL);
 	return xa_load(family->sock_privs, (unsigned long) sk);
 }
+EXPORT_SYMBOL(__genl_sk_priv_get);
 
 /**
  * genl_sk_priv_get - Get family private pointer for socket
@@ -764,6 +765,7 @@ void *genl_sk_priv_get(struct genl_family *family, struct sock *sk)
 	}
 	return priv;
 }
+EXPORT_SYMBOL(genl_sk_priv_get);
 
 /**
  * genl_register_family - register a generic netlink family
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 2/8] net: psample: add multicast filtering on group_id
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 1/8] net: netlink: export genl private pointer getters Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-24 14:54   ` Jiri Pirko
  2024-04-24 13:50 ` [PATCH net-next 3/8] net: psample: add user cookie Adrian Moreno
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

Packet samples can come from several places (e.g: different tc sample
actions), typically using the sample group (PSAMPLE_ATTR_SAMPLE_GROUP)
to differentiate them.

Likewise, sample consumers that listen on the multicast group may only
be interested on a single group. However, they are currently forced to
receive all samples and discard the ones that are not relevant, causing
unnecessary overhead.

Allow users to filter on the desired group_id by adding a new command
PSAMPLE_SET_FILTER that can be used to pass the desired group id.
Store this filter on the per-socket private pointer and use it for
filtering multicasted samples.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 include/uapi/linux/psample.h |   1 +
 net/psample/psample.c        | 110 +++++++++++++++++++++++++++++++++--
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
index e585db5bf2d2..9d62983af0a4 100644
--- a/include/uapi/linux/psample.h
+++ b/include/uapi/linux/psample.h
@@ -28,6 +28,7 @@ enum psample_command {
 	PSAMPLE_CMD_GET_GROUP,
 	PSAMPLE_CMD_NEW_GROUP,
 	PSAMPLE_CMD_DEL_GROUP,
+	PSAMPLE_CMD_SET_FILTER,
 };
 
 enum psample_tunnel_key_attr {
diff --git a/net/psample/psample.c b/net/psample/psample.c
index a5d9b8446f77..f5f77515b969 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -98,13 +98,77 @@ static int psample_nl_cmd_get_group_dumpit(struct sk_buff *msg,
 	return msg->len;
 }
 
-static const struct genl_small_ops psample_nl_ops[] = {
+struct psample_obj_desc {
+	struct rcu_head rcu;
+	u32 group_num;
+};
+
+struct psample_nl_sock_priv {
+	struct psample_obj_desc __rcu *filter;
+	spinlock_t filter_lock; /* Protects filter. */
+};
+
+static void psample_nl_sock_priv_init(void *priv)
+{
+	struct psample_nl_sock_priv *sk_priv = priv;
+
+	spin_lock_init(&sk_priv->filter_lock);
+}
+
+static void psample_nl_sock_priv_destroy(void *priv)
+{
+	struct psample_nl_sock_priv *sk_priv = priv;
+	struct psample_obj_desc *filter;
+
+	filter = rcu_dereference_protected(sk_priv->filter, true);
+	kfree_rcu(filter, rcu);
+}
+
+static int psample_nl_set_filter_doit(struct sk_buff *skb,
+				      struct genl_info *info)
+{
+	struct psample_obj_desc *filter = NULL;
+	struct psample_nl_sock_priv *sk_priv;
+	struct nlattr **attrs = info->attrs;
+
+	if (attrs[PSAMPLE_ATTR_SAMPLE_GROUP]) {
+		filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+		filter->group_num =
+			nla_get_u32(attrs[PSAMPLE_ATTR_SAMPLE_GROUP]);
+	}
+
+	sk_priv = genl_sk_priv_get(&psample_nl_family, NETLINK_CB(skb).sk);
+	if (IS_ERR(sk_priv)) {
+		kfree(filter);
+		return PTR_ERR(sk_priv);
+	}
+
+	spin_lock(&sk_priv->filter_lock);
+	filter = rcu_replace_pointer(sk_priv->filter, filter,
+				     lockdep_is_held(&sk_priv->filter_lock));
+	spin_unlock(&sk_priv->filter_lock);
+	kfree_rcu(filter, rcu);
+	return 0;
+}
+
+static const struct nla_policy
+psample_set_filter_policy[PSAMPLE_ATTR_SAMPLE_GROUP + 1] = {
+	[PSAMPLE_ATTR_SAMPLE_GROUP] = { .type = NLA_U32, },
+};
+
+static const struct genl_ops psample_nl_ops[] = {
 	{
 		.cmd = PSAMPLE_CMD_GET_GROUP,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
 		.dumpit = psample_nl_cmd_get_group_dumpit,
 		/* can be retrieved by unprivileged users */
-	}
+	},
+	{
+		.cmd		= PSAMPLE_CMD_SET_FILTER,
+		.doit		= psample_nl_set_filter_doit,
+		.policy		= psample_set_filter_policy,
+		.flags		= 0,
+	},
 };
 
 static struct genl_family psample_nl_family __ro_after_init = {
@@ -114,10 +178,13 @@ static struct genl_family psample_nl_family __ro_after_init = {
 	.netnsok	= true,
 	.module		= THIS_MODULE,
 	.mcgrps		= psample_nl_mcgrps,
-	.small_ops	= psample_nl_ops,
-	.n_small_ops	= ARRAY_SIZE(psample_nl_ops),
+	.ops		= psample_nl_ops,
+	.n_ops		= ARRAY_SIZE(psample_nl_ops),
 	.resv_start_op	= PSAMPLE_CMD_GET_GROUP + 1,
 	.n_mcgrps	= ARRAY_SIZE(psample_nl_mcgrps),
+	.sock_priv_size		= sizeof(struct psample_nl_sock_priv),
+	.sock_priv_init		= psample_nl_sock_priv_init,
+	.sock_priv_destroy	= psample_nl_sock_priv_destroy,
 };
 
 static void psample_group_notify(struct psample_group *group,
@@ -360,6 +427,32 @@ static int psample_tunnel_meta_len(struct ip_tunnel_info *tun_info)
 }
 #endif
 
+static inline void psample_nl_obj_desc_init(struct psample_obj_desc *desc,
+					    u32 group_num)
+{
+	memset(desc, 0, sizeof(*desc));
+	desc->group_num = group_num;
+}
+
+static int psample_nl_sample_filter(struct sock *dsk, struct sk_buff *skb,
+				    void *data)
+{
+	struct psample_obj_desc *desc = data;
+	struct psample_nl_sock_priv *sk_priv;
+	struct psample_obj_desc *filter;
+	int ret = 0;
+
+	rcu_read_lock();
+	sk_priv = __genl_sk_priv_get(&psample_nl_family, dsk);
+	if (!IS_ERR_OR_NULL(sk_priv)) {
+		filter = rcu_dereference(sk_priv->filter);
+		if (filter && desc)
+			ret = (filter->group_num != desc->group_num);
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
 void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 			   u32 sample_rate, const struct psample_metadata *md)
 {
@@ -370,6 +463,7 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 #ifdef CONFIG_INET
 	struct ip_tunnel_info *tun_info;
 #endif
+	struct psample_obj_desc desc;
 	struct sk_buff *nl_skb;
 	int data_len;
 	int meta_len;
@@ -487,8 +581,12 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 #endif
 
 	genlmsg_end(nl_skb, data);
-	genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
-				PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
+	psample_nl_obj_desc_init(&desc, group->group_num);
+	genlmsg_multicast_netns_filtered(&psample_nl_family,
+					 group->net, nl_skb, 0,
+					 PSAMPLE_NL_MCGRP_SAMPLE,
+					 GFP_ATOMIC, psample_nl_sample_filter,
+					 &desc);
 
 	return;
 error:
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 3/8] net: psample: add user cookie
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 1/8] net: netlink: export genl private pointer getters Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 2/8] net: psample: add multicast filtering on group_id Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-25  7:32   ` Ido Schimmel
  2024-04-24 13:50 ` [PATCH net-next 4/8] net: psample: add tracepoint Adrian Moreno
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

Add a user cookie to the sample metadata so that sample emitters can
provide more contextual information to samples.

If present, send the user cookie in a new attribute:
PSAMPLE_ATTR_USER_COOKIE.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 include/net/psample.h        |  2 ++
 include/uapi/linux/psample.h |  1 +
 net/psample/psample.c        | 12 +++++++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/net/psample.h b/include/net/psample.h
index 0509d2d6be67..2ac71260a546 100644
--- a/include/net/psample.h
+++ b/include/net/psample.h
@@ -25,6 +25,8 @@ struct psample_metadata {
 	   out_tc_occ_valid:1,
 	   latency_valid:1,
 	   unused:5;
+	const u8 *user_cookie;
+	u32 user_cookie_len;
 };
 
 struct psample_group *psample_group_get(struct net *net, u32 group_num);
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
index 9d62983af0a4..83dc919c4023 100644
--- a/include/uapi/linux/psample.h
+++ b/include/uapi/linux/psample.h
@@ -19,6 +19,7 @@ enum {
 	PSAMPLE_ATTR_LATENCY,		/* u64, nanoseconds */
 	PSAMPLE_ATTR_TIMESTAMP,		/* u64, nanoseconds */
 	PSAMPLE_ATTR_PROTO,		/* u16 */
+	PSAMPLE_ATTR_USER_COOKIE,	/* binary, user provided data */
 
 	__PSAMPLE_ATTR_MAX
 };
diff --git a/net/psample/psample.c b/net/psample/psample.c
index f5f77515b969..476aaad7a885 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -480,7 +480,8 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 		   nla_total_size(sizeof(u32)) +	/* group_num */
 		   nla_total_size(sizeof(u32)) +	/* seq */
 		   nla_total_size_64bit(sizeof(u64)) +	/* timestamp */
-		   nla_total_size(sizeof(u16));		/* protocol */
+		   nla_total_size(sizeof(u16)) +	/* protocol */
+		   nla_total_size(md->user_cookie_len);	/* user_cookie */
 
 #ifdef CONFIG_INET
 	tun_info = skb_tunnel_info(skb);
@@ -579,6 +580,15 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 			goto error;
 	}
 #endif
+	if (md->user_cookie && md->user_cookie_len) {
+		int nla_len = nla_total_size(md->user_cookie_len);
+		struct nlattr *nla;
+
+		nla = skb_put(nl_skb, nla_len);
+		nla->nla_type = PSAMPLE_ATTR_USER_COOKIE;
+		nla->nla_len = nla_attr_size(md->user_cookie_len);
+		memcpy(nla_data(nla), md->user_cookie, md->user_cookie_len);
+	}
 
 	genlmsg_end(nl_skb, data);
 	psample_nl_obj_desc_init(&desc, group->group_num);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 4/8] net: psample: add tracepoint
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
                   ` (2 preceding siblings ...)
  2024-04-24 13:50 ` [PATCH net-next 3/8] net: psample: add user cookie Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-25  7:18   ` Ido Schimmel
  2024-04-24 13:50 ` [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample Adrian Moreno
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

Currently there are no widely-available tools to dump the metadata and
group information when a packet is sampled, making it difficult to
troubleshoot related issues.

This makes psample use the event tracing framework to log the sampling
of a packet so that it's easier to quickly identify the source
(i.e: group) and context (i.e: metadata) of a packet being sampled.

This patch creates some checkpatch splats, but the style of the
tracepoint definition mimics that of other modules so it seems
acceptable.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 net/psample/psample.c |  9 +++++++
 net/psample/trace.h   | 62 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 net/psample/trace.h

diff --git a/net/psample/psample.c b/net/psample/psample.c
index 476aaad7a885..92db8ffa2ba2 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -18,6 +18,12 @@
 #include <net/ip_tunnels.h>
 #include <net/dst_metadata.h>
 
+#ifndef __CHECKER__
+#define CREATE_TRACE_POINTS
+#endif
+
+#include "trace.h"
+
 #define PSAMPLE_MAX_PACKET_SIZE 0xffff
 
 static LIST_HEAD(psample_groups_list);
@@ -470,6 +476,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
 	void *data;
 	int ret;
 
+	if (trace_psample_sample_packet_enabled())
+		trace_psample_sample_packet(group, skb, sample_rate, md);
+
 	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
 		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
 		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
diff --git a/net/psample/trace.h b/net/psample/trace.h
new file mode 100644
index 000000000000..2d32a846989b
--- /dev/null
+++ b/net/psample/trace.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM psample
+
+#if !defined(_TRACE_PSAMPLE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PSAMPLE__H
+
+#include <linux/skbuff.h>
+#include <linux/tracepoint.h>
+#include <net/psample.h>
+
+TRACE_EVENT(psample_sample_packet,
+
+	TP_PROTO(struct psample_group *group, struct sk_buff *skb,
+		 u32 sample_rate, const struct psample_metadata *md),
+
+	TP_ARGS(group, skb, sample_rate, md),
+
+	TP_STRUCT__entry(
+		__field(u32, group_num)
+		__field(u32, refcount)
+		__field(u32, seq)
+		__field(void *, skbaddr)
+		__field(unsigned int, len)
+		__field(unsigned int, data_len)
+		__field(u32, sample_rate)
+		__field(int, in_ifindex)
+		__field(int, out_ifindex)
+		__field(const void *, user_cookie)
+		__field(u32, user_cookie_len)
+	),
+
+	TP_fast_assign(
+		__entry->group_num = group->group_num;
+		__entry->refcount = group->refcount;
+		__entry->seq = group->seq;
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__entry->data_len = skb->data_len;
+		__entry->sample_rate = sample_rate;
+		__entry->in_ifindex = md->in_ifindex;
+		__entry->out_ifindex = md->out_ifindex;
+		__entry->user_cookie = &md->user_cookie[0];
+		__entry->user_cookie_len = md->user_cookie_len;
+	),
+
+	TP_printk("group_num=%u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%p user_cookie_len=%u",
+		  __entry->group_num, __entry->refcount, __entry->seq,
+		  __entry->skbaddr, __entry->len, __entry->data_len,
+		  __entry->sample_rate, __entry->in_ifindex,
+		  __entry->out_ifindex, __entry->user_cookie,
+		  __entry->user_cookie_len)
+);
+
+#endif /* _TRACE_PSAMPLE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../net/psample
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
                   ` (3 preceding siblings ...)
  2024-04-24 13:50 ` [PATCH net-next 4/8] net: psample: add tracepoint Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-25  7:39   ` Ido Schimmel
  2024-04-25 21:43   ` Jamal Hadi Salim
  2024-04-24 13:50 ` [PATCH net-next 6/8] net:openvswitch: add psample support Adrian Moreno
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

If the action has a user_cookie, pass it along to the sample so it can
be easily identified.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 net/sched/act_sample.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index a69b53d54039..5c3f86ec964a 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -165,9 +165,11 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
 				     const struct tc_action *a,
 				     struct tcf_result *res)
 {
+	u8 cookie_data[TC_COOKIE_MAX_SIZE] = {};
 	struct tcf_sample *s = to_sample(a);
 	struct psample_group *psample_group;
 	struct psample_metadata md = {};
+	struct tc_cookie *user_cookie;
 	int retval;
 
 	tcf_lastuse_update(&s->tcf_tm);
@@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
 		if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
 			skb_push(skb, skb->mac_len);
 
+		rcu_read_lock();
+		user_cookie = rcu_dereference(a->user_cookie);
+		if (user_cookie) {
+			memcpy(cookie_data, user_cookie->data,
+			       user_cookie->len);
+			md.user_cookie = cookie_data;
+			md.user_cookie_len = user_cookie->len;
+		}
+		rcu_read_unlock();
+
 		md.trunc_size = s->truncate ? s->trunc_size : skb->len;
 		psample_sample_packet(psample_group, skb, s->rate, &md);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 6/8] net:openvswitch: add psample support
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
                   ` (4 preceding siblings ...)
  2024-04-24 13:50 ` [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-30  7:29   ` Dan Carpenter
  2024-05-03  9:43   ` Eelco Chaudron
  2024-04-24 13:50 ` [PATCH net-next 7/8] selftests: openvswitch: add sample action Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 8/8] selftests: openvswitch: add psample test Adrian Moreno
  7 siblings, 2 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pravin B Shelar, Donald Hunter, linux-kernel, dev

Add support for psample sampling via two new attributes to the
OVS_ACTION_ATTR_SAMPLE action.

OVS_SAMPLE_ATTR_PSAMPLE_GROUP used to pass an integer psample group_id.
OVS_SAMPLE_ATTR_PSAMPLE_COOKIE used to pass a variable-length binary
cookie that will be forwared to psample.

The maximum length of the user-defined cookie is set to 16, same as
tc_cookie, to discourage using cookies that will not be offloadable.

In order to simplify the internal processing of the action and given the
maximum size of the cookie is relatively small, add both fields to the
internal-only struct sample_arg.

The presence of a group_id mandates that the action shall called the
psample module to multicast the packet with such group_id and the
user-provided cookie if present. This behavior is orthonogal to
also executing the nested actions if present.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 Documentation/netlink/specs/ovs_flow.yaml |  6 ++
 include/uapi/linux/openvswitch.h          | 49 ++++++++++----
 net/openvswitch/actions.c                 | 51 +++++++++++++--
 net/openvswitch/flow_netlink.c            | 80 ++++++++++++++++++-----
 4 files changed, 153 insertions(+), 33 deletions(-)

diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml
index 4fdfc6b5cae9..5543c2937225 100644
--- a/Documentation/netlink/specs/ovs_flow.yaml
+++ b/Documentation/netlink/specs/ovs_flow.yaml
@@ -825,6 +825,12 @@ attribute-sets:
         name: actions
         type: nest
         nested-attributes: action-attrs
+      -
+        name: psample_group
+        type: u32
+      -
+        name: psample_cookie
+        type: binary
   -
     name: userspace-attrs
     enum-name: ovs-userspace-attr
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index efc82c318fa2..e9cd6f3a952d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -639,6 +639,7 @@ enum ovs_flow_attr {
 #define OVS_UFID_F_OMIT_MASK     (1 << 1)
 #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
 
+#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
 /**
  * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
  * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
@@ -646,15 +647,27 @@ enum ovs_flow_attr {
  * %UINT32_MAX samples all packets and intermediate values sample intermediate
  * fractions of packets.
  * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
- * Actions are passed as nested attributes.
+ * Actions are passed as nested attributes. Optional if
+ * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
+ * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
+ * provided, will be copied to the psample cookie.
  *
- * Executes the specified actions with the given probability on a per-packet
- * basis.
+ * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
+ * specified.
+ *
+ * Executes the specified actions and/or sends the packet to psample
+ * with the given probability on a per-packet basis.
  */
 enum ovs_sample_attr {
 	OVS_SAMPLE_ATTR_UNSPEC,
-	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
-	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
+	OVS_SAMPLE_ATTR_PROBABILITY,	/* u32 number */
+	OVS_SAMPLE_ATTR_ACTIONS,	/* Nested OVS_ACTION_ATTR_
+					 * attributes.
+					 */
+	OVS_SAMPLE_ATTR_PSAMPLE_GROUP,	/* u32 number */
+	OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,	/* binary */
 	__OVS_SAMPLE_ATTR_MAX,
 
 #ifdef __KERNEL__
@@ -665,13 +678,27 @@ enum ovs_sample_attr {
 #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
 
 #ifdef __KERNEL__
+
+/* Definition for flags in struct sample_arg. */
+enum {
+	/* When set, actions in sample will not change the flows. */
+	OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
+	/* When set, the packet will be sent to psample. */
+	OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
+};
+
 struct sample_arg {
-	bool exec;                   /* When true, actions in sample will not
-				      * change flow keys. False otherwise.
-				      */
-	u32  probability;            /* Same value as
-				      * 'OVS_SAMPLE_ATTR_PROBABILITY'.
-				      */
+	u16 flags;		/* Flags that modify the behavior of the
+				 * action. See SAMPLE_ARG_FLAG_*.
+				 */
+	u32  probability;       /* Same value as
+				 * 'OVS_SAMPLE_ATTR_PROBABILITY'.
+				 */
+	u32  group_id;		/* Same value as
+				 * 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
+				 */
+	u8  cookie_len;		/* Length of psample cookie. */
+	char cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE]; /* psample cookie data. */
 };
 #endif
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 6fcd7e2ca81f..eb3166986fd2 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -24,6 +24,7 @@
 #include <net/checksum.h>
 #include <net/dsfield.h>
 #include <net/mpls.h>
+#include <net/psample.h>
 #include <net/sctp/checksum.h>
 
 #include "datapath.h"
@@ -1025,6 +1026,34 @@ static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
 	return 0;
 }
 
+static int ovs_psample_packet(struct datapath *dp, struct sw_flow_key *key,
+			      const struct sample_arg *arg,
+			      struct sk_buff *skb)
+{
+	struct psample_group psample_group = {};
+	struct psample_metadata md = {};
+	struct vport *input_vport;
+	u32 rate;
+
+	psample_group.group_num = arg->group_id;
+	psample_group.net = ovs_dp_get_net(dp);
+
+	input_vport = ovs_vport_rcu(dp, key->phy.in_port);
+	if (!input_vport)
+		input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
+
+	md.in_ifindex = input_vport->dev->ifindex;
+	md.user_cookie = arg->cookie_len ? &arg->cookie[0] : NULL;
+	md.user_cookie_len = arg->cookie_len;
+	md.trunc_size = skb->len;
+
+	rate = arg->probability ? U32_MAX / arg->probability : 0;
+
+	psample_sample_packet(&psample_group, skb, rate, &md);
+
+	return 0;
+}
+
 /* When 'last' is true, sample() should always consume the 'skb'.
  * Otherwise, sample() should keep 'skb' intact regardless what
  * actions are executed within sample().
@@ -1033,11 +1062,12 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 		  struct sw_flow_key *key, const struct nlattr *attr,
 		  bool last)
 {
-	struct nlattr *actions;
+	const struct sample_arg *arg;
 	struct nlattr *sample_arg;
 	int rem = nla_len(attr);
-	const struct sample_arg *arg;
+	struct nlattr *actions;
 	bool clone_flow_key;
+	int ret;
 
 	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
 	sample_arg = nla_data(attr);
@@ -1051,9 +1081,20 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 		return 0;
 	}
 
-	clone_flow_key = !arg->exec;
-	return clone_execute(dp, skb, key, 0, actions, rem, last,
-			     clone_flow_key);
+	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
+		ret = ovs_psample_packet(dp, key, arg, skb);
+		if (ret)
+			return ret;
+	}
+
+	if (nla_ok(actions, rem)) {
+		clone_flow_key = !(arg->flags & OVS_SAMPLE_ARG_FLAG_EXEC);
+		ret = clone_execute(dp, skb, key, 0, actions, rem, last,
+				    clone_flow_key);
+	} else if (last) {
+		ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
+	}
+	return ret;
 }
 
 /* When 'last' is true, clone() should always consume the 'skb'.
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index f224d9bcea5e..1a348d3905fc 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2561,6 +2561,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				  u32 mpls_label_count, bool log,
 				  u32 depth);
 
+static int copy_action(const struct nlattr *from,
+		       struct sw_flow_actions **sfa, bool log);
+
 static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 				    const struct sw_flow_key *key,
 				    struct sw_flow_actions **sfa,
@@ -2569,10 +2572,10 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 				    u32 depth)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
-	const struct nlattr *probability, *actions;
+	const struct nlattr *probability, *actions, *group, *cookie;
+	struct sample_arg arg = {};
 	const struct nlattr *a;
 	int rem, start, err;
-	struct sample_arg arg;
 
 	memset(attrs, 0, sizeof(attrs));
 	nla_for_each_nested(a, attr, rem) {
@@ -2589,7 +2592,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 		return -EINVAL;
 
 	actions = attrs[OVS_SAMPLE_ATTR_ACTIONS];
-	if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN))
+	if (actions && (!nla_len(actions) || nla_len(actions) < NLA_HDRLEN))
+		return -EINVAL;
+
+	group = attrs[OVS_SAMPLE_ATTR_PSAMPLE_GROUP];
+	if (group && nla_len(group) != sizeof(u32))
+		return -EINVAL;
+
+	cookie = attrs[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE];
+	if (cookie &&
+	    (!group || nla_len(cookie) > OVS_PSAMPLE_COOKIE_MAX_SIZE))
+		return -EINVAL;
+
+	if (!group && !actions)
 		return -EINVAL;
 
 	/* validation done, copy sample action. */
@@ -2608,7 +2623,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 	 * If the sample is the last action, it can always be excuted
 	 * rather than deferred.
 	 */
-	arg.exec = last || !actions_may_change_flow(actions);
+	if (actions && (last || !actions_may_change_flow(actions)))
+		arg.flags |= OVS_SAMPLE_ARG_FLAG_EXEC;
+
+	if (group) {
+		arg.flags |= OVS_SAMPLE_ARG_FLAG_PSAMPLE;
+		arg.group_id = nla_get_u32(group);
+	}
+
+	if (cookie) {
+		memcpy(&arg.cookie[0], nla_data(cookie), nla_len(cookie));
+		arg.cookie_len = nla_len(cookie);
+	}
+
 	arg.probability = nla_get_u32(probability);
 
 	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg),
@@ -2616,12 +2643,13 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 	if (err)
 		return err;
 
-	err = __ovs_nla_copy_actions(net, actions, key, sfa,
-				     eth_type, vlan_tci, mpls_label_count, log,
-				     depth + 1);
-
-	if (err)
-		return err;
+	if (actions) {
+		err = __ovs_nla_copy_actions(net, actions, key, sfa,
+					     eth_type, vlan_tci,
+					     mpls_label_count, log, depth + 1);
+		if (err)
+			return err;
+	}
 
 	add_nested_action_end(*sfa, start);
 
@@ -3553,20 +3581,38 @@ static int sample_action_to_attr(const struct nlattr *attr,
 		goto out;
 	}
 
-	ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
-	if (!ac_start) {
-		err = -EMSGSIZE;
-		goto out;
+	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
+		if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
+				arg->group_id)) {
+			err = -EMSGSIZE;
+			goto out;
+		}
+
+		if (arg->cookie_len &&
+		    nla_put(skb, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
+			    arg->cookie_len, &arg->cookie[0])) {
+			err = -EMSGSIZE;
+			goto out;
+		}
 	}
 
-	err = ovs_nla_put_actions(actions, rem, skb);
+	if (nla_ok(actions, rem)) {
+		ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
+		if (!ac_start) {
+			err = -EMSGSIZE;
+			goto out;
+		}
+		err = ovs_nla_put_actions(actions, rem, skb);
+	}
 
 out:
 	if (err) {
-		nla_nest_cancel(skb, ac_start);
+		if (ac_start)
+			nla_nest_cancel(skb, ac_start);
 		nla_nest_cancel(skb, start);
 	} else {
-		nla_nest_end(skb, ac_start);
+		if (ac_start)
+			nla_nest_end(skb, ac_start);
 		nla_nest_end(skb, start);
 	}
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 7/8] selftests: openvswitch: add sample action.
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
                   ` (5 preceding siblings ...)
  2024-04-24 13:50 ` [PATCH net-next 6/8] net:openvswitch: add psample support Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  2024-04-24 13:50 ` [PATCH net-next 8/8] selftests: openvswitch: add psample test Adrian Moreno
  7 siblings, 0 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno,
	Pravin B Shelar, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, dev, linux-kselftest, linux-kernel

Add sample action support to ovs-dpctl.py.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 .../selftests/net/openvswitch/ovs-dpctl.py    | 96 ++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1dd057afd3fb..3a2dddc57e42 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -8,6 +8,7 @@ import argparse
 import errno
 import ipaddress
 import logging
+import math
 import multiprocessing
 import re
 import struct
@@ -58,6 +59,7 @@ OVS_FLOW_CMD_DEL = 2
 OVS_FLOW_CMD_GET = 3
 OVS_FLOW_CMD_SET = 4
 
+UINT32_MAX = 0xFFFFFFFF
 
 def macstr(mac):
     outstr = ":".join(["%02X" % i for i in mac])
@@ -285,7 +287,7 @@ class ovsactions(nla):
         ("OVS_ACTION_ATTR_SET", "none"),
         ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
         ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
-        ("OVS_ACTION_ATTR_SAMPLE", "none"),
+        ("OVS_ACTION_ATTR_SAMPLE", "sample"),
         ("OVS_ACTION_ATTR_RECIRC", "uint32"),
         ("OVS_ACTION_ATTR_HASH", "none"),
         ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
@@ -306,6 +308,91 @@ class ovsactions(nla):
         ("OVS_ACTION_ATTR_DROP", "uint32"),
     )
 
+    class sample(nla):
+        nla_flags = NLA_F_NESTED
+
+        nla_map = (
+            ("OVS_SAMPLE_ATTR_UNSPEC", "none"),
+            ("OVS_SAMPLE_ATTR_PROBABILITY", "uint32"),
+            ("OVS_SAMPLE_ATTR_ACTIONS", "ovsactions"),
+            ("OVS_SAMPLE_ATTR_PSAMPLE_GROUP", "uint32"),
+            ("OVS_SAMPLE_ATTR_PSAMPLE_COOKIE", "array(uint8)"),
+        )
+
+        def dpstr(self, more=False):
+            args = []
+
+            args.append("sample={:.2f}%".format(
+                100 * self.get_attr("OVS_SAMPLE_ATTR_PROBABILITY") /
+                UINT32_MAX))
+
+            group = self.get_attr("OVS_SAMPLE_ATTR_PSAMPLE_GROUP")
+            cookie = self.get_attr("OVS_SAMPLE_ATTR_PSAMPLE_COOKIE")
+            actions = self.get_attr("OVS_SAMPLE_ATTR_ACTIONS")
+
+            if group:
+                args.append("group_id=%d" % group)
+            if cookie:
+                args.append("cookie=%s" %
+                            "".join(format(x, "02x") for x in cookie))
+            if actions:
+                args.append("actions(%s)" % actions.dpstr(more))
+
+            return "sample(%s)" % ",".join(args)
+
+        def parse(self, actstr):
+            """ Parses the input action string and populates the internal
+            attributes. The input string must start with "sample("
+
+            Returns the remaining action string.
+            Raises ValueError if the action string has invalid content.
+            """
+
+            def parse_nested_actions(actstr):
+                subacts = ovsactions()
+                parsed_len = subacts.parse(actstr)
+                return subacts, parsed_len
+
+            def percent_to_rate(percent):
+                percent = float(percent.strip('%'))
+                return int(math.floor(UINT32_MAX * (percent / 100.0) + .5))
+
+            for (key, attr, func) in (
+                ("sample", "OVS_SAMPLE_ATTR_PROBABILITY", percent_to_rate),
+                ("group_id", "OVS_SAMPLE_ATTR_PSAMPLE_GROUP", int),
+                ("cookie", "OVS_SAMPLE_ATTR_PSAMPLE_COOKIE",
+                    lambda x: list(bytearray.fromhex(x))),
+                ("actions", "OVS_SAMPLE_ATTR_ACTIONS", parse_nested_actions),
+            ):
+                if not actstr.startswith(key):
+                    continue
+
+                actstr = actstr[len(key) :]
+
+                if not func:
+                    self["attrs"].append([attr, None])
+                    continue
+
+                # The length of complex attributes cannot be determined
+                # beforehand and must be reported by the parsing func.
+                delim = actstr[0]
+                actstr = actstr[1:]
+                if delim == "=":
+                    pos = strcspn(actstr, ",)")
+                    datum = func(actstr[:pos])
+                elif delim == "(":
+                    datum, pos = func(actstr)
+
+                self["attrs"].append([attr, datum])
+                actstr = actstr[pos:]
+                actstr = actstr[strspn(actstr, ", ") :]
+
+            if actstr[0] != ")":
+                raise ValueError("Action str: '%s' unbalanced" % actstr)
+
+            return actstr[1:]
+
+
     class ctact(nla):
         nla_flags = NLA_F_NESTED
 
@@ -637,6 +724,13 @@ class ovsactions(nla):
                 self["attrs"].append(["OVS_ACTION_ATTR_CT", ctact])
                 parsed = True
 
+            elif parse_starts_block(actstr, "sample(", False):
+                sampleact = self.sample()
+                actstr = sampleact.parse(actstr[len("sample(") : ])
+                self["attrs"].append(["OVS_ACTION_ATTR_SAMPLE", sampleact])
+                parsed = True
+
+
             actstr = actstr[strspn(actstr, ", ") :]
             while parencount > 0:
                 parencount -= 1
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next 8/8] selftests: openvswitch: add psample test
  2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
                   ` (6 preceding siblings ...)
  2024-04-24 13:50 ` [PATCH net-next 7/8] selftests: openvswitch: add sample action Adrian Moreno
@ 2024-04-24 13:50 ` Adrian Moreno
  7 siblings, 0 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-24 13:50 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, horms, i.maximets, Adrian Moreno,
	Pravin B Shelar, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, dev, linux-kselftest, linux-kernel

Add a test to verify sampling packets via psample works.

In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.

In order to also test simultaneous sFlow and psample actions, add
missing parsing support for "userspace" action (via refactoring the one
in sample).

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 .../selftests/net/openvswitch/openvswitch.sh  |  97 +++++++++-
 .../selftests/net/openvswitch/ovs-dpctl.py    | 167 ++++++++++++++----
 2 files changed, 231 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 5cae53543849..7a2307a384a9 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -20,7 +20,8 @@ tests="
 	nat_related_v4				ip4-nat-related: ICMP related matches work with SNAT
 	netlink_checks				ovsnl: validate netlink attrs and settings
 	upcall_interfaces			ovs: test the upcall interfaces
-	drop_reason				drop: test drop reasons are emitted"
+	drop_reason				drop: test drop reasons are emitted
+	psample					psample: Sampling packets with psample"
 
 info() {
     [ $VERBOSE = 0 ] || echo $*
@@ -170,6 +171,19 @@ ovs_drop_reason_count()
 	return `echo "$perf_output" | grep "$pattern" | wc -l`
 }
 
+ovs_test_flow_fails () {
+	ERR_MSG="Flow actions may not be safe on all matching packets"
+
+	PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+    ovs_add_flow $@ &> /dev/null $@ && return 1
+	POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+	if [ "$PRE_TEST" == "$POST_TEST" ]; then
+		return 1
+	fi
+    return 0
+}
+
 usage() {
 	echo
 	echo "$0 [OPTIONS] [TEST]..."
@@ -184,6 +198,87 @@ usage() {
 	exit 1
 }
 
+
+# psample test
+# - samples packets with psample
+test_psample() {
+	sbx_add "test_psample" || return $?
+
+	# Add a datapath with per-vport dispatching.
+	ovs_add_dp "test_psample" psample -V 2:1 || return 1
+
+	info "create namespaces"
+	ovs_add_netns_and_veths "test_psample" "psample" \
+		client c0 c1 172.31.110.10/24 -u || return 1
+	ovs_add_netns_and_veths "test_psample" "psample" \
+		server s0 s1 172.31.110.20/24 -u || return 1
+
+	# Check if psample actions can be configured.
+	ovs_add_flow "test_psample" psample \
+	'in_port(1),eth(),eth_type(0x0806),arp()' 'sample(sample=100%,group_id=1,cookie=0102)'
+	if [ $? == 1 ]; then
+		info "no support for psample - skipping"
+		ovs_exit_sig
+		return $ksft_skip
+	fi
+
+	ovs_del_flows "test_psample" psample
+
+	# Allow ARP
+	ovs_add_flow "test_psample" psample \
+		'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
+	ovs_add_flow "test_psample" psample \
+		'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
+
+    # Test action verification.
+	OLDIFS=$IFS
+	IFS='*'
+	min_key='in_port(1),eth(),eth_type(0x800),ipv4()'
+	for testcase in \
+		"cookie to large"*"sample(sample=100%,group_id=1,cookie=1615141312111009080706050403020100)" \
+		"no group or action"*"sample(sample=100%)" \
+		"no group or action with cookie"*"sample(sample=100%,cookie=deadbeef)";
+	do
+		set -- $testcase;
+		ovs_test_flow_fails "test_psample" psample $min_key $2
+		if [ $? == 1 ]; then
+			info "failed - $1"
+			return 1
+		fi
+	done
+	IFS=$OLDIFS
+
+	# Sample all traffic. In this case the sample action only has psample
+	# arguments.
+	ovs_add_flow "test_psample" psample \
+	"in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()" "sample(sample=100%,group_id=1,cookie=c0ffee),2"
+
+	# Sample all traffic. In this case the sample action has both psample
+	# arguments and an upcall emulating simultaneous psample and
+	# sFlow / IPFIX.
+	nlpid=$(grep -E "listening on upcall packet handler" $ovs_dir/s0.out | cut -d ":" -f 2 | tr -d ' ')
+	ovs_add_flow "test_psample" psample \
+	"in_port(2),eth(),eth_type(0x0800),ipv4(src=172.31.110.20,proto=1),icmp()" "sample(sample=100%,group_id=2,cookie=eeff0c,actions(userspace(pid=${nlpid},userdata=eeff0c))),1"
+
+	# Record psample data.
+	python3 $ovs_base/ovs-dpctl.py psample  >$ovs_dir/psample.out 2>$ovs_dir/psample.err &
+	pid=$!
+	on_exit "ovs_sbx test_psample kill -TERM $pid 2>/dev/null"
+
+	# Send a single ping.
+	sleep 1
+	ovs_sbx "test_psample" ip netns exec client ping -I c1 172.31.110.20 -c 1 || return 1
+	sleep 1
+
+	# We should have received one userspace action upcall and 2 psample packets.
+	grep -E "userspace action command" $ovs_dir/s0.out >/dev/null 2>&1 || return 1
+
+	grep -E "rate:1,group:1,cookie:c0ffee" $ovs_dir/psample.out >/dev/null 2>&1 || return 1
+	grep -E "rate:1,group:2,cookie:eeff0c" $ovs_dir/psample.out >/dev/null 2>&1 || return 1
+
+	return 0
+}
+
 # drop_reason test
 # - drop packets and verify the right drop reason is reported
 test_drop_reason() {
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 3a2dddc57e42..2fb5bcfe9c36 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -27,8 +27,10 @@ try:
     from pyroute2.netlink import genlmsg
     from pyroute2.netlink import nla
     from pyroute2.netlink import nlmsg_atoms
-    from pyroute2.netlink.exceptions import NetlinkError
+    from pyroute2.netlink.event import EventSocket
     from pyroute2.netlink.generic import GenericNetlinkSocket
+    from pyroute2.netlink.nlsocket import Marshal
+    from pyroute2.netlink.exceptions import NetlinkError
     import pyroute2
 
 except ModuleNotFoundError:
@@ -269,6 +271,47 @@ def parse_extract_field(
     return str_skipped, data
 
 
+def parse_attributes(actstr, attributes):
+    """Parses actstr according to attribute description. attributes must be
+    a list of tuples (name, attribute, parse_func), e.g:
+        ("pid", OVS_USERSPACE_ATTR_PID, int)
+
+    Returns a list of parsed attributes followed by the remaining string.
+    """
+    attrs = []
+    for (key, attr, func) in attributes:
+        if not actstr.startswith(key):
+            continue
+
+        actstr = actstr[len(key) :]
+
+        if not func:
+            attrs.append([attr])
+            continue
+
+        # The length of complex attributes cannot be determined
+        # beforehand and must be reported by the parsing func.
+        delim = actstr[0]
+        actstr = actstr[1:]
+        if delim == "=":
+            pos = strcspn(actstr, ",)")
+            datum = func(actstr[:pos])
+        elif delim == "(":
+            datum, pos = func(actstr)
+
+        attrs.append([attr, datum])
+        actstr = actstr[pos:]
+
+        if delim == "(":
+            actstr = actstr[1:]
+
+        actstr = actstr[strspn(actstr, ", ") :]
+
+    if actstr[0] != ")":
+        raise ValueError("Action str: '%s' unbalanced" % actstr)
+
+    return attrs, actstr[1:]
+
 class ovs_dp_msg(genlmsg):
     # include the OVS version
     # We need a custom header rather than just being able to rely on
@@ -357,41 +400,19 @@ class ovsactions(nla):
                 percent = float(percent.strip('%'))
                 return int(math.floor(UINT32_MAX * (percent / 100.0) + .5))
 
-            for (key, attr, func) in (
+            attrs_desc = (
                 ("sample", "OVS_SAMPLE_ATTR_PROBABILITY", percent_to_rate),
                 ("group_id", "OVS_SAMPLE_ATTR_PSAMPLE_GROUP", int),
                 ("cookie", "OVS_SAMPLE_ATTR_PSAMPLE_COOKIE",
                     lambda x: list(bytearray.fromhex(x))),
                 ("actions", "OVS_SAMPLE_ATTR_ACTIONS", parse_nested_actions),
-            ):
-                if not actstr.startswith(key):
-                    continue
-
-                actstr = actstr[len(key) :]
-
-                if not func:
-                    self["attrs"].append([attr, None])
-                    continue
-
-                # The length of complex attributes cannot be determined
-                # beforehand and must be reported by the parsing func.
-                delim = actstr[0]
-                actstr = actstr[1:]
-                if delim == "=":
-                    pos = strcspn(actstr, ",)")
-                    datum = func(actstr[:pos])
-                elif delim == "(":
-                    datum, pos = func(actstr)
-
-                self["attrs"].append([attr, datum])
-                actstr = actstr[pos:]
-                actstr = actstr[strspn(actstr, ", ") :]
-
-            if actstr[0] != ")":
-                raise ValueError("Action str: '%s' unbalanced" % actstr)
+            )
 
-            return actstr[1:]
+            attrs, actstr = parse_attributes(actstr, attrs_desc)
+            for attr in attrs:
+                self["attrs"].append(attr)
 
+            return actstr
 
     class ctact(nla):
         nla_flags = NLA_F_NESTED
@@ -521,6 +542,18 @@ class ovsactions(nla):
             print_str += ")"
             return print_str
 
+        def parse(self, actstr):
+            attrs_desc = (
+                ("pid", "OVS_USERSPACE_ATTR_PID", int),
+                ("userdata", "OVS_USERSPACE_ATTR_USERDATA",
+                    lambda x: list(bytearray.fromhex(x))),
+                ("egress_tun_port", "OVS_USERSPACE_ATTR_EGRESS_TUN_PORT", int)
+            )
+            attrs, actstr = parse_attributes(actstr, attrs_desc)
+            for attr in attrs:
+                self["attrs"].append(attr)
+            return actstr
+
     def dpstr(self, more=False):
         print_str = ""
 
@@ -730,6 +763,11 @@ class ovsactions(nla):
                 self["attrs"].append(["OVS_ACTION_ATTR_SAMPLE", sampleact])
                 parsed = True
 
+            elif parse_starts_block(actstr, "userspace(", False):
+                uact = self.userspace()
+                actstr = uact.parse(actstr[len("userpsace(") : ])
+                self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact])
+                parsed = True
 
             actstr = actstr[strspn(actstr, ", ") :]
             while parencount > 0:
@@ -2112,10 +2150,70 @@ class OvsFlow(GenericNetlinkSocket):
         print("MISS upcall[%d/%s]: %s" % (seq, pktpres, keystr), flush=True)
 
     def execute(self, packetmsg):
-        print("userspace execute command")
+        print("userspace execute command", flush=True)
 
     def action(self, packetmsg):
-        print("userspace action command")
+        print("userspace action command", flush=True)
+
+
+class psample_sample(genlmsg):
+    nla_map = (
+        ("PSAMPLE_ATTR_IIFINDEX", "none"),
+        ("PSAMPLE_ATTR_OIFINDEX", "none"),
+        ("PSAMPLE_ATTR_ORIGSIZE", "none"),
+        ("PSAMPLE_ATTR_SAMPLE_GROUP", "uint32"),
+        ("PSAMPLE_ATTR_GROUP_SEQ", "none"),
+        ("PSAMPLE_ATTR_SAMPLE_RATE", "uint32"),
+        ("PSAMPLE_ATTR_DATA", "array(uint8)"),
+        ("PSAMPLE_ATTR_GROUP_REFCOUNT", "none"),
+        ("PSAMPLE_ATTR_TUNNEL", "none"),
+        ("PSAMPLE_ATTR_PAD", "none"),
+        ("PSAMPLE_ATTR_OUT_TC", "none"),
+        ("PSAMPLE_ATTR_OUT_TC_OCC", "none"),
+        ("PSAMPLE_ATTR_LATENCY", "none"),
+        ("PSAMPLE_ATTR_TIMESTAMP", "none"),
+        ("PSAMPLE_ATTR_PROTO", "none"),
+        ("PSAMPLE_ATTR_USER_COOKIE", "array(uint8)"),
+    )
+
+    def dpstr(self):
+        fields = []
+        data = ""
+        for (attr, value) in self["attrs"]:
+            if attr == "PSAMPLE_ATTR_SAMPLE_GROUP":
+                fields.append("group:%d" % value)
+            if attr == "PSAMPLE_ATTR_SAMPLE_RATE":
+                fields.append("rate:%d" % value)
+            if attr == "PSAMPLE_ATTR_USER_COOKIE":
+                value = "".join(format(x, "02x") for x in value)
+                fields.append("cookie:%s" % value)
+            if attr == "PSAMPLE_ATTR_DATA" and len(value) > 0:
+                data = "data:%s" % "".join(format(x, "02x") for x in value)
+
+        return ("%s %s" % (",".join(fields), data)).strip()
+
+
+class psample_msg(Marshal):
+    PSAMPLE_CMD_SAMPLE = 0
+    PSAMPLE_CMD_GET_GROUP = 1
+    PSAMPLE_CMD_NEW_GROUP = 2
+    PSAMPLE_CMD_DEL_GROUP = 3
+    PSAMPLE_CMD_SET_FILTER = 4
+    msg_map = {PSAMPLE_CMD_SAMPLE: psample_sample}
+
+
+class Psample(EventSocket):
+    genl_family = "psample"
+    mcast_groups = ["packets"]
+    marshal_class = psample_msg
+
+    def read_samples(self):
+        while True:
+            try:
+                for msg in self.get():
+                    print(msg.dpstr(), flush=True)
+            except NetlinkError as ne:
+                raise ne
 
 
 def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()):
@@ -2175,7 +2273,7 @@ def main(argv):
         help="Increment 'verbose' output counter.",
         default=0,
     )
-    subparsers = parser.add_subparsers()
+    subparsers = parser.add_subparsers(dest="subcommand")
 
     showdpcmd = subparsers.add_parser("show")
     showdpcmd.add_argument(
@@ -2232,6 +2330,8 @@ def main(argv):
     delfscmd = subparsers.add_parser("del-flows")
     delfscmd.add_argument("flsbr", help="Datapath name")
 
+    subparsers.add_parser("psample")
+
     args = parser.parse_args()
 
     if args.verbose > 0:
@@ -2246,6 +2346,9 @@ def main(argv):
 
     sys.setrecursionlimit(100000)
 
+    if args.subcommand == "psample":
+        Psample().read_samples()
+
     if hasattr(args, "showdp"):
         found = False
         for iface in ndb.interfaces:
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 2/8] net: psample: add multicast filtering on group_id
  2024-04-24 13:50 ` [PATCH net-next 2/8] net: psample: add multicast filtering on group_id Adrian Moreno
@ 2024-04-24 14:54   ` Jiri Pirko
  2024-04-25  7:23     ` Adrian Moreno
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Pirko @ 2024-04-24 14:54 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

Wed, Apr 24, 2024 at 03:50:49PM CEST, amorenoz@redhat.com wrote:
>Packet samples can come from several places (e.g: different tc sample
>actions), typically using the sample group (PSAMPLE_ATTR_SAMPLE_GROUP)
>to differentiate them.
>
>Likewise, sample consumers that listen on the multicast group may only
>be interested on a single group. However, they are currently forced to
>receive all samples and discard the ones that are not relevant, causing
>unnecessary overhead.
>
>Allow users to filter on the desired group_id by adding a new command
>PSAMPLE_SET_FILTER that can be used to pass the desired group id.
>Store this filter on the per-socket private pointer and use it for
>filtering multicasted samples.
>
>Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>---
> include/uapi/linux/psample.h |   1 +
> net/psample/psample.c        | 110 +++++++++++++++++++++++++++++++++--
> 2 files changed, 105 insertions(+), 6 deletions(-)
>
>diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
>index e585db5bf2d2..9d62983af0a4 100644
>--- a/include/uapi/linux/psample.h
>+++ b/include/uapi/linux/psample.h
>@@ -28,6 +28,7 @@ enum psample_command {
> 	PSAMPLE_CMD_GET_GROUP,
> 	PSAMPLE_CMD_NEW_GROUP,
> 	PSAMPLE_CMD_DEL_GROUP,
>+	PSAMPLE_CMD_SET_FILTER,
> };
> 
> enum psample_tunnel_key_attr {
>diff --git a/net/psample/psample.c b/net/psample/psample.c
>index a5d9b8446f77..f5f77515b969 100644
>--- a/net/psample/psample.c
>+++ b/net/psample/psample.c
>@@ -98,13 +98,77 @@ static int psample_nl_cmd_get_group_dumpit(struct sk_buff *msg,
> 	return msg->len;
> }
> 
>-static const struct genl_small_ops psample_nl_ops[] = {
>+struct psample_obj_desc {
>+	struct rcu_head rcu;
>+	u32 group_num;
>+};
>+
>+struct psample_nl_sock_priv {
>+	struct psample_obj_desc __rcu *filter;
>+	spinlock_t filter_lock; /* Protects filter. */
>+};
>+
>+static void psample_nl_sock_priv_init(void *priv)
>+{
>+	struct psample_nl_sock_priv *sk_priv = priv;
>+
>+	spin_lock_init(&sk_priv->filter_lock);
>+}
>+
>+static void psample_nl_sock_priv_destroy(void *priv)
>+{
>+	struct psample_nl_sock_priv *sk_priv = priv;
>+	struct psample_obj_desc *filter;
>+
>+	filter = rcu_dereference_protected(sk_priv->filter, true);
>+	kfree_rcu(filter, rcu);
>+}
>+
>+static int psample_nl_set_filter_doit(struct sk_buff *skb,
>+				      struct genl_info *info)
>+{
>+	struct psample_obj_desc *filter = NULL;
>+	struct psample_nl_sock_priv *sk_priv;
>+	struct nlattr **attrs = info->attrs;
>+
>+	if (attrs[PSAMPLE_ATTR_SAMPLE_GROUP]) {
>+		filter = kzalloc(sizeof(*filter), GFP_KERNEL);
>+		filter->group_num =
>+			nla_get_u32(attrs[PSAMPLE_ATTR_SAMPLE_GROUP]);
>+	}
>+
>+	sk_priv = genl_sk_priv_get(&psample_nl_family, NETLINK_CB(skb).sk);
>+	if (IS_ERR(sk_priv)) {
>+		kfree(filter);
>+		return PTR_ERR(sk_priv);
>+	}
>+
>+	spin_lock(&sk_priv->filter_lock);
>+	filter = rcu_replace_pointer(sk_priv->filter, filter,
>+				     lockdep_is_held(&sk_priv->filter_lock));
>+	spin_unlock(&sk_priv->filter_lock);
>+	kfree_rcu(filter, rcu);
>+	return 0;
>+}
>+
>+static const struct nla_policy
>+psample_set_filter_policy[PSAMPLE_ATTR_SAMPLE_GROUP + 1] = {
>+	[PSAMPLE_ATTR_SAMPLE_GROUP] = { .type = NLA_U32, },
>+};
>+
>+static const struct genl_ops psample_nl_ops[] = {
> 	{
> 		.cmd = PSAMPLE_CMD_GET_GROUP,
> 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> 		.dumpit = psample_nl_cmd_get_group_dumpit,
> 		/* can be retrieved by unprivileged users */
>-	}
>+	},
>+	{
>+		.cmd		= PSAMPLE_CMD_SET_FILTER,
>+		.doit		= psample_nl_set_filter_doit,
>+		.policy		= psample_set_filter_policy,
>+		.flags		= 0,
>+	},

Sidenote:
Did you think about converting psample to split ops and to introcude
ynl spec file for it?


> };
> 
> static struct genl_family psample_nl_family __ro_after_init = {
>@@ -114,10 +178,13 @@ static struct genl_family psample_nl_family __ro_after_init = {
> 	.netnsok	= true,
> 	.module		= THIS_MODULE,
> 	.mcgrps		= psample_nl_mcgrps,
>-	.small_ops	= psample_nl_ops,
>-	.n_small_ops	= ARRAY_SIZE(psample_nl_ops),
>+	.ops		= psample_nl_ops,
>+	.n_ops		= ARRAY_SIZE(psample_nl_ops),
> 	.resv_start_op	= PSAMPLE_CMD_GET_GROUP + 1,
> 	.n_mcgrps	= ARRAY_SIZE(psample_nl_mcgrps),
>+	.sock_priv_size		= sizeof(struct psample_nl_sock_priv),
>+	.sock_priv_init		= psample_nl_sock_priv_init,
>+	.sock_priv_destroy	= psample_nl_sock_priv_destroy,
> };
> 
> static void psample_group_notify(struct psample_group *group,
>@@ -360,6 +427,32 @@ static int psample_tunnel_meta_len(struct ip_tunnel_info *tun_info)
> }
> #endif
> 
>+static inline void psample_nl_obj_desc_init(struct psample_obj_desc *desc,
>+					    u32 group_num)
>+{
>+	memset(desc, 0, sizeof(*desc));
>+	desc->group_num = group_num;
>+}
>+
>+static int psample_nl_sample_filter(struct sock *dsk, struct sk_buff *skb,
>+				    void *data)
>+{
>+	struct psample_obj_desc *desc = data;
>+	struct psample_nl_sock_priv *sk_priv;
>+	struct psample_obj_desc *filter;
>+	int ret = 0;
>+
>+	rcu_read_lock();
>+	sk_priv = __genl_sk_priv_get(&psample_nl_family, dsk);
>+	if (!IS_ERR_OR_NULL(sk_priv)) {
>+		filter = rcu_dereference(sk_priv->filter);
>+		if (filter && desc)
>+			ret = (filter->group_num != desc->group_num);
>+	}
>+	rcu_read_unlock();
>+	return ret;
>+}
>+
> void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
> 			   u32 sample_rate, const struct psample_metadata *md)
> {
>@@ -370,6 +463,7 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
> #ifdef CONFIG_INET
> 	struct ip_tunnel_info *tun_info;
> #endif
>+	struct psample_obj_desc desc;
> 	struct sk_buff *nl_skb;
> 	int data_len;
> 	int meta_len;
>@@ -487,8 +581,12 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
> #endif
> 
> 	genlmsg_end(nl_skb, data);
>-	genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
>-				PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
>+	psample_nl_obj_desc_init(&desc, group->group_num);
>+	genlmsg_multicast_netns_filtered(&psample_nl_family,
>+					 group->net, nl_skb, 0,
>+					 PSAMPLE_NL_MCGRP_SAMPLE,
>+					 GFP_ATOMIC, psample_nl_sample_filter,
>+					 &desc);
> 
> 	return;
> error:
>-- 
>2.44.0
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/8] net: psample: add tracepoint
  2024-04-24 13:50 ` [PATCH net-next 4/8] net: psample: add tracepoint Adrian Moreno
@ 2024-04-25  7:18   ` Ido Schimmel
  2024-04-25  8:06     ` Adrian Moreno
  0 siblings, 1 reply; 25+ messages in thread
From: Ido Schimmel @ 2024-04-25  7:18 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
> Currently there are no widely-available tools to dump the metadata and
> group information when a packet is sampled, making it difficult to
> troubleshoot related issues.
> 
> This makes psample use the event tracing framework to log the sampling
> of a packet so that it's easier to quickly identify the source
> (i.e: group) and context (i.e: metadata) of a packet being sampled.
> 
> This patch creates some checkpatch splats, but the style of the
> tracepoint definition mimics that of other modules so it seems
> acceptable.

I don't see a good reason to add this tracepoint (which we won't be able
to remove) when you can easily do that with bpftrace which by now should
be widely available:

#!/usr/bin/bpftrace

kfunc:psample_sample_packet
{
        $ts_us = nsecs() / 1000;
        $secs = $ts_us / 1000000;
        $us = $ts_us % 1000000;
        $group = args.group;
        $skb = args.skb;
        $md = args.md;

        printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
               comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
               $skb, $skb->len, $skb->data_len, args.sample_rate,
               $md->in_ifindex, $md->out_ifindex,
               buf($md->user_cookie, $md->user_cookie_len));
}

Example output:

mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
\xde\xad\xbe\xef
mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
\xde\xad\xbe\xef

Note that it prints the cookie itself unlike the tracepoint which only
prints the hashed pointer.

> 
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---
>  net/psample/psample.c |  9 +++++++
>  net/psample/trace.h   | 62 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
>  create mode 100644 net/psample/trace.h
> 
> diff --git a/net/psample/psample.c b/net/psample/psample.c
> index 476aaad7a885..92db8ffa2ba2 100644
> --- a/net/psample/psample.c
> +++ b/net/psample/psample.c
> @@ -18,6 +18,12 @@
>  #include <net/ip_tunnels.h>
>  #include <net/dst_metadata.h>
>  
> +#ifndef __CHECKER__
> +#define CREATE_TRACE_POINTS
> +#endif
> +
> +#include "trace.h"
> +
>  #define PSAMPLE_MAX_PACKET_SIZE 0xffff
>  
>  static LIST_HEAD(psample_groups_list);
> @@ -470,6 +476,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>  	void *data;
>  	int ret;
>  
> +	if (trace_psample_sample_packet_enabled())
> +		trace_psample_sample_packet(group, skb, sample_rate, md);

My understanding is that trace_x_enabled() should only be used if you
need to do some work to prepare parameters for the tracepoint.

> +
>  	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>  		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>  		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 2/8] net: psample: add multicast filtering on group_id
  2024-04-24 14:54   ` Jiri Pirko
@ 2024-04-25  7:23     ` Adrian Moreno
  0 siblings, 0 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-25  7:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel



On 4/24/24 16:54, Jiri Pirko wrote:
> Wed, Apr 24, 2024 at 03:50:49PM CEST, amorenoz@redhat.com wrote:
>> Packet samples can come from several places (e.g: different tc sample
>> actions), typically using the sample group (PSAMPLE_ATTR_SAMPLE_GROUP)
>> to differentiate them.
>>
>> Likewise, sample consumers that listen on the multicast group may only
>> be interested on a single group. However, they are currently forced to
>> receive all samples and discard the ones that are not relevant, causing
>> unnecessary overhead.
>>
>> Allow users to filter on the desired group_id by adding a new command
>> PSAMPLE_SET_FILTER that can be used to pass the desired group id.
>> Store this filter on the per-socket private pointer and use it for
>> filtering multicasted samples.
>>
>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>> ---
>> include/uapi/linux/psample.h |   1 +
>> net/psample/psample.c        | 110 +++++++++++++++++++++++++++++++++--
>> 2 files changed, 105 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
>> index e585db5bf2d2..9d62983af0a4 100644
>> --- a/include/uapi/linux/psample.h
>> +++ b/include/uapi/linux/psample.h
>> @@ -28,6 +28,7 @@ enum psample_command {
>> 	PSAMPLE_CMD_GET_GROUP,
>> 	PSAMPLE_CMD_NEW_GROUP,
>> 	PSAMPLE_CMD_DEL_GROUP,
>> +	PSAMPLE_CMD_SET_FILTER,
>> };
>>
>> enum psample_tunnel_key_attr {
>> diff --git a/net/psample/psample.c b/net/psample/psample.c
>> index a5d9b8446f77..f5f77515b969 100644
>> --- a/net/psample/psample.c
>> +++ b/net/psample/psample.c
>> @@ -98,13 +98,77 @@ static int psample_nl_cmd_get_group_dumpit(struct sk_buff *msg,
>> 	return msg->len;
>> }
>>
>> -static const struct genl_small_ops psample_nl_ops[] = {
>> +struct psample_obj_desc {
>> +	struct rcu_head rcu;
>> +	u32 group_num;
>> +};
>> +
>> +struct psample_nl_sock_priv {
>> +	struct psample_obj_desc __rcu *filter;
>> +	spinlock_t filter_lock; /* Protects filter. */
>> +};
>> +
>> +static void psample_nl_sock_priv_init(void *priv)
>> +{
>> +	struct psample_nl_sock_priv *sk_priv = priv;
>> +
>> +	spin_lock_init(&sk_priv->filter_lock);
>> +}
>> +
>> +static void psample_nl_sock_priv_destroy(void *priv)
>> +{
>> +	struct psample_nl_sock_priv *sk_priv = priv;
>> +	struct psample_obj_desc *filter;
>> +
>> +	filter = rcu_dereference_protected(sk_priv->filter, true);
>> +	kfree_rcu(filter, rcu);
>> +}
>> +
>> +static int psample_nl_set_filter_doit(struct sk_buff *skb,
>> +				      struct genl_info *info)
>> +{
>> +	struct psample_obj_desc *filter = NULL;
>> +	struct psample_nl_sock_priv *sk_priv;
>> +	struct nlattr **attrs = info->attrs;
>> +
>> +	if (attrs[PSAMPLE_ATTR_SAMPLE_GROUP]) {
>> +		filter = kzalloc(sizeof(*filter), GFP_KERNEL);
>> +		filter->group_num =
>> +			nla_get_u32(attrs[PSAMPLE_ATTR_SAMPLE_GROUP]);
>> +	}
>> +
>> +	sk_priv = genl_sk_priv_get(&psample_nl_family, NETLINK_CB(skb).sk);
>> +	if (IS_ERR(sk_priv)) {
>> +		kfree(filter);
>> +		return PTR_ERR(sk_priv);
>> +	}
>> +
>> +	spin_lock(&sk_priv->filter_lock);
>> +	filter = rcu_replace_pointer(sk_priv->filter, filter,
>> +				     lockdep_is_held(&sk_priv->filter_lock));
>> +	spin_unlock(&sk_priv->filter_lock);
>> +	kfree_rcu(filter, rcu);
>> +	return 0;
>> +}
>> +
>> +static const struct nla_policy
>> +psample_set_filter_policy[PSAMPLE_ATTR_SAMPLE_GROUP + 1] = {
>> +	[PSAMPLE_ATTR_SAMPLE_GROUP] = { .type = NLA_U32, },
>> +};
>> +
>> +static const struct genl_ops psample_nl_ops[] = {
>> 	{
>> 		.cmd = PSAMPLE_CMD_GET_GROUP,
>> 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> 		.dumpit = psample_nl_cmd_get_group_dumpit,
>> 		/* can be retrieved by unprivileged users */
>> -	}
>> +	},
>> +	{
>> +		.cmd		= PSAMPLE_CMD_SET_FILTER,
>> +		.doit		= psample_nl_set_filter_doit,
>> +		.policy		= psample_set_filter_policy,
>> +		.flags		= 0,
>> +	},
> 
> Sidenote:
> Did you think about converting psample to split ops and to introcude
> ynl spec file for it?
>  

If split opts are preferred then sure, I can do that.

Thanks.

>> };
>>
>> static struct genl_family psample_nl_family __ro_after_init = {
>> @@ -114,10 +178,13 @@ static struct genl_family psample_nl_family __ro_after_init = {
>> 	.netnsok	= true,
>> 	.module		= THIS_MODULE,
>> 	.mcgrps		= psample_nl_mcgrps,
>> -	.small_ops	= psample_nl_ops,
>> -	.n_small_ops	= ARRAY_SIZE(psample_nl_ops),
>> +	.ops		= psample_nl_ops,
>> +	.n_ops		= ARRAY_SIZE(psample_nl_ops),
>> 	.resv_start_op	= PSAMPLE_CMD_GET_GROUP + 1,
>> 	.n_mcgrps	= ARRAY_SIZE(psample_nl_mcgrps),
>> +	.sock_priv_size		= sizeof(struct psample_nl_sock_priv),
>> +	.sock_priv_init		= psample_nl_sock_priv_init,
>> +	.sock_priv_destroy	= psample_nl_sock_priv_destroy,
>> };
>>
>> static void psample_group_notify(struct psample_group *group,
>> @@ -360,6 +427,32 @@ static int psample_tunnel_meta_len(struct ip_tunnel_info *tun_info)
>> }
>> #endif
>>
>> +static inline void psample_nl_obj_desc_init(struct psample_obj_desc *desc,
>> +					    u32 group_num)
>> +{
>> +	memset(desc, 0, sizeof(*desc));
>> +	desc->group_num = group_num;
>> +}
>> +
>> +static int psample_nl_sample_filter(struct sock *dsk, struct sk_buff *skb,
>> +				    void *data)
>> +{
>> +	struct psample_obj_desc *desc = data;
>> +	struct psample_nl_sock_priv *sk_priv;
>> +	struct psample_obj_desc *filter;
>> +	int ret = 0;
>> +
>> +	rcu_read_lock();
>> +	sk_priv = __genl_sk_priv_get(&psample_nl_family, dsk);
>> +	if (!IS_ERR_OR_NULL(sk_priv)) {
>> +		filter = rcu_dereference(sk_priv->filter);
>> +		if (filter && desc)
>> +			ret = (filter->group_num != desc->group_num);
>> +	}
>> +	rcu_read_unlock();
>> +	return ret;
>> +}
>> +
>> void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>> 			   u32 sample_rate, const struct psample_metadata *md)
>> {
>> @@ -370,6 +463,7 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>> #ifdef CONFIG_INET
>> 	struct ip_tunnel_info *tun_info;
>> #endif
>> +	struct psample_obj_desc desc;
>> 	struct sk_buff *nl_skb;
>> 	int data_len;
>> 	int meta_len;
>> @@ -487,8 +581,12 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>> #endif
>>
>> 	genlmsg_end(nl_skb, data);
>> -	genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
>> -				PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
>> +	psample_nl_obj_desc_init(&desc, group->group_num);
>> +	genlmsg_multicast_netns_filtered(&psample_nl_family,
>> +					 group->net, nl_skb, 0,
>> +					 PSAMPLE_NL_MCGRP_SAMPLE,
>> +					 GFP_ATOMIC, psample_nl_sample_filter,
>> +					 &desc);
>>
>> 	return;
>> error:
>> -- 
>> 2.44.0
>>
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 3/8] net: psample: add user cookie
  2024-04-24 13:50 ` [PATCH net-next 3/8] net: psample: add user cookie Adrian Moreno
@ 2024-04-25  7:32   ` Ido Schimmel
  2024-04-25  8:09     ` Adrian Moreno
  0 siblings, 1 reply; 25+ messages in thread
From: Ido Schimmel @ 2024-04-25  7:32 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On Wed, Apr 24, 2024 at 03:50:50PM +0200, Adrian Moreno wrote:
> @@ -579,6 +580,15 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>  			goto error;
>  	}
>  #endif
> +	if (md->user_cookie && md->user_cookie_len) {
> +		int nla_len = nla_total_size(md->user_cookie_len);
> +		struct nlattr *nla;
> +
> +		nla = skb_put(nl_skb, nla_len);
> +		nla->nla_type = PSAMPLE_ATTR_USER_COOKIE;
> +		nla->nla_len = nla_attr_size(md->user_cookie_len);
> +		memcpy(nla_data(nla), md->user_cookie, md->user_cookie_len);
> +	}

Did you consider using nla_put() instead?

diff --git a/net/psample/psample.c b/net/psample/psample.c
index 92db8ffa2ba2..ea59ca46b119 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -589,15 +589,10 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
                        goto error;
        }
 #endif
-       if (md->user_cookie && md->user_cookie_len) {
-               int nla_len = nla_total_size(md->user_cookie_len);
-               struct nlattr *nla;
-
-               nla = skb_put(nl_skb, nla_len);
-               nla->nla_type = PSAMPLE_ATTR_USER_COOKIE;
-               nla->nla_len = nla_attr_size(md->user_cookie_len);
-               memcpy(nla_data(nla), md->user_cookie, md->user_cookie_len);
-       }
+       if (md->user_cookie && md->user_cookie_len &&
+           nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len,
+                   md->user_cookie))
+               goto error;
 
        genlmsg_end(nl_skb, data);
        psample_nl_obj_desc_init(&desc, group->group_num);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample
  2024-04-24 13:50 ` [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample Adrian Moreno
@ 2024-04-25  7:39   ` Ido Schimmel
  2024-04-25 21:43   ` Jamal Hadi Salim
  1 sibling, 0 replies; 25+ messages in thread
From: Ido Schimmel @ 2024-04-25  7:39 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel

On Wed, Apr 24, 2024 at 03:50:52PM +0200, Adrian Moreno wrote:
> If the action has a user_cookie, pass it along to the sample so it can
> be easily identified.
> 
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/8] net: psample: add tracepoint
  2024-04-25  7:18   ` Ido Schimmel
@ 2024-04-25  8:06     ` Adrian Moreno
  2024-04-25 15:25       ` Ido Schimmel
  0 siblings, 1 reply; 25+ messages in thread
From: Adrian Moreno @ 2024-04-25  8:06 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel



On 4/25/24 09:18, Ido Schimmel wrote:
> On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
>> Currently there are no widely-available tools to dump the metadata and
>> group information when a packet is sampled, making it difficult to
>> troubleshoot related issues.
>>
>> This makes psample use the event tracing framework to log the sampling
>> of a packet so that it's easier to quickly identify the source
>> (i.e: group) and context (i.e: metadata) of a packet being sampled.
>>
>> This patch creates some checkpatch splats, but the style of the
>> tracepoint definition mimics that of other modules so it seems
>> acceptable.
> 
> I don't see a good reason to add this tracepoint (which we won't be able
> to remove) when you can easily do that with bpftrace which by now should
> be widely available:
> 
> #!/usr/bin/bpftrace
> 
> kfunc:psample_sample_packet
> {
>          $ts_us = nsecs() / 1000;
>          $secs = $ts_us / 1000000;
>          $us = $ts_us % 1000000;
>          $group = args.group;
>          $skb = args.skb;
>          $md = args.md;
> 
>          printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
>                 comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
>                 $skb, $skb->len, $skb->data_len, args.sample_rate,
>                 $md->in_ifindex, $md->out_ifindex,
>                 buf($md->user_cookie, $md->user_cookie_len));
> }
> 
> Example output:
> 
> mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> \xde\xad\xbe\xef
> mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> \xde\xad\xbe\xef
> 
> Note that it prints the cookie itself unlike the tracepoint which only
> prints the hashed pointer.
> 

I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess 
that also true for many other tracepoints out there, right?

For development and labs bpftrace is perfectly fine, but using kfuncs and 
requiring recompilation is harder in production systems compared with using 
smaller CO-RE tools.

If OVS starts using psample heavily and users need to troubleshoot or merely 
observe packets as they are sampled in a more efficient way, they are likely to 
use ebpf for that. I think making it a bit easier (as in, providing a sligthly 
more stable tracepoint) is worth considering.

Can you please expand on your concerns about the tracepoint? It's on the main 
internal function of the module so, even though the function name or its 
arguments might change, it doesn't seem probable that it'll disappear 
altogether. Why else would we want to remove the tracepoint?

>>
>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>> ---
>>   net/psample/psample.c |  9 +++++++
>>   net/psample/trace.h   | 62 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 71 insertions(+)
>>   create mode 100644 net/psample/trace.h
>>
>> diff --git a/net/psample/psample.c b/net/psample/psample.c
>> index 476aaad7a885..92db8ffa2ba2 100644
>> --- a/net/psample/psample.c
>> +++ b/net/psample/psample.c
>> @@ -18,6 +18,12 @@
>>   #include <net/ip_tunnels.h>
>>   #include <net/dst_metadata.h>
>>   
>> +#ifndef __CHECKER__
>> +#define CREATE_TRACE_POINTS
>> +#endif
>> +
>> +#include "trace.h"
>> +
>>   #define PSAMPLE_MAX_PACKET_SIZE 0xffff
>>   
>>   static LIST_HEAD(psample_groups_list);
>> @@ -470,6 +476,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>>   	void *data;
>>   	int ret;
>>   
>> +	if (trace_psample_sample_packet_enabled())
>> +		trace_psample_sample_packet(group, skb, sample_rate, md);
> 
> My understanding is that trace_x_enabled() should only be used if you
> need to do some work to prepare parameters for the tracepoint.
> 

Oh, thanks for that! I was not aware.

>> +
>>   	meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>>   		   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>>   		   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 3/8] net: psample: add user cookie
  2024-04-25  7:32   ` Ido Schimmel
@ 2024-04-25  8:09     ` Adrian Moreno
  0 siblings, 0 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-04-25  8:09 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel



On 4/25/24 09:32, Ido Schimmel wrote:
> On Wed, Apr 24, 2024 at 03:50:50PM +0200, Adrian Moreno wrote:
>> @@ -579,6 +580,15 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>>   			goto error;
>>   	}
>>   #endif
>> +	if (md->user_cookie && md->user_cookie_len) {
>> +		int nla_len = nla_total_size(md->user_cookie_len);
>> +		struct nlattr *nla;
>> +
>> +		nla = skb_put(nl_skb, nla_len);
>> +		nla->nla_type = PSAMPLE_ATTR_USER_COOKIE;
>> +		nla->nla_len = nla_attr_size(md->user_cookie_len);
>> +		memcpy(nla_data(nla), md->user_cookie, md->user_cookie_len);
>> +	}
> 
> Did you consider using nla_put() instead?
> 

That's way simpler, thanks. I'll update it and send another version.


> diff --git a/net/psample/psample.c b/net/psample/psample.c
> index 92db8ffa2ba2..ea59ca46b119 100644
> --- a/net/psample/psample.c
> +++ b/net/psample/psample.c
> @@ -589,15 +589,10 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
>                          goto error;
>          }
>   #endif
> -       if (md->user_cookie && md->user_cookie_len) {
> -               int nla_len = nla_total_size(md->user_cookie_len);
> -               struct nlattr *nla;
> -
> -               nla = skb_put(nl_skb, nla_len);
> -               nla->nla_type = PSAMPLE_ATTR_USER_COOKIE;
> -               nla->nla_len = nla_attr_size(md->user_cookie_len);
> -               memcpy(nla_data(nla), md->user_cookie, md->user_cookie_len);
> -       }
> +       if (md->user_cookie && md->user_cookie_len &&
> +           nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len,
> +                   md->user_cookie))
> +               goto error;
>   
>          genlmsg_end(nl_skb, data);
>          psample_nl_obj_desc_init(&desc, group->group_num);
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/8] net: psample: add tracepoint
  2024-04-25  8:06     ` Adrian Moreno
@ 2024-04-25 15:25       ` Ido Schimmel
  2024-04-29  5:33         ` Adrian Moreno
  0 siblings, 1 reply; 25+ messages in thread
From: Ido Schimmel @ 2024-04-25 15:25 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
> 
> 
> On 4/25/24 09:18, Ido Schimmel wrote:
> > On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
> > > Currently there are no widely-available tools to dump the metadata and
> > > group information when a packet is sampled, making it difficult to
> > > troubleshoot related issues.
> > > 
> > > This makes psample use the event tracing framework to log the sampling
> > > of a packet so that it's easier to quickly identify the source
> > > (i.e: group) and context (i.e: metadata) of a packet being sampled.
> > > 
> > > This patch creates some checkpatch splats, but the style of the
> > > tracepoint definition mimics that of other modules so it seems
> > > acceptable.
> > 
> > I don't see a good reason to add this tracepoint (which we won't be able
> > to remove) when you can easily do that with bpftrace which by now should
> > be widely available:
> > 
> > #!/usr/bin/bpftrace
> > 
> > kfunc:psample_sample_packet
> > {
> >          $ts_us = nsecs() / 1000;
> >          $secs = $ts_us / 1000000;
> >          $us = $ts_us % 1000000;
> >          $group = args.group;
> >          $skb = args.skb;
> >          $md = args.md;
> > 
> >          printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
> >                 comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
> >                 $skb, $skb->len, $skb->data_len, args.sample_rate,
> >                 $md->in_ifindex, $md->out_ifindex,
> >                 buf($md->user_cookie, $md->user_cookie_len));
> > }
> > 
> > Example output:
> > 
> > mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > \xde\xad\xbe\xef
> > mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > \xde\xad\xbe\xef
> > 
> > Note that it prints the cookie itself unlike the tracepoint which only
> > prints the hashed pointer.
> > 
> 
> I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
> that also true for many other tracepoints out there, right?

Maybe, but this particular tracepoint is not buried deep inside some
complex function with manipulated data being passed as arguments.
Instead, this tracepoint is placed at the very beginning of the function
and takes the function arguments as its own arguments. The tracepoint
can be easily replaced with fentry/kprobes like I've shown with the
example above.

> For development and labs bpftrace is perfectly fine, but using kfuncs and
> requiring recompilation is harder in production systems compared with using
> smaller CO-RE tools.

I used bpftrace because it is very easy to write, but I could have done
the same with libbpf. I have a bunch of such tools that I wrote over the
years that I compiled once on my laptop and which I copy to various
machines where I need them.

> If OVS starts using psample heavily and users need to troubleshoot or merely
> observe packets as they are sampled in a more efficient way, they are likely
> to use ebpf for that. I think making it a bit easier (as in, providing a
> sligthly more stable tracepoint) is worth considering.

I'm not saying that it's not worth considering, I'm simply saying that
it should be done after gathering operational experience with existing
mechanisms. It's possible you will conclude that this tracepoint is not
actually needed.

Also, there are some disadvantages in using tracepoints compared to
fentry:

https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t

Not saying that's the case here, but worth considering / being aware.

> Can you please expand on your concerns about the tracepoint? It's on the
> main internal function of the module so, even though the function name or
> its arguments might change, it doesn't seem probable that it'll disappear
> altogether. Why else would we want to remove the tracepoint?

It's not really concerns, but dissatisfaction. It's my impression (might
be wrong) that this series commits to adding new interfaces without
first seriously evaluating existing ones. This is true for this patch
and patch #2 that adds a new netlink command instead of using
SO_ATTACH_FILTER like existing applications are doing to achieve the
same goal.

I guess some will disagree, but wanted to voice my opinion nonetheless.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample
  2024-04-24 13:50 ` [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample Adrian Moreno
  2024-04-25  7:39   ` Ido Schimmel
@ 2024-04-25 21:43   ` Jamal Hadi Salim
  1 sibling, 0 replies; 25+ messages in thread
From: Jamal Hadi Salim @ 2024-04-25 21:43 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Cong Wang,
	Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-kernel

On Wed, Apr 24, 2024 at 9:54 AM Adrian Moreno <amorenoz@redhat.com> wrote:
>
> If the action has a user_cookie, pass it along to the sample so it can
> be easily identified.
>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> ---
>  net/sched/act_sample.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index a69b53d54039..5c3f86ec964a 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -165,9 +165,11 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
>                                      const struct tc_action *a,
>                                      struct tcf_result *res)
>  {
> +       u8 cookie_data[TC_COOKIE_MAX_SIZE] = {};
>         struct tcf_sample *s = to_sample(a);
>         struct psample_group *psample_group;
>         struct psample_metadata md = {};
> +       struct tc_cookie *user_cookie;
>         int retval;
>
>         tcf_lastuse_update(&s->tcf_tm);
> @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
>                 if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
>                         skb_push(skb, skb->mac_len);
>
> +               rcu_read_lock();
> +               user_cookie = rcu_dereference(a->user_cookie);
> +               if (user_cookie) {
> +                       memcpy(cookie_data, user_cookie->data,
> +                              user_cookie->len);
> +                       md.user_cookie = cookie_data;
> +                       md.user_cookie_len = user_cookie->len;
> +               }
> +               rcu_read_unlock();
> +
>                 md.trunc_size = s->truncate ? s->trunc_size : skb->len;
>                 psample_sample_packet(psample_group, skb, s->rate, &md);
>
> --
> 2.44.0
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/8] net: psample: add tracepoint
  2024-04-25 15:25       ` Ido Schimmel
@ 2024-04-29  5:33         ` Adrian Moreno
  2024-04-30 12:53           ` Ido Schimmel
  0 siblings, 1 reply; 25+ messages in thread
From: Adrian Moreno @ 2024-04-29  5:33 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel



On 4/25/24 17:25, Ido Schimmel wrote:
> On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
>>
>>
>> On 4/25/24 09:18, Ido Schimmel wrote:
>>> On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
>>>> Currently there are no widely-available tools to dump the metadata and
>>>> group information when a packet is sampled, making it difficult to
>>>> troubleshoot related issues.
>>>>
>>>> This makes psample use the event tracing framework to log the sampling
>>>> of a packet so that it's easier to quickly identify the source
>>>> (i.e: group) and context (i.e: metadata) of a packet being sampled.
>>>>
>>>> This patch creates some checkpatch splats, but the style of the
>>>> tracepoint definition mimics that of other modules so it seems
>>>> acceptable.
>>>
>>> I don't see a good reason to add this tracepoint (which we won't be able
>>> to remove) when you can easily do that with bpftrace which by now should
>>> be widely available:
>>>
>>> #!/usr/bin/bpftrace
>>>
>>> kfunc:psample_sample_packet
>>> {
>>>           $ts_us = nsecs() / 1000;
>>>           $secs = $ts_us / 1000000;
>>>           $us = $ts_us % 1000000;
>>>           $group = args.group;
>>>           $skb = args.skb;
>>>           $md = args.md;
>>>
>>>           printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
>>>                  comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
>>>                  $skb, $skb->len, $skb->data_len, args.sample_rate,
>>>                  $md->in_ifindex, $md->out_ifindex,
>>>                  buf($md->user_cookie, $md->user_cookie_len));
>>> }
>>>
>>> Example output:
>>>
>>> mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>> mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>>
>>> Note that it prints the cookie itself unlike the tracepoint which only
>>> prints the hashed pointer.
>>>
>>
>> I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
>> that also true for many other tracepoints out there, right?
> 
> Maybe, but this particular tracepoint is not buried deep inside some
> complex function with manipulated data being passed as arguments.
> Instead, this tracepoint is placed at the very beginning of the function
> and takes the function arguments as its own arguments. The tracepoint
> can be easily replaced with fentry/kprobes like I've shown with the
> example above.
> 
>> For development and labs bpftrace is perfectly fine, but using kfuncs and
>> requiring recompilation is harder in production systems compared with using
>> smaller CO-RE tools.
> 
> I used bpftrace because it is very easy to write, but I could have done
> the same with libbpf. I have a bunch of such tools that I wrote over the
> years that I compiled once on my laptop and which I copy to various
> machines where I need them.
>

My worry is that if tools are built around a particular kprobe/kfunc they will 
break if the function name or its arguments change, where as a tracepoint give 
them a bit more stability across kernel versions. This breakage might not be a 
huge problem for bpftrace since the user can change the script at runtime, but 
libbpf programs will need recompilation or some kind of version-detection mechanism.

Given the observability-oriented nature of psample I can very much see tools 
like this being built (I myself plan to write one for OVS repo) and my concern 
is having their stability depend on a function name or arguments not changing 
across versions.


>> If OVS starts using psample heavily and users need to troubleshoot or merely
>> observe packets as they are sampled in a more efficient way, they are likely
>> to use ebpf for that. I think making it a bit easier (as in, providing a
>> sligthly more stable tracepoint) is worth considering.
> 
> I'm not saying that it's not worth considering, I'm simply saying that
> it should be done after gathering operational experience with existing
> mechanisms. It's possible you will conclude that this tracepoint is not
> actually needed.
> 
> Also, there are some disadvantages in using tracepoints compared to
> fentry:
> 
> https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
> https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t
> 
> Not saying that's the case here, but worth considering / being aware.
> 
>> Can you please expand on your concerns about the tracepoint? It's on the
>> main internal function of the module so, even though the function name or
>> its arguments might change, it doesn't seem probable that it'll disappear
>> altogether. Why else would we want to remove the tracepoint?
> 
> It's not really concerns, but dissatisfaction. It's my impression (might
> be wrong) that this series commits to adding new interfaces without
> first seriously evaluating existing ones. This is true for this patch
> and patch #2 that adds a new netlink command instead of using
> SO_ATTACH_FILTER like existing applications are doing to achieve the
> same goal.
> 
> I guess some will disagree, but wanted to voice my opinion nonetheless.
> 

That's a fair point and I appreciate the feedback.

For patch #2, I can concede that it's just making applications slightly simpler 
without providing any further stability guarantees. I'm OK removing it.

And, I fail to convince you of the usefulness of the tracepoint, I can remove it 
as well.

Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 6/8] net:openvswitch: add psample support
  2024-04-24 13:50 ` [PATCH net-next 6/8] net:openvswitch: add psample support Adrian Moreno
@ 2024-04-30  7:29   ` Dan Carpenter
  2024-05-03  9:43   ` Eelco Chaudron
  1 sibling, 0 replies; 25+ messages in thread
From: Dan Carpenter @ 2024-04-30  7:29 UTC (permalink / raw)
  To: oe-kbuild, Adrian Moreno, netdev
  Cc: lkp, oe-kbuild-all, aconole, echaudro, horms, i.maximets,
	Adrian Moreno, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pravin B Shelar, Donald Hunter, linux-kernel, dev

Hi Adrian,

kernel test robot noticed the following build warnings:

url:    https://github.com/intel-lab-lkp/linux/commits/Adrian-Moreno/net-netlink-export-genl-private-pointer-getters/20240424-215821
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240424135109.3524355-7-amorenoz%40redhat.com
patch subject: [PATCH net-next 6/8] net:openvswitch: add psample support
config: i386-randconfig-141-20240430 (https://download.01.org/0day-ci/archive/20240430/202404300611.kJOZL2KI-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202404300611.kJOZL2KI-lkp@intel.com/

New smatch warnings:
net/openvswitch/actions.c:1097 sample() error: uninitialized symbol 'ret'.

vim +/ret +1097 net/openvswitch/actions.c

ccb1352e76cff0 Jesse Gross        2011-10-25  1061  static int sample(struct datapath *dp, struct sk_buff *skb,
ccea74457bbdaf Neil McKee         2015-05-26  1062  		  struct sw_flow_key *key, const struct nlattr *attr,
798c166173ffb5 andy zhou          2017-03-20  1063  		  bool last)
ccb1352e76cff0 Jesse Gross        2011-10-25  1064  {
ccc0b9e4657efd Adrian Moreno      2024-04-24  1065  	const struct sample_arg *arg;
798c166173ffb5 andy zhou          2017-03-20  1066  	struct nlattr *sample_arg;
798c166173ffb5 andy zhou          2017-03-20  1067  	int rem = nla_len(attr);
ccc0b9e4657efd Adrian Moreno      2024-04-24  1068  	struct nlattr *actions;
bef7f7567a104a andy zhou          2017-03-20  1069  	bool clone_flow_key;
ccc0b9e4657efd Adrian Moreno      2024-04-24  1070  	int ret;
ccb1352e76cff0 Jesse Gross        2011-10-25  1071  
798c166173ffb5 andy zhou          2017-03-20  1072  	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
798c166173ffb5 andy zhou          2017-03-20  1073  	sample_arg = nla_data(attr);
798c166173ffb5 andy zhou          2017-03-20  1074  	arg = nla_data(sample_arg);
798c166173ffb5 andy zhou          2017-03-20  1075  	actions = nla_next(sample_arg, &rem);
e05176a3283822 Wenyu Zhang        2015-08-05  1076  
798c166173ffb5 andy zhou          2017-03-20  1077  	if ((arg->probability != U32_MAX) &&
a251c17aa558d8 Jason A. Donenfeld 2022-10-05  1078  	    (!arg->probability || get_random_u32() > arg->probability)) {
798c166173ffb5 andy zhou          2017-03-20  1079  		if (last)
9d802da40b7c82 Adrian Moreno      2023-08-11  1080  			ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
ccb1352e76cff0 Jesse Gross        2011-10-25  1081  		return 0;
ccb1352e76cff0 Jesse Gross        2011-10-25  1082  	}
651887b0c22cff Simon Horman       2014-07-21  1083  
ccc0b9e4657efd Adrian Moreno      2024-04-24  1084  	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
ccc0b9e4657efd Adrian Moreno      2024-04-24  1085  		ret = ovs_psample_packet(dp, key, arg, skb);
ccc0b9e4657efd Adrian Moreno      2024-04-24  1086  		if (ret)
ccc0b9e4657efd Adrian Moreno      2024-04-24  1087  			return ret;
ccc0b9e4657efd Adrian Moreno      2024-04-24  1088  	}
ccc0b9e4657efd Adrian Moreno      2024-04-24  1089  
ccc0b9e4657efd Adrian Moreno      2024-04-24  1090  	if (nla_ok(actions, rem)) {
ccc0b9e4657efd Adrian Moreno      2024-04-24  1091  		clone_flow_key = !(arg->flags & OVS_SAMPLE_ARG_FLAG_EXEC);
ccc0b9e4657efd Adrian Moreno      2024-04-24  1092  		ret = clone_execute(dp, skb, key, 0, actions, rem, last,
bef7f7567a104a andy zhou          2017-03-20  1093  				    clone_flow_key);
ccc0b9e4657efd Adrian Moreno      2024-04-24  1094  	} else if (last) {
ccc0b9e4657efd Adrian Moreno      2024-04-24  1095  		ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);

ret is uninitialized on this last path.

ccc0b9e4657efd Adrian Moreno      2024-04-24  1096  	}
ccc0b9e4657efd Adrian Moreno      2024-04-24 @1097  	return ret;
971427f353f3c4 Andy Zhou          2014-09-15  1098  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 4/8] net: psample: add tracepoint
  2024-04-29  5:33         ` Adrian Moreno
@ 2024-04-30 12:53           ` Ido Schimmel
  0 siblings, 0 replies; 25+ messages in thread
From: Ido Schimmel @ 2024-04-30 12:53 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, echaudro, horms, i.maximets, Yotam Gigi,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel

On Mon, Apr 29, 2024 at 07:33:59AM +0200, Adrian Moreno wrote:
> 
> 
> On 4/25/24 17:25, Ido Schimmel wrote:
> > On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
> > > 
> > > 
> > > On 4/25/24 09:18, Ido Schimmel wrote:
> > > > On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
> > > > > Currently there are no widely-available tools to dump the metadata and
> > > > > group information when a packet is sampled, making it difficult to
> > > > > troubleshoot related issues.
> > > > > 
> > > > > This makes psample use the event tracing framework to log the sampling
> > > > > of a packet so that it's easier to quickly identify the source
> > > > > (i.e: group) and context (i.e: metadata) of a packet being sampled.
> > > > > 
> > > > > This patch creates some checkpatch splats, but the style of the
> > > > > tracepoint definition mimics that of other modules so it seems
> > > > > acceptable.
> > > > 
> > > > I don't see a good reason to add this tracepoint (which we won't be able
> > > > to remove) when you can easily do that with bpftrace which by now should
> > > > be widely available:
> > > > 
> > > > #!/usr/bin/bpftrace
> > > > 
> > > > kfunc:psample_sample_packet
> > > > {
> > > >           $ts_us = nsecs() / 1000;
> > > >           $secs = $ts_us / 1000000;
> > > >           $us = $ts_us % 1000000;
> > > >           $group = args.group;
> > > >           $skb = args.skb;
> > > >           $md = args.md;
> > > > 
> > > >           printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
> > > >                  comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
> > > >                  $skb, $skb->len, $skb->data_len, args.sample_rate,
> > > >                  $md->in_ifindex, $md->out_ifindex,
> > > >                  buf($md->user_cookie, $md->user_cookie_len));
> > > > }
> > > > 
> > > > Example output:
> > > > 
> > > > mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > > > \xde\xad\xbe\xef
> > > > mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
> > > > \xde\xad\xbe\xef
> > > > 
> > > > Note that it prints the cookie itself unlike the tracepoint which only
> > > > prints the hashed pointer.
> > > > 
> > > 
> > > I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
> > > that also true for many other tracepoints out there, right?
> > 
> > Maybe, but this particular tracepoint is not buried deep inside some
> > complex function with manipulated data being passed as arguments.
> > Instead, this tracepoint is placed at the very beginning of the function
> > and takes the function arguments as its own arguments. The tracepoint
> > can be easily replaced with fentry/kprobes like I've shown with the
> > example above.
> > 
> > > For development and labs bpftrace is perfectly fine, but using kfuncs and
> > > requiring recompilation is harder in production systems compared with using
> > > smaller CO-RE tools.
> > 
> > I used bpftrace because it is very easy to write, but I could have done
> > the same with libbpf. I have a bunch of such tools that I wrote over the
> > years that I compiled once on my laptop and which I copy to various
> > machines where I need them.
> > 
> 
> My worry is that if tools are built around a particular kprobe/kfunc they
> will break if the function name or its arguments change, where as a
> tracepoint give them a bit more stability across kernel versions. This
> breakage might not be a huge problem for bpftrace since the user can change
> the script at runtime, but libbpf programs will need recompilation or some
> kind of version-detection mechanism.
> 
> Given the observability-oriented nature of psample I can very much see tools
> like this being built (I myself plan to write one for OVS repo) and my
> concern is having their stability depend on a function name or arguments not
> changing across versions.

There are a lot of tools in BCC that are using kprobes/fentry so
experience shows that it is possible to build observability tools on top
of these interfaces. My preference would be to avoid preemptively adding
a new tracepoint.

> 
> 
> > > If OVS starts using psample heavily and users need to troubleshoot or merely
> > > observe packets as they are sampled in a more efficient way, they are likely
> > > to use ebpf for that. I think making it a bit easier (as in, providing a
> > > sligthly more stable tracepoint) is worth considering.
> > 
> > I'm not saying that it's not worth considering, I'm simply saying that
> > it should be done after gathering operational experience with existing
> > mechanisms. It's possible you will conclude that this tracepoint is not
> > actually needed.
> > 
> > Also, there are some disadvantages in using tracepoints compared to
> > fentry:
> > 
> > https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
> > https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t
> > 
> > Not saying that's the case here, but worth considering / being aware.
> > 
> > > Can you please expand on your concerns about the tracepoint? It's on the
> > > main internal function of the module so, even though the function name or
> > > its arguments might change, it doesn't seem probable that it'll disappear
> > > altogether. Why else would we want to remove the tracepoint?
> > 
> > It's not really concerns, but dissatisfaction. It's my impression (might
> > be wrong) that this series commits to adding new interfaces without
> > first seriously evaluating existing ones. This is true for this patch
> > and patch #2 that adds a new netlink command instead of using
> > SO_ATTACH_FILTER like existing applications are doing to achieve the
> > same goal.
> > 
> > I guess some will disagree, but wanted to voice my opinion nonetheless.
> > 
> 
> That's a fair point and I appreciate the feedback.
> 
> For patch #2, I can concede that it's just making applications slightly
> simpler without providing any further stability guarantees. I'm OK removing
> it.
> 
> And, I fail to convince you of the usefulness of the tracepoint, I can
> remove it as well.

Great, thank you. To be clear, my goal is not to make your life more
difficult, but simply to avoid merging changes that cannot be undone
when their goal can be achieved using existing interfaces.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 6/8] net:openvswitch: add psample support
  2024-04-24 13:50 ` [PATCH net-next 6/8] net:openvswitch: add psample support Adrian Moreno
  2024-04-30  7:29   ` Dan Carpenter
@ 2024-05-03  9:43   ` Eelco Chaudron
  2024-05-07 14:18     ` Adrian Moreno
  1 sibling, 1 reply; 25+ messages in thread
From: Eelco Chaudron @ 2024-05-03  9:43 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, horms, i.maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Pravin B Shelar,
	Donald Hunter, linux-kernel, dev



On 24 Apr 2024, at 15:50, Adrian Moreno wrote:

> Add support for psample sampling via two new attributes to the
> OVS_ACTION_ATTR_SAMPLE action.
>
> OVS_SAMPLE_ATTR_PSAMPLE_GROUP used to pass an integer psample group_id.
> OVS_SAMPLE_ATTR_PSAMPLE_COOKIE used to pass a variable-length binary
> cookie that will be forwared to psample.
>
> The maximum length of the user-defined cookie is set to 16, same as
> tc_cookie, to discourage using cookies that will not be offloadable.
>
> In order to simplify the internal processing of the action and given the
> maximum size of the cookie is relatively small, add both fields to the
> internal-only struct sample_arg.
>
> The presence of a group_id mandates that the action shall called the
> psample module to multicast the packet with such group_id and the
> user-provided cookie if present. This behavior is orthonogal to
> also executing the nested actions if present.
>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

This is not a full review yet. Just some comments, as I’m looking at the user-space patch first and added similar comments.

I’ll do a proper review of this series once I’m done with user-space part.

//Eelco

> ---
>  Documentation/netlink/specs/ovs_flow.yaml |  6 ++
>  include/uapi/linux/openvswitch.h          | 49 ++++++++++----
>  net/openvswitch/actions.c                 | 51 +++++++++++++--
>  net/openvswitch/flow_netlink.c            | 80 ++++++++++++++++++-----
>  4 files changed, 153 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml
> index 4fdfc6b5cae9..5543c2937225 100644
> --- a/Documentation/netlink/specs/ovs_flow.yaml
> +++ b/Documentation/netlink/specs/ovs_flow.yaml
> @@ -825,6 +825,12 @@ attribute-sets:
>          name: actions
>          type: nest
>          nested-attributes: action-attrs
> +      -
> +        name: psample_group
> +        type: u32
> +      -
> +        name: psample_cookie
> +        type: binary
>    -
>      name: userspace-attrs
>      enum-name: ovs-userspace-attr
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index efc82c318fa2..e9cd6f3a952d 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -639,6 +639,7 @@ enum ovs_flow_attr {
>  #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>  #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>
> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>  /**
>   * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
>   * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
> @@ -646,15 +647,27 @@ enum ovs_flow_attr {
>   * %UINT32_MAX samples all packets and intermediate values sample intermediate
>   * fractions of packets.
>   * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
> - * Actions are passed as nested attributes.
> + * Actions are passed as nested attributes. Optional if
> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
> + * provided, will be copied to the psample cookie.

As there is a limit of to the cookie should we mention it here?

>   *
> - * Executes the specified actions with the given probability on a per-packet
> - * basis.
> + * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
> + * specified.
> + *
> + * Executes the specified actions and/or sends the packet to psample
> + * with the given probability on a per-packet basis.
>   */
>  enum ovs_sample_attr {
>  	OVS_SAMPLE_ATTR_UNSPEC,
> -	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
> -	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
> +	OVS_SAMPLE_ATTR_PROBABILITY,	/* u32 number */
> +	OVS_SAMPLE_ATTR_ACTIONS,	/* Nested OVS_ACTION_ATTR_

Missing * after OVS_ACTION_ATTR_

> +					 * attributes.
> +					 */
> +	OVS_SAMPLE_ATTR_PSAMPLE_GROUP,	/* u32 number */
> +	OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,	/* binary */

As these are general sample options, I would not add the PSAMPLE reference. Other data paths could use a different implementation. So I guess OVS_SAMPLE_ATTR_GROUP_ID and OVS_SAMPLE_ATTR_COOKIE would be enough.

>  	__OVS_SAMPLE_ATTR_MAX,
>
>  #ifdef __KERNEL__
> @@ -665,13 +678,27 @@ enum ovs_sample_attr {
>  #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
>
>  #ifdef __KERNEL__
> +
> +/* Definition for flags in struct sample_arg. */
> +enum {
> +	/* When set, actions in sample will not change the flows. */
> +	OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
> +	/* When set, the packet will be sent to psample. */
> +	OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
> +};
> +
>  struct sample_arg {
> -	bool exec;                   /* When true, actions in sample will not
> -				      * change flow keys. False otherwise.
> -				      */
> -	u32  probability;            /* Same value as
> -				      * 'OVS_SAMPLE_ATTR_PROBABILITY'.
> -				      */


Not sure if you can actually do this, you are changing a structure that is part of the UAPI. This change breaks backwards compatibility.


> +	u16 flags;		/* Flags that modify the behavior of the
> +				 * action. See SAMPLE_ARG_FLAG_*.
> +				 */
> +	u32  probability;       /* Same value as
> +				 * 'OVS_SAMPLE_ATTR_PROBABILITY'.
> +				 */
> +	u32  group_id;		/* Same value as
> +				 * 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
> +				 */
> +	u8  cookie_len;		/* Length of psample cookie. */
> +	char cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE]; /* psample cookie data. */

Would it make sense for the cookie also to be u8?

>  };
>  #endif
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 6fcd7e2ca81f..eb3166986fd2 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -24,6 +24,7 @@
>  #include <net/checksum.h>
>  #include <net/dsfield.h>
>  #include <net/mpls.h>
> +#include <net/psample.h>
>  #include <net/sctp/checksum.h>
>
>  #include "datapath.h"
> @@ -1025,6 +1026,34 @@ static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
>  	return 0;
>  }
>
> +static int ovs_psample_packet(struct datapath *dp, struct sw_flow_key *key,
> +			      const struct sample_arg *arg,
> +			      struct sk_buff *skb)
> +{
> +	struct psample_group psample_group = {};
> +	struct psample_metadata md = {};
> +	struct vport *input_vport;
> +	u32 rate;
> +
> +	psample_group.group_num = arg->group_id;
> +	psample_group.net = ovs_dp_get_net(dp);
> +
> +	input_vport = ovs_vport_rcu(dp, key->phy.in_port);
> +	if (!input_vport)
> +		input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
> +
> +	md.in_ifindex = input_vport->dev->ifindex;
> +	md.user_cookie = arg->cookie_len ? &arg->cookie[0] : NULL;
> +	md.user_cookie_len = arg->cookie_len;
> +	md.trunc_size = skb->len;
> +
> +	rate = arg->probability ? U32_MAX / arg->probability : 0;
> +
> +	psample_sample_packet(&psample_group, skb, rate, &md);

Does this mean now the ovs module, now is dependent on the presence of psample? I think we should only support sampling to psample if the module exists, else we should return an error. There might be distributions not including psample by default.

> +
> +	return 0;
> +}
> +
>  /* When 'last' is true, sample() should always consume the 'skb'.
>   * Otherwise, sample() should keep 'skb' intact regardless what
>   * actions are executed within sample().
> @@ -1033,11 +1062,12 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>  		  struct sw_flow_key *key, const struct nlattr *attr,
>  		  bool last)
>  {
> -	struct nlattr *actions;
> +	const struct sample_arg *arg;
>  	struct nlattr *sample_arg;
>  	int rem = nla_len(attr);
> -	const struct sample_arg *arg;
> +	struct nlattr *actions;
>  	bool clone_flow_key;
> +	int ret;
>
>  	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>  	sample_arg = nla_data(attr);
> @@ -1051,9 +1081,20 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>  		return 0;
>  	}
>
> -	clone_flow_key = !arg->exec;
> -	return clone_execute(dp, skb, key, 0, actions, rem, last,
> -			     clone_flow_key);
> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
> +		ret = ovs_psample_packet(dp, key, arg, skb);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (nla_ok(actions, rem)) {
> +		clone_flow_key = !(arg->flags & OVS_SAMPLE_ARG_FLAG_EXEC);
> +		ret = clone_execute(dp, skb, key, 0, actions, rem, last,
> +				    clone_flow_key);
> +	} else if (last) {
> +		ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> +	}
> +	return ret;
>  }
>
>  /* When 'last' is true, clone() should always consume the 'skb'.
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index f224d9bcea5e..1a348d3905fc 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -2561,6 +2561,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>  				  u32 mpls_label_count, bool log,
>  				  u32 depth);
>
> +static int copy_action(const struct nlattr *from,
> +		       struct sw_flow_actions **sfa, bool log);
> +
>  static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>  				    const struct sw_flow_key *key,
>  				    struct sw_flow_actions **sfa,
> @@ -2569,10 +2572,10 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>  				    u32 depth)
>  {
>  	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
> -	const struct nlattr *probability, *actions;
> +	const struct nlattr *probability, *actions, *group, *cookie;
> +	struct sample_arg arg = {};
>  	const struct nlattr *a;
>  	int rem, start, err;
> -	struct sample_arg arg;
>
>  	memset(attrs, 0, sizeof(attrs));
>  	nla_for_each_nested(a, attr, rem) {
> @@ -2589,7 +2592,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>  		return -EINVAL;
>
>  	actions = attrs[OVS_SAMPLE_ATTR_ACTIONS];
> -	if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN))
> +	if (actions && (!nla_len(actions) || nla_len(actions) < NLA_HDRLEN))
> +		return -EINVAL;
> +
> +	group = attrs[OVS_SAMPLE_ATTR_PSAMPLE_GROUP];
> +	if (group && nla_len(group) != sizeof(u32))
> +		return -EINVAL;
> +
> +	cookie = attrs[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE];
> +	if (cookie &&
> +	    (!group || nla_len(cookie) > OVS_PSAMPLE_COOKIE_MAX_SIZE))
> +		return -EINVAL;
> +
> +	if (!group && !actions)
>  		return -EINVAL;
>
>  	/* validation done, copy sample action. */
> @@ -2608,7 +2623,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>  	 * If the sample is the last action, it can always be excuted
>  	 * rather than deferred.
>  	 */
> -	arg.exec = last || !actions_may_change_flow(actions);
> +	if (actions && (last || !actions_may_change_flow(actions)))
> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_EXEC;
> +
> +	if (group) {
> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_PSAMPLE;
> +		arg.group_id = nla_get_u32(group);
> +	}
> +
> +	if (cookie) {
> +		memcpy(&arg.cookie[0], nla_data(cookie), nla_len(cookie));
> +		arg.cookie_len = nla_len(cookie);
> +	}
> +
>  	arg.probability = nla_get_u32(probability);
>
>  	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg),
> @@ -2616,12 +2643,13 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>  	if (err)
>  		return err;
>
> -	err = __ovs_nla_copy_actions(net, actions, key, sfa,
> -				     eth_type, vlan_tci, mpls_label_count, log,
> -				     depth + 1);
> -
> -	if (err)
> -		return err;
> +	if (actions) {
> +		err = __ovs_nla_copy_actions(net, actions, key, sfa,
> +					     eth_type, vlan_tci,
> +					     mpls_label_count, log, depth + 1);
> +		if (err)
> +			return err;
> +	}
>
>  	add_nested_action_end(*sfa, start);
>
> @@ -3553,20 +3581,38 @@ static int sample_action_to_attr(const struct nlattr *attr,
>  		goto out;
>  	}
>
> -	ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
> -	if (!ac_start) {
> -		err = -EMSGSIZE;
> -		goto out;
> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
> +		if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
> +				arg->group_id)) {
> +			err = -EMSGSIZE;
> +			goto out;
> +		}
> +
> +		if (arg->cookie_len &&
> +		    nla_put(skb, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
> +			    arg->cookie_len, &arg->cookie[0])) {
> +			err = -EMSGSIZE;
> +			goto out;
> +		}
>  	}
>
> -	err = ovs_nla_put_actions(actions, rem, skb);
> +	if (nla_ok(actions, rem)) {
> +		ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
> +		if (!ac_start) {
> +			err = -EMSGSIZE;
> +			goto out;
> +		}
> +		err = ovs_nla_put_actions(actions, rem, skb);
> +	}
>
>  out:
>  	if (err) {
> -		nla_nest_cancel(skb, ac_start);
> +		if (ac_start)
> +			nla_nest_cancel(skb, ac_start);
>  		nla_nest_cancel(skb, start);
>  	} else {
> -		nla_nest_end(skb, ac_start);
> +		if (ac_start)
> +			nla_nest_end(skb, ac_start);
>  		nla_nest_end(skb, start);
>  	}
>
> -- 
> 2.44.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 6/8] net:openvswitch: add psample support
  2024-05-03  9:43   ` Eelco Chaudron
@ 2024-05-07 14:18     ` Adrian Moreno
  2024-05-08  9:48       ` Eelco Chaudron
  2024-05-08 15:25       ` Aaron Conole
  0 siblings, 2 replies; 25+ messages in thread
From: Adrian Moreno @ 2024-05-07 14:18 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: netdev, aconole, horms, i.maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Pravin B Shelar,
	Donald Hunter, linux-kernel, dev



On 5/3/24 11:43 AM, Eelco Chaudron wrote:
> 
> 
> On 24 Apr 2024, at 15:50, Adrian Moreno wrote:
> 
>> Add support for psample sampling via two new attributes to the
>> OVS_ACTION_ATTR_SAMPLE action.
>>
>> OVS_SAMPLE_ATTR_PSAMPLE_GROUP used to pass an integer psample group_id.
>> OVS_SAMPLE_ATTR_PSAMPLE_COOKIE used to pass a variable-length binary
>> cookie that will be forwared to psample.
>>
>> The maximum length of the user-defined cookie is set to 16, same as
>> tc_cookie, to discourage using cookies that will not be offloadable.
>>
>> In order to simplify the internal processing of the action and given the
>> maximum size of the cookie is relatively small, add both fields to the
>> internal-only struct sample_arg.
>>
>> The presence of a group_id mandates that the action shall called the
>> psample module to multicast the packet with such group_id and the
>> user-provided cookie if present. This behavior is orthonogal to
>> also executing the nested actions if present.
>>
>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> 
> This is not a full review yet. Just some comments, as I’m looking at the user-space patch first and added similar comments.
> 
> I’ll do a proper review of this series once I’m done with user-space part.
> 
> //Eelco
> 
>> ---
>>   Documentation/netlink/specs/ovs_flow.yaml |  6 ++
>>   include/uapi/linux/openvswitch.h          | 49 ++++++++++----
>>   net/openvswitch/actions.c                 | 51 +++++++++++++--
>>   net/openvswitch/flow_netlink.c            | 80 ++++++++++++++++++-----
>>   4 files changed, 153 insertions(+), 33 deletions(-)
>>
>> diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml
>> index 4fdfc6b5cae9..5543c2937225 100644
>> --- a/Documentation/netlink/specs/ovs_flow.yaml
>> +++ b/Documentation/netlink/specs/ovs_flow.yaml
>> @@ -825,6 +825,12 @@ attribute-sets:
>>           name: actions
>>           type: nest
>>           nested-attributes: action-attrs
>> +      -
>> +        name: psample_group
>> +        type: u32
>> +      -
>> +        name: psample_cookie
>> +        type: binary
>>     -
>>       name: userspace-attrs
>>       enum-name: ovs-userspace-attr
>> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
>> index efc82c318fa2..e9cd6f3a952d 100644
>> --- a/include/uapi/linux/openvswitch.h
>> +++ b/include/uapi/linux/openvswitch.h
>> @@ -639,6 +639,7 @@ enum ovs_flow_attr {
>>   #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>
>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>   /**
>>    * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
>>    * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
>> @@ -646,15 +647,27 @@ enum ovs_flow_attr {
>>    * %UINT32_MAX samples all packets and intermediate values sample intermediate
>>    * fractions of packets.
>>    * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
>> - * Actions are passed as nested attributes.
>> + * Actions are passed as nested attributes. Optional if
>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
>> + * provided, will be copied to the psample cookie.
> 
> As there is a limit of to the cookie should we mention it here?
> 

I thought OVS_PSAMPLE_COOKIE_MAX_SIZE was expressive enough but sure, we can 
also mention it here.

>>    *
>> - * Executes the specified actions with the given probability on a per-packet
>> - * basis.
>> + * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
>> + * specified.
>> + *
>> + * Executes the specified actions and/or sends the packet to psample
>> + * with the given probability on a per-packet basis.
>>    */
>>   enum ovs_sample_attr {
>>   	OVS_SAMPLE_ATTR_UNSPEC,
>> -	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>> -	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
>> +	OVS_SAMPLE_ATTR_PROBABILITY,	/* u32 number */
>> +	OVS_SAMPLE_ATTR_ACTIONS,	/* Nested OVS_ACTION_ATTR_
> 
> Missing * after OVS_ACTION_ATTR_

As a matter of fact, adding an * makes checkpatch generate a warning IIRC. 
That's why I initially removed it. I can look at fixing checkpatch instead.

> 
>> +					 * attributes.
>> +					 */
>> +	OVS_SAMPLE_ATTR_PSAMPLE_GROUP,	/* u32 number */
>> +	OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,	/* binary */
> 
> As these are general sample options, I would not add the PSAMPLE reference. Other data paths could use a different implementation. So I guess OVS_SAMPLE_ATTR_GROUP_ID and OVS_SAMPLE_ATTR_COOKIE would be enough.
> 

OK. But isn't the API already psample-ish? I mean that the group_id is something 
specific to psample that might not be present in other datapath implementation.


>>   	__OVS_SAMPLE_ATTR_MAX,
>>
>>   #ifdef __KERNEL__
>> @@ -665,13 +678,27 @@ enum ovs_sample_attr {
>>   #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
>>
>>   #ifdef __KERNEL__
>> +
>> +/* Definition for flags in struct sample_arg. */
>> +enum {
>> +	/* When set, actions in sample will not change the flows. */
>> +	OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
>> +	/* When set, the packet will be sent to psample. */
>> +	OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
>> +};
>> +
>>   struct sample_arg {
>> -	bool exec;                   /* When true, actions in sample will not
>> -				      * change flow keys. False otherwise.
>> -				      */
>> -	u32  probability;            /* Same value as
>> -				      * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>> -				      */
> 
> 
> Not sure if you can actually do this, you are changing a structure that is part of the UAPI. This change breaks backwards compatibility.
> 

Hmmm... this the internal argument structure protected by #ifdef __KERNEL__. It 
is used in several actions to optimize the internal action handling (i.e: using 
a compact struct instead of a list of netlink attributes). I guess the reason 
for having it in this file is because the attribute enum is being reused, but I 
wouldn't think of this struct as part of the uAPI.

If the (#ifdef __KERNEL__) does not exclude this struct from the uAPI I think we 
should move it (all of them actually) to some internal file.


> 
>> +	u16 flags;		/* Flags that modify the behavior of the
>> +				 * action. See SAMPLE_ARG_FLAG_*.
>> +				 */
>> +	u32  probability;       /* Same value as
>> +				 * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>> +				 */
>> +	u32  group_id;		/* Same value as
>> +				 * 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
>> +				 */
>> +	u8  cookie_len;		/* Length of psample cookie. */
>> +	char cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE]; /* psample cookie data. */
> 
> Would it make sense for the cookie also to be u8?
> 

Yes, probably.

>>   };
>>   #endif
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 6fcd7e2ca81f..eb3166986fd2 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -24,6 +24,7 @@
>>   #include <net/checksum.h>
>>   #include <net/dsfield.h>
>>   #include <net/mpls.h>
>> +#include <net/psample.h>
>>   #include <net/sctp/checksum.h>
>>
>>   #include "datapath.h"
>> @@ -1025,6 +1026,34 @@ static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
>>   	return 0;
>>   }
>>
>> +static int ovs_psample_packet(struct datapath *dp, struct sw_flow_key *key,
>> +			      const struct sample_arg *arg,
>> +			      struct sk_buff *skb)
>> +{
>> +	struct psample_group psample_group = {};
>> +	struct psample_metadata md = {};
>> +	struct vport *input_vport;
>> +	u32 rate;
>> +
>> +	psample_group.group_num = arg->group_id;
>> +	psample_group.net = ovs_dp_get_net(dp);
>> +
>> +	input_vport = ovs_vport_rcu(dp, key->phy.in_port);
>> +	if (!input_vport)
>> +		input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
>> +
>> +	md.in_ifindex = input_vport->dev->ifindex;
>> +	md.user_cookie = arg->cookie_len ? &arg->cookie[0] : NULL;
>> +	md.user_cookie_len = arg->cookie_len;
>> +	md.trunc_size = skb->len;
>> +
>> +	rate = arg->probability ? U32_MAX / arg->probability : 0;
>> +
>> +	psample_sample_packet(&psample_group, skb, rate, &md);
> 
> Does this mean now the ovs module, now is dependent on the presence of psample? I think we should only support sampling to psample if the module exists, else we should return an error. There might be distributions not including psample by default.

Agree. I'll add some compile-time checks the same way we do with nf_nat.

> 
>> +
>> +	return 0;
>> +}
>> +
>>   /* When 'last' is true, sample() should always consume the 'skb'.
>>    * Otherwise, sample() should keep 'skb' intact regardless what
>>    * actions are executed within sample().
>> @@ -1033,11 +1062,12 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>>   		  struct sw_flow_key *key, const struct nlattr *attr,
>>   		  bool last)
>>   {
>> -	struct nlattr *actions;
>> +	const struct sample_arg *arg;
>>   	struct nlattr *sample_arg;
>>   	int rem = nla_len(attr);
>> -	const struct sample_arg *arg;
>> +	struct nlattr *actions;
>>   	bool clone_flow_key;
>> +	int ret;
>>
>>   	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>>   	sample_arg = nla_data(attr);
>> @@ -1051,9 +1081,20 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>>   		return 0;
>>   	}
>>
>> -	clone_flow_key = !arg->exec;
>> -	return clone_execute(dp, skb, key, 0, actions, rem, last,
>> -			     clone_flow_key);
>> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
>> +		ret = ovs_psample_packet(dp, key, arg, skb);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	if (nla_ok(actions, rem)) {
>> +		clone_flow_key = !(arg->flags & OVS_SAMPLE_ARG_FLAG_EXEC);
>> +		ret = clone_execute(dp, skb, key, 0, actions, rem, last,
>> +				    clone_flow_key);
>> +	} else if (last) {
>> +		ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +	}
>> +	return ret;
>>   }
>>
>>   /* When 'last' is true, clone() should always consume the 'skb'.
>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>> index f224d9bcea5e..1a348d3905fc 100644
>> --- a/net/openvswitch/flow_netlink.c
>> +++ b/net/openvswitch/flow_netlink.c
>> @@ -2561,6 +2561,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>>   				  u32 mpls_label_count, bool log,
>>   				  u32 depth);
>>
>> +static int copy_action(const struct nlattr *from,
>> +		       struct sw_flow_actions **sfa, bool log);
>> +
>>   static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>   				    const struct sw_flow_key *key,
>>   				    struct sw_flow_actions **sfa,
>> @@ -2569,10 +2572,10 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>   				    u32 depth)
>>   {
>>   	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
>> -	const struct nlattr *probability, *actions;
>> +	const struct nlattr *probability, *actions, *group, *cookie;
>> +	struct sample_arg arg = {};
>>   	const struct nlattr *a;
>>   	int rem, start, err;
>> -	struct sample_arg arg;
>>
>>   	memset(attrs, 0, sizeof(attrs));
>>   	nla_for_each_nested(a, attr, rem) {
>> @@ -2589,7 +2592,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>   		return -EINVAL;
>>
>>   	actions = attrs[OVS_SAMPLE_ATTR_ACTIONS];
>> -	if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN))
>> +	if (actions && (!nla_len(actions) || nla_len(actions) < NLA_HDRLEN))
>> +		return -EINVAL;
>> +
>> +	group = attrs[OVS_SAMPLE_ATTR_PSAMPLE_GROUP];
>> +	if (group && nla_len(group) != sizeof(u32))
>> +		return -EINVAL;
>> +
>> +	cookie = attrs[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE];
>> +	if (cookie &&
>> +	    (!group || nla_len(cookie) > OVS_PSAMPLE_COOKIE_MAX_SIZE))
>> +		return -EINVAL;
>> +
>> +	if (!group && !actions)
>>   		return -EINVAL;
>>
>>   	/* validation done, copy sample action. */
>> @@ -2608,7 +2623,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>   	 * If the sample is the last action, it can always be excuted
>>   	 * rather than deferred.
>>   	 */
>> -	arg.exec = last || !actions_may_change_flow(actions);
>> +	if (actions && (last || !actions_may_change_flow(actions)))
>> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_EXEC;
>> +
>> +	if (group) {
>> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_PSAMPLE;
>> +		arg.group_id = nla_get_u32(group);
>> +	}
>> +
>> +	if (cookie) {
>> +		memcpy(&arg.cookie[0], nla_data(cookie), nla_len(cookie));
>> +		arg.cookie_len = nla_len(cookie);
>> +	}
>> +
>>   	arg.probability = nla_get_u32(probability);
>>
>>   	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg),
>> @@ -2616,12 +2643,13 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>   	if (err)
>>   		return err;
>>
>> -	err = __ovs_nla_copy_actions(net, actions, key, sfa,
>> -				     eth_type, vlan_tci, mpls_label_count, log,
>> -				     depth + 1);
>> -
>> -	if (err)
>> -		return err;
>> +	if (actions) {
>> +		err = __ovs_nla_copy_actions(net, actions, key, sfa,
>> +					     eth_type, vlan_tci,
>> +					     mpls_label_count, log, depth + 1);
>> +		if (err)
>> +			return err;
>> +	}
>>
>>   	add_nested_action_end(*sfa, start);
>>
>> @@ -3553,20 +3581,38 @@ static int sample_action_to_attr(const struct nlattr *attr,
>>   		goto out;
>>   	}
>>
>> -	ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> -	if (!ac_start) {
>> -		err = -EMSGSIZE;
>> -		goto out;
>> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
>> +		if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
>> +				arg->group_id)) {
>> +			err = -EMSGSIZE;
>> +			goto out;
>> +		}
>> +
>> +		if (arg->cookie_len &&
>> +		    nla_put(skb, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
>> +			    arg->cookie_len, &arg->cookie[0])) {
>> +			err = -EMSGSIZE;
>> +			goto out;
>> +		}
>>   	}
>>
>> -	err = ovs_nla_put_actions(actions, rem, skb);
>> +	if (nla_ok(actions, rem)) {
>> +		ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> +		if (!ac_start) {
>> +			err = -EMSGSIZE;
>> +			goto out;
>> +		}
>> +		err = ovs_nla_put_actions(actions, rem, skb);
>> +	}
>>
>>   out:
>>   	if (err) {
>> -		nla_nest_cancel(skb, ac_start);
>> +		if (ac_start)
>> +			nla_nest_cancel(skb, ac_start);
>>   		nla_nest_cancel(skb, start);
>>   	} else {
>> -		nla_nest_end(skb, ac_start);
>> +		if (ac_start)
>> +			nla_nest_end(skb, ac_start);
>>   		nla_nest_end(skb, start);
>>   	}
>>
>> -- 
>> 2.44.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 6/8] net:openvswitch: add psample support
  2024-05-07 14:18     ` Adrian Moreno
@ 2024-05-08  9:48       ` Eelco Chaudron
  2024-05-08 15:25       ` Aaron Conole
  1 sibling, 0 replies; 25+ messages in thread
From: Eelco Chaudron @ 2024-05-08  9:48 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: netdev, aconole, horms, i.maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Pravin B Shelar,
	Donald Hunter, linux-kernel, dev



On 7 May 2024, at 16:18, Adrian Moreno wrote:

> On 5/3/24 11:43 AM, Eelco Chaudron wrote:
>>
>>
>> On 24 Apr 2024, at 15:50, Adrian Moreno wrote:
>>
>>> Add support for psample sampling via two new attributes to the
>>> OVS_ACTION_ATTR_SAMPLE action.
>>>
>>> OVS_SAMPLE_ATTR_PSAMPLE_GROUP used to pass an integer psample group_id.
>>> OVS_SAMPLE_ATTR_PSAMPLE_COOKIE used to pass a variable-length binary
>>> cookie that will be forwared to psample.
>>>
>>> The maximum length of the user-defined cookie is set to 16, same as
>>> tc_cookie, to discourage using cookies that will not be offloadable.
>>>
>>> In order to simplify the internal processing of the action and given the
>>> maximum size of the cookie is relatively small, add both fields to the
>>> internal-only struct sample_arg.
>>>
>>> The presence of a group_id mandates that the action shall called the
>>> psample module to multicast the packet with such group_id and the
>>> user-provided cookie if present. This behavior is orthonogal to
>>> also executing the nested actions if present.
>>>
>>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>>
>> This is not a full review yet. Just some comments, as I’m looking at the user-space patch first and added similar comments.
>>
>> I’ll do a proper review of this series once I’m done with user-space part.
>>
>> //Eelco
>>
>>> ---
>>>   Documentation/netlink/specs/ovs_flow.yaml |  6 ++
>>>   include/uapi/linux/openvswitch.h          | 49 ++++++++++----
>>>   net/openvswitch/actions.c                 | 51 +++++++++++++--
>>>   net/openvswitch/flow_netlink.c            | 80 ++++++++++++++++++-----
>>>   4 files changed, 153 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml
>>> index 4fdfc6b5cae9..5543c2937225 100644
>>> --- a/Documentation/netlink/specs/ovs_flow.yaml
>>> +++ b/Documentation/netlink/specs/ovs_flow.yaml
>>> @@ -825,6 +825,12 @@ attribute-sets:
>>>           name: actions
>>>           type: nest
>>>           nested-attributes: action-attrs
>>> +      -
>>> +        name: psample_group
>>> +        type: u32
>>> +      -
>>> +        name: psample_cookie
>>> +        type: binary
>>>     -
>>>       name: userspace-attrs
>>>       enum-name: ovs-userspace-attr
>>> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
>>> index efc82c318fa2..e9cd6f3a952d 100644
>>> --- a/include/uapi/linux/openvswitch.h
>>> +++ b/include/uapi/linux/openvswitch.h
>>> @@ -639,6 +639,7 @@ enum ovs_flow_attr {
>>>   #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>>   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>>
>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>>   /**
>>>    * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
>>>    * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
>>> @@ -646,15 +647,27 @@ enum ovs_flow_attr {
>>>    * %UINT32_MAX samples all packets and intermediate values sample intermediate
>>>    * fractions of packets.
>>>    * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
>>> - * Actions are passed as nested attributes.
>>> + * Actions are passed as nested attributes. Optional if
>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
>>> + * provided, will be copied to the psample cookie.
>>
>> As there is a limit of to the cookie should we mention it here?
>>
>
> I thought OVS_PSAMPLE_COOKIE_MAX_SIZE was expressive enough but sure, we can also mention it here.
>
>>>    *
>>> - * Executes the specified actions with the given probability on a per-packet
>>> - * basis.
>>> + * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
>>> + * specified.
>>> + *
>>> + * Executes the specified actions and/or sends the packet to psample
>>> + * with the given probability on a per-packet basis.
>>>    */
>>>   enum ovs_sample_attr {
>>>   	OVS_SAMPLE_ATTR_UNSPEC,
>>> -	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>>> -	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
>>> +	OVS_SAMPLE_ATTR_PROBABILITY,	/* u32 number */
>>> +	OVS_SAMPLE_ATTR_ACTIONS,	/* Nested OVS_ACTION_ATTR_
>>
>> Missing * after OVS_ACTION_ATTR_
>
> As a matter of fact, adding an * makes checkpatch generate a warning IIRC. That's why I initially removed it. I can look at fixing checkpatch instead.
>
>>
>>> +					 * attributes.
>>> +					 */
>>> +	OVS_SAMPLE_ATTR_PSAMPLE_GROUP,	/* u32 number */
>>> +	OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,	/* binary */
>>
>> As these are general sample options, I would not add the PSAMPLE reference. Other data paths could use a different implementation. So I guess OVS_SAMPLE_ATTR_GROUP_ID and OVS_SAMPLE_ATTR_COOKIE would be enough.
>>
>
> OK. But isn't the API already psample-ish? I mean that the group_id is something specific to psample that might not be present in other datapath implementation.

Well, I see it as a general GROUP and COOKIE attribute that should be available as part of the SAMPLE being taken/provided by the Datapath implementation.

Not sure what we should call PSAMPLE, but some suggestions are;

  OVS_SAMPLE_ATTR_OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE

>>>   	__OVS_SAMPLE_ATTR_MAX,
>>>
>>>   #ifdef __KERNEL__
>>> @@ -665,13 +678,27 @@ enum ovs_sample_attr {
>>>   #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
>>>
>>>   #ifdef __KERNEL__
>>> +
>>> +/* Definition for flags in struct sample_arg. */
>>> +enum {
>>> +	/* When set, actions in sample will not change the flows. */
>>> +	OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
>>> +	/* When set, the packet will be sent to psample. */
>>> +	OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
>>> +};
>>> +
>>>   struct sample_arg {
>>> -	bool exec;                   /* When true, actions in sample will not
>>> -				      * change flow keys. False otherwise.
>>> -				      */
>>> -	u32  probability;            /* Same value as
>>> -				      * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>>> -				      */
>>
>>
>> Not sure if you can actually do this, you are changing a structure that is part of the UAPI. This change breaks backwards compatibility.
>>
>
> Hmmm... this the internal argument structure protected by #ifdef __KERNEL__. It is used in several actions to optimize the internal action handling (i.e: using a compact struct instead of a list of netlink attributes). I guess the reason for having it in this file is because the attribute enum is being reused, but I wouldn't think of this struct as part of the uAPI.
>
> If the (#ifdef __KERNEL__) does not exclude this struct from the uAPI I think we should move it (all of them actually) to some internal file.

You are right, I missed the ‘#ifdef __KERNEL__’ kernel.

>>
>>> +	u16 flags;		/* Flags that modify the behavior of the
>>> +				 * action. See SAMPLE_ARG_FLAG_*.
>>> +				 */
>>> +	u32  probability;       /* Same value as
>>> +				 * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>>> +				 */
>>> +	u32  group_id;		/* Same value as
>>> +				 * 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
>>> +				 */
>>> +	u8  cookie_len;		/* Length of psample cookie. */
>>> +	char cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE]; /* psample cookie data. */
>>
>> Would it make sense for the cookie also to be u8?
>>
>
> Yes, probably.
>
>>>   };
>>>   #endif
>>>
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 6fcd7e2ca81f..eb3166986fd2 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -24,6 +24,7 @@
>>>   #include <net/checksum.h>
>>>   #include <net/dsfield.h>
>>>   #include <net/mpls.h>
>>> +#include <net/psample.h>
>>>   #include <net/sctp/checksum.h>
>>>
>>>   #include "datapath.h"
>>> @@ -1025,6 +1026,34 @@ static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
>>>   	return 0;
>>>   }
>>>
>>> +static int ovs_psample_packet(struct datapath *dp, struct sw_flow_key *key,
>>> +			      const struct sample_arg *arg,
>>> +			      struct sk_buff *skb)
>>> +{
>>> +	struct psample_group psample_group = {};
>>> +	struct psample_metadata md = {};
>>> +	struct vport *input_vport;
>>> +	u32 rate;
>>> +
>>> +	psample_group.group_num = arg->group_id;
>>> +	psample_group.net = ovs_dp_get_net(dp);
>>> +
>>> +	input_vport = ovs_vport_rcu(dp, key->phy.in_port);
>>> +	if (!input_vport)
>>> +		input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
>>> +
>>> +	md.in_ifindex = input_vport->dev->ifindex;
>>> +	md.user_cookie = arg->cookie_len ? &arg->cookie[0] : NULL;
>>> +	md.user_cookie_len = arg->cookie_len;
>>> +	md.trunc_size = skb->len;
>>> +
>>> +	rate = arg->probability ? U32_MAX / arg->probability : 0;
>>> +
>>> +	psample_sample_packet(&psample_group, skb, rate, &md);
>>
>> Does this mean now the ovs module, now is dependent on the presence of psample? I think we should only support sampling to psample if the module exists, else we should return an error. There might be distributions not including psample by default.
>
> Agree. I'll add some compile-time checks the same way we do with nf_nat.
>
>>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   /* When 'last' is true, sample() should always consume the 'skb'.
>>>    * Otherwise, sample() should keep 'skb' intact regardless what
>>>    * actions are executed within sample().
>>> @@ -1033,11 +1062,12 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>>>   		  struct sw_flow_key *key, const struct nlattr *attr,
>>>   		  bool last)
>>>   {
>>> -	struct nlattr *actions;
>>> +	const struct sample_arg *arg;
>>>   	struct nlattr *sample_arg;
>>>   	int rem = nla_len(attr);
>>> -	const struct sample_arg *arg;
>>> +	struct nlattr *actions;
>>>   	bool clone_flow_key;
>>> +	int ret;
>>>
>>>   	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>>>   	sample_arg = nla_data(attr);
>>> @@ -1051,9 +1081,20 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>>>   		return 0;
>>>   	}
>>>
>>> -	clone_flow_key = !arg->exec;
>>> -	return clone_execute(dp, skb, key, 0, actions, rem, last,
>>> -			     clone_flow_key);
>>> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
>>> +		ret = ovs_psample_packet(dp, key, arg, skb);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	if (nla_ok(actions, rem)) {
>>> +		clone_flow_key = !(arg->flags & OVS_SAMPLE_ARG_FLAG_EXEC);
>>> +		ret = clone_execute(dp, skb, key, 0, actions, rem, last,
>>> +				    clone_flow_key);
>>> +	} else if (last) {
>>> +		ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>>> +	}
>>> +	return ret;
>>>   }
>>>
>>>   /* When 'last' is true, clone() should always consume the 'skb'.
>>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>>> index f224d9bcea5e..1a348d3905fc 100644
>>> --- a/net/openvswitch/flow_netlink.c
>>> +++ b/net/openvswitch/flow_netlink.c
>>> @@ -2561,6 +2561,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>>>   				  u32 mpls_label_count, bool log,
>>>   				  u32 depth);
>>>
>>> +static int copy_action(const struct nlattr *from,
>>> +		       struct sw_flow_actions **sfa, bool log);
>>> +
>>>   static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   				    const struct sw_flow_key *key,
>>>   				    struct sw_flow_actions **sfa,
>>> @@ -2569,10 +2572,10 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   				    u32 depth)
>>>   {
>>>   	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
>>> -	const struct nlattr *probability, *actions;
>>> +	const struct nlattr *probability, *actions, *group, *cookie;
>>> +	struct sample_arg arg = {};
>>>   	const struct nlattr *a;
>>>   	int rem, start, err;
>>> -	struct sample_arg arg;
>>>
>>>   	memset(attrs, 0, sizeof(attrs));
>>>   	nla_for_each_nested(a, attr, rem) {
>>> @@ -2589,7 +2592,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   		return -EINVAL;
>>>
>>>   	actions = attrs[OVS_SAMPLE_ATTR_ACTIONS];
>>> -	if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN))
>>> +	if (actions && (!nla_len(actions) || nla_len(actions) < NLA_HDRLEN))
>>> +		return -EINVAL;
>>> +
>>> +	group = attrs[OVS_SAMPLE_ATTR_PSAMPLE_GROUP];
>>> +	if (group && nla_len(group) != sizeof(u32))
>>> +		return -EINVAL;
>>> +
>>> +	cookie = attrs[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE];
>>> +	if (cookie &&
>>> +	    (!group || nla_len(cookie) > OVS_PSAMPLE_COOKIE_MAX_SIZE))
>>> +		return -EINVAL;
>>> +
>>> +	if (!group && !actions)
>>>   		return -EINVAL;
>>>
>>>   	/* validation done, copy sample action. */
>>> @@ -2608,7 +2623,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   	 * If the sample is the last action, it can always be excuted
>>>   	 * rather than deferred.
>>>   	 */
>>> -	arg.exec = last || !actions_may_change_flow(actions);
>>> +	if (actions && (last || !actions_may_change_flow(actions)))
>>> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_EXEC;
>>> +
>>> +	if (group) {
>>> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_PSAMPLE;
>>> +		arg.group_id = nla_get_u32(group);
>>> +	}
>>> +
>>> +	if (cookie) {
>>> +		memcpy(&arg.cookie[0], nla_data(cookie), nla_len(cookie));
>>> +		arg.cookie_len = nla_len(cookie);
>>> +	}
>>> +
>>>   	arg.probability = nla_get_u32(probability);
>>>
>>>   	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg),
>>> @@ -2616,12 +2643,13 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   	if (err)
>>>   		return err;
>>>
>>> -	err = __ovs_nla_copy_actions(net, actions, key, sfa,
>>> -				     eth_type, vlan_tci, mpls_label_count, log,
>>> -				     depth + 1);
>>> -
>>> -	if (err)
>>> -		return err;
>>> +	if (actions) {
>>> +		err = __ovs_nla_copy_actions(net, actions, key, sfa,
>>> +					     eth_type, vlan_tci,
>>> +					     mpls_label_count, log, depth + 1);
>>> +		if (err)
>>> +			return err;
>>> +	}
>>>
>>>   	add_nested_action_end(*sfa, start);
>>>
>>> @@ -3553,20 +3581,38 @@ static int sample_action_to_attr(const struct nlattr *attr,
>>>   		goto out;
>>>   	}
>>>
>>> -	ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
>>> -	if (!ac_start) {
>>> -		err = -EMSGSIZE;
>>> -		goto out;
>>> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
>>> +		if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
>>> +				arg->group_id)) {
>>> +			err = -EMSGSIZE;
>>> +			goto out;
>>> +		}
>>> +
>>> +		if (arg->cookie_len &&
>>> +		    nla_put(skb, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
>>> +			    arg->cookie_len, &arg->cookie[0])) {
>>> +			err = -EMSGSIZE;
>>> +			goto out;
>>> +		}
>>>   	}
>>>
>>> -	err = ovs_nla_put_actions(actions, rem, skb);
>>> +	if (nla_ok(actions, rem)) {
>>> +		ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
>>> +		if (!ac_start) {
>>> +			err = -EMSGSIZE;
>>> +			goto out;
>>> +		}
>>> +		err = ovs_nla_put_actions(actions, rem, skb);
>>> +	}
>>>
>>>   out:
>>>   	if (err) {
>>> -		nla_nest_cancel(skb, ac_start);
>>> +		if (ac_start)
>>> +			nla_nest_cancel(skb, ac_start);
>>>   		nla_nest_cancel(skb, start);
>>>   	} else {
>>> -		nla_nest_end(skb, ac_start);
>>> +		if (ac_start)
>>> +			nla_nest_end(skb, ac_start);
>>>   		nla_nest_end(skb, start);
>>>   	}
>>>
>>> -- 
>>> 2.44.0
>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next 6/8] net:openvswitch: add psample support
  2024-05-07 14:18     ` Adrian Moreno
  2024-05-08  9:48       ` Eelco Chaudron
@ 2024-05-08 15:25       ` Aaron Conole
  1 sibling, 0 replies; 25+ messages in thread
From: Aaron Conole @ 2024-05-08 15:25 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: Eelco Chaudron, netdev, horms, i.maximets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Pravin B Shelar,
	Donald Hunter, linux-kernel, dev

Adrian Moreno <amorenoz@redhat.com> writes:

> On 5/3/24 11:43 AM, Eelco Chaudron wrote:
>> On 24 Apr 2024, at 15:50, Adrian Moreno wrote:
>> 
>>> Add support for psample sampling via two new attributes to the
>>> OVS_ACTION_ATTR_SAMPLE action.
>>>
>>> OVS_SAMPLE_ATTR_PSAMPLE_GROUP used to pass an integer psample group_id.
>>> OVS_SAMPLE_ATTR_PSAMPLE_COOKIE used to pass a variable-length binary
>>> cookie that will be forwared to psample.
>>>
>>> The maximum length of the user-defined cookie is set to 16, same as
>>> tc_cookie, to discourage using cookies that will not be offloadable.
>>>
>>> In order to simplify the internal processing of the action and given the
>>> maximum size of the cookie is relatively small, add both fields to the
>>> internal-only struct sample_arg.
>>>
>>> The presence of a group_id mandates that the action shall called the
>>> psample module to multicast the packet with such group_id and the
>>> user-provided cookie if present. This behavior is orthonogal to
>>> also executing the nested actions if present.
>>>
>>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>> This is not a full review yet. Just some comments, as I’m looking at
>> the user-space patch first and added similar comments.
>> I’ll do a proper review of this series once I’m done with user-space
>> part.
>> //Eelco
>> 
>>> ---
>>>   Documentation/netlink/specs/ovs_flow.yaml |  6 ++
>>>   include/uapi/linux/openvswitch.h          | 49 ++++++++++----
>>>   net/openvswitch/actions.c                 | 51 +++++++++++++--
>>>   net/openvswitch/flow_netlink.c            | 80 ++++++++++++++++++-----
>>>   4 files changed, 153 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml
>>> index 4fdfc6b5cae9..5543c2937225 100644
>>> --- a/Documentation/netlink/specs/ovs_flow.yaml
>>> +++ b/Documentation/netlink/specs/ovs_flow.yaml
>>> @@ -825,6 +825,12 @@ attribute-sets:
>>>           name: actions
>>>           type: nest
>>>           nested-attributes: action-attrs
>>> +      -
>>> +        name: psample_group
>>> +        type: u32
>>> +      -
>>> +        name: psample_cookie
>>> +        type: binary
>>>     -
>>>       name: userspace-attrs
>>>       enum-name: ovs-userspace-attr
>>> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
>>> index efc82c318fa2..e9cd6f3a952d 100644
>>> --- a/include/uapi/linux/openvswitch.h
>>> +++ b/include/uapi/linux/openvswitch.h
>>> @@ -639,6 +639,7 @@ enum ovs_flow_attr {
>>>   #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>>   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>>
>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>>   /**
>>>    * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
>>>    * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
>>> @@ -646,15 +647,27 @@ enum ovs_flow_attr {
>>>    * %UINT32_MAX samples all packets and intermediate values sample intermediate
>>>    * fractions of packets.
>>>    * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
>>> - * Actions are passed as nested attributes.
>>> + * Actions are passed as nested attributes. Optional if
>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
>>> + * provided, will be copied to the psample cookie.
>> As there is a limit of to the cookie should we mention it here?
>> 
>
> I thought OVS_PSAMPLE_COOKIE_MAX_SIZE was expressive enough but sure,
> we can also mention it here.
>
>>>    *
>>> - * Executes the specified actions with the given probability on a per-packet
>>> - * basis.
>>> + * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
>>> + * specified.
>>> + *
>>> + * Executes the specified actions and/or sends the packet to psample
>>> + * with the given probability on a per-packet basis.
>>>    */
>>>   enum ovs_sample_attr {
>>>   	OVS_SAMPLE_ATTR_UNSPEC,
>>> -	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>>> -	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
>>> +	OVS_SAMPLE_ATTR_PROBABILITY,	/* u32 number */
>>> +	OVS_SAMPLE_ATTR_ACTIONS,	/* Nested OVS_ACTION_ATTR_
>> Missing * after OVS_ACTION_ATTR_
>
> As a matter of fact, adding an * makes checkpatch generate a warning
> IIRC. That's why I initially removed it. I can look at fixing
> checkpatch instead.

I think we can ignore the warning.  Alternatively, consider not changing
the comment spacing for the existing comments.

>> 
>>> +					 * attributes.
>>> +					 */
>>> +	OVS_SAMPLE_ATTR_PSAMPLE_GROUP,	/* u32 number */
>>> +	OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,	/* binary */
>> As these are general sample options, I would not add the PSAMPLE
>> reference. Other data paths could use a different implementation. So
>> I guess OVS_SAMPLE_ATTR_GROUP_ID and OVS_SAMPLE_ATTR_COOKIE would be
>> enough.
>> 
>
> OK. But isn't the API already psample-ish? I mean that the group_id is
> something specific to psample that might not be present in other
> datapath implementation.
>
>
>>>   	__OVS_SAMPLE_ATTR_MAX,
>>>
>>>   #ifdef __KERNEL__
>>> @@ -665,13 +678,27 @@ enum ovs_sample_attr {
>>>   #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
>>>
>>>   #ifdef __KERNEL__
>>> +
>>> +/* Definition for flags in struct sample_arg. */
>>> +enum {
>>> +	/* When set, actions in sample will not change the flows. */
>>> +	OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
>>> +	/* When set, the packet will be sent to psample. */
>>> +	OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
>>> +};
>>> +
>>>   struct sample_arg {
>>> -	bool exec;                   /* When true, actions in sample will not
>>> -				      * change flow keys. False otherwise.
>>> -				      */
>>> -	u32  probability;            /* Same value as
>>> -				      * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>>> -				      */
>> Not sure if you can actually do this, you are changing a structure
>> that is part of the UAPI. This change breaks backwards
>> compatibility.
>> 
>
> Hmmm... this the internal argument structure protected by #ifdef
> __KERNEL__. It is used in several actions to optimize the internal
> action handling (i.e: using a compact struct instead of a list of
> netlink attributes). I guess the reason for having it in this file is
> because the attribute enum is being reused, but I wouldn't think of
> this struct as part of the uAPI.
>
> If the (#ifdef __KERNEL__) does not exclude this struct from the uAPI
> I think we should move it (all of them actually) to some internal
> file.
>
>> 
>>> +	u16 flags;		/* Flags that modify the behavior of the
>>> +				 * action. See SAMPLE_ARG_FLAG_*.
>>> +				 */
>>> +	u32  probability;       /* Same value as
>>> +				 * 'OVS_SAMPLE_ATTR_PROBABILITY'.
>>> +				 */
>>> +	u32  group_id;		/* Same value as
>>> +				 * 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
>>> +				 */
>>> +	u8  cookie_len;		/* Length of psample cookie. */
>>> +	char cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE]; /* psample cookie data. */
>> Would it make sense for the cookie also to be u8?
>> 
>
> Yes, probably.
>
>>>   };
>>>   #endif
>>>
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 6fcd7e2ca81f..eb3166986fd2 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -24,6 +24,7 @@
>>>   #include <net/checksum.h>
>>>   #include <net/dsfield.h>
>>>   #include <net/mpls.h>
>>> +#include <net/psample.h>
>>>   #include <net/sctp/checksum.h>
>>>
>>>   #include "datapath.h"
>>> @@ -1025,6 +1026,34 @@ static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
>>>   	return 0;
>>>   }
>>>
>>> +static int ovs_psample_packet(struct datapath *dp, struct sw_flow_key *key,
>>> +			      const struct sample_arg *arg,
>>> +			      struct sk_buff *skb)
>>> +{
>>> +	struct psample_group psample_group = {};
>>> +	struct psample_metadata md = {};
>>> +	struct vport *input_vport;
>>> +	u32 rate;
>>> +
>>> +	psample_group.group_num = arg->group_id;
>>> +	psample_group.net = ovs_dp_get_net(dp);
>>> +
>>> +	input_vport = ovs_vport_rcu(dp, key->phy.in_port);
>>> +	if (!input_vport)
>>> +		input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
>>> +
>>> +	md.in_ifindex = input_vport->dev->ifindex;
>>> +	md.user_cookie = arg->cookie_len ? &arg->cookie[0] : NULL;
>>> +	md.user_cookie_len = arg->cookie_len;
>>> +	md.trunc_size = skb->len;
>>> +
>>> +	rate = arg->probability ? U32_MAX / arg->probability : 0;
>>> +
>>> +	psample_sample_packet(&psample_group, skb, rate, &md);
>> Does this mean now the ovs module, now is dependent on the presence
>> of psample? I think we should only support sampling to psample if
>> the module exists, else we should return an error. There might be
>> distributions not including psample by default.
>
> Agree. I'll add some compile-time checks the same way we do with nf_nat.
>
>> 
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   /* When 'last' is true, sample() should always consume the 'skb'.
>>>    * Otherwise, sample() should keep 'skb' intact regardless what
>>>    * actions are executed within sample().
>>> @@ -1033,11 +1062,12 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>>>   		  struct sw_flow_key *key, const struct nlattr *attr,
>>>   		  bool last)
>>>   {
>>> -	struct nlattr *actions;
>>> +	const struct sample_arg *arg;
>>>   	struct nlattr *sample_arg;
>>>   	int rem = nla_len(attr);
>>> -	const struct sample_arg *arg;
>>> +	struct nlattr *actions;
>>>   	bool clone_flow_key;
>>> +	int ret;
>>>
>>>   	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>>>   	sample_arg = nla_data(attr);
>>> @@ -1051,9 +1081,20 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>>>   		return 0;
>>>   	}
>>>
>>> -	clone_flow_key = !arg->exec;
>>> -	return clone_execute(dp, skb, key, 0, actions, rem, last,
>>> -			     clone_flow_key);
>>> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
>>> +		ret = ovs_psample_packet(dp, key, arg, skb);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	if (nla_ok(actions, rem)) {
>>> +		clone_flow_key = !(arg->flags & OVS_SAMPLE_ARG_FLAG_EXEC);
>>> +		ret = clone_execute(dp, skb, key, 0, actions, rem, last,
>>> +				    clone_flow_key);
>>> +	} else if (last) {
>>> +		ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>>> +	}
>>> +	return ret;
>>>   }
>>>
>>>   /* When 'last' is true, clone() should always consume the 'skb'.
>>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>>> index f224d9bcea5e..1a348d3905fc 100644
>>> --- a/net/openvswitch/flow_netlink.c
>>> +++ b/net/openvswitch/flow_netlink.c
>>> @@ -2561,6 +2561,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>>>   				  u32 mpls_label_count, bool log,
>>>   				  u32 depth);
>>>
>>> +static int copy_action(const struct nlattr *from,
>>> +		       struct sw_flow_actions **sfa, bool log);
>>> +
>>>   static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   				    const struct sw_flow_key *key,
>>>   				    struct sw_flow_actions **sfa,
>>> @@ -2569,10 +2572,10 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   				    u32 depth)
>>>   {
>>>   	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
>>> -	const struct nlattr *probability, *actions;
>>> +	const struct nlattr *probability, *actions, *group, *cookie;
>>> +	struct sample_arg arg = {};
>>>   	const struct nlattr *a;
>>>   	int rem, start, err;
>>> -	struct sample_arg arg;
>>>
>>>   	memset(attrs, 0, sizeof(attrs));
>>>   	nla_for_each_nested(a, attr, rem) {
>>> @@ -2589,7 +2592,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   		return -EINVAL;
>>>
>>>   	actions = attrs[OVS_SAMPLE_ATTR_ACTIONS];
>>> -	if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN))
>>> +	if (actions && (!nla_len(actions) || nla_len(actions) < NLA_HDRLEN))
>>> +		return -EINVAL;
>>> +
>>> +	group = attrs[OVS_SAMPLE_ATTR_PSAMPLE_GROUP];
>>> +	if (group && nla_len(group) != sizeof(u32))
>>> +		return -EINVAL;
>>> +
>>> +	cookie = attrs[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE];
>>> +	if (cookie &&
>>> +	    (!group || nla_len(cookie) > OVS_PSAMPLE_COOKIE_MAX_SIZE))
>>> +		return -EINVAL;
>>> +
>>> +	if (!group && !actions)
>>>   		return -EINVAL;
>>>
>>>   	/* validation done, copy sample action. */
>>> @@ -2608,7 +2623,19 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   	 * If the sample is the last action, it can always be excuted
>>>   	 * rather than deferred.
>>>   	 */
>>> -	arg.exec = last || !actions_may_change_flow(actions);
>>> +	if (actions && (last || !actions_may_change_flow(actions)))
>>> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_EXEC;
>>> +
>>> +	if (group) {
>>> +		arg.flags |= OVS_SAMPLE_ARG_FLAG_PSAMPLE;
>>> +		arg.group_id = nla_get_u32(group);
>>> +	}
>>> +
>>> +	if (cookie) {
>>> +		memcpy(&arg.cookie[0], nla_data(cookie), nla_len(cookie));
>>> +		arg.cookie_len = nla_len(cookie);
>>> +	}
>>> +
>>>   	arg.probability = nla_get_u32(probability);
>>>
>>>   	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg),
>>> @@ -2616,12 +2643,13 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
>>>   	if (err)
>>>   		return err;
>>>
>>> -	err = __ovs_nla_copy_actions(net, actions, key, sfa,
>>> -				     eth_type, vlan_tci, mpls_label_count, log,
>>> -				     depth + 1);
>>> -
>>> -	if (err)
>>> -		return err;
>>> +	if (actions) {
>>> +		err = __ovs_nla_copy_actions(net, actions, key, sfa,
>>> +					     eth_type, vlan_tci,
>>> +					     mpls_label_count, log, depth + 1);
>>> +		if (err)
>>> +			return err;
>>> +	}
>>>
>>>   	add_nested_action_end(*sfa, start);
>>>
>>> @@ -3553,20 +3581,38 @@ static int sample_action_to_attr(const struct nlattr *attr,
>>>   		goto out;
>>>   	}
>>>
>>> -	ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
>>> -	if (!ac_start) {
>>> -		err = -EMSGSIZE;
>>> -		goto out;
>>> +	if (arg->flags & OVS_SAMPLE_ARG_FLAG_PSAMPLE) {
>>> +		if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
>>> +				arg->group_id)) {
>>> +			err = -EMSGSIZE;
>>> +			goto out;
>>> +		}
>>> +
>>> +		if (arg->cookie_len &&
>>> +		    nla_put(skb, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
>>> +			    arg->cookie_len, &arg->cookie[0])) {
>>> +			err = -EMSGSIZE;
>>> +			goto out;
>>> +		}
>>>   	}
>>>
>>> -	err = ovs_nla_put_actions(actions, rem, skb);
>>> +	if (nla_ok(actions, rem)) {
>>> +		ac_start = nla_nest_start_noflag(skb, OVS_SAMPLE_ATTR_ACTIONS);
>>> +		if (!ac_start) {
>>> +			err = -EMSGSIZE;
>>> +			goto out;
>>> +		}
>>> +		err = ovs_nla_put_actions(actions, rem, skb);
>>> +	}
>>>
>>>   out:
>>>   	if (err) {
>>> -		nla_nest_cancel(skb, ac_start);
>>> +		if (ac_start)
>>> +			nla_nest_cancel(skb, ac_start);
>>>   		nla_nest_cancel(skb, start);
>>>   	} else {
>>> -		nla_nest_end(skb, ac_start);
>>> +		if (ac_start)
>>> +			nla_nest_end(skb, ac_start);
>>>   		nla_nest_end(skb, start);
>>>   	}
>>>
>>> -- 2.44.0
>> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-05-08 15:25 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-24 13:50 [PATCH net-next 0/8] net: openvswitch: Add sample multicasting Adrian Moreno
2024-04-24 13:50 ` [PATCH net-next 1/8] net: netlink: export genl private pointer getters Adrian Moreno
2024-04-24 13:50 ` [PATCH net-next 2/8] net: psample: add multicast filtering on group_id Adrian Moreno
2024-04-24 14:54   ` Jiri Pirko
2024-04-25  7:23     ` Adrian Moreno
2024-04-24 13:50 ` [PATCH net-next 3/8] net: psample: add user cookie Adrian Moreno
2024-04-25  7:32   ` Ido Schimmel
2024-04-25  8:09     ` Adrian Moreno
2024-04-24 13:50 ` [PATCH net-next 4/8] net: psample: add tracepoint Adrian Moreno
2024-04-25  7:18   ` Ido Schimmel
2024-04-25  8:06     ` Adrian Moreno
2024-04-25 15:25       ` Ido Schimmel
2024-04-29  5:33         ` Adrian Moreno
2024-04-30 12:53           ` Ido Schimmel
2024-04-24 13:50 ` [PATCH net-next 5/8] net: sched: act_sample: add action cookie to sample Adrian Moreno
2024-04-25  7:39   ` Ido Schimmel
2024-04-25 21:43   ` Jamal Hadi Salim
2024-04-24 13:50 ` [PATCH net-next 6/8] net:openvswitch: add psample support Adrian Moreno
2024-04-30  7:29   ` Dan Carpenter
2024-05-03  9:43   ` Eelco Chaudron
2024-05-07 14:18     ` Adrian Moreno
2024-05-08  9:48       ` Eelco Chaudron
2024-05-08 15:25       ` Aaron Conole
2024-04-24 13:50 ` [PATCH net-next 7/8] selftests: openvswitch: add sample action Adrian Moreno
2024-04-24 13:50 ` [PATCH net-next 8/8] selftests: openvswitch: add psample test Adrian Moreno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).