All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action
@ 2019-03-19 19:49 Kevin 'ldir' Darbyshire-Bryant
  2019-03-19 19:49 ` [PATCH 1/1] " Kevin 'ldir' Darbyshire-Bryant
  2019-03-22 14:09 ` [RFC PATCH 1/1 v2] " Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 2 replies; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-19 19:49 UTC (permalink / raw)
  To: netdev; +Cc: jiri, xiyou.wangcong, jhs, Kevin 'ldir' Darbyshire-Bryant

With nervousness and trepidation I'm submitting the attached RFC patch
for 'conndscp'.

Conndscp is a new tc filter action module.  It is designed to copy DSCPs
to conntrack marks and the reverse operation of conntrack mark contained
DSCPs to the diffserv field of suitable skbs.

The feature is intended for use and has been found useful for restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

conndscp understands the following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conndscp will
place the DSCP in conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before storing into the mark
field.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag - get
will only store the DSCP if the flag is unset.  set will only restore
the DSCP if the flag is set.  This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A mask
of zero disables the conditional behaviour.

mode - get/set/both - get stores the DSCP into the mark, set restores
the DSCP into the diffserv field from the mark, both 'gets' the mark and
then 'sets' it in that order.

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>


A typical example of using conndscp to restore DSCP values for use with
a qdisc (e.g. CAKE) is shown below, using top 6 bits to store the DSCP
and the bottom bit of top byte as the state flag.

# egress qdisc
tc qdisc add dev eth0 cake bandwidth 20000kbit
# put an action on the egress interface to get DSCP to connmark->mark
# and to set DSCP from the stored connmark.
# this seems counter intuitive but it ensures once the mark is set that all
# subsequent egress packets have the same stored DSCP avoiding iptables rules
# to mark every packet, conndscp does it for us and then CAKE is happy using the
# DSCP
tc filter add dev eth0 protocol all prio 10 u32 match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode both


#ingress qdisc via an ifb

tc qdisc add dev eth0 handle ffff: ingress
tc qdisc add dev ifb4eth0 cake badnwidth 80000kbit
ip link set ifb4eth0 up
# redirect all packets arriving on eth0 to ifb4eth0 and restore the DSCP from connmark
tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode set \
	mirred egress redirect dev ifb4eth0

#iptables rules using the statemask flag to only do it once

iptables -t mangle -N QOS_MARK_eth0

iptables -t mangle -A QOS_MARK_eth0 -m set --match-set Bulk4  dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
#add more rules similar to above as required


# send unmarked packets to the marking chain - conndscp will set the statemask bit
# if not already set.
iptables -t mangle -A POSTROUTING -o eth0 -m connmark --mark 0x00000000/0x01000000 -g QOS_MARK_eth0

conndscp (almost) shamelessly copies code from connmark and therefore
contains the same limitations.

I am not a full time programmer, conndscp represents something of the
order of a 2 week struggle, my C is awful, kernel & network knowledge
worse, though I like to think improving.  There are no doubt issues with
this patch/feature but I hope constructive feedback, quite possibly in
very short words for my simple brain, will knock it into shape.

Thanks for your time.

Kevin Darbyshire-Bryant (1):
  net: sched: Introduce conndscp action

 include/net/tc_act/tc_conndscp.h          |  19 ++
 include/uapi/linux/tc_act/tc_conndscp.h   |  33 +++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conndscp.c                  | 333 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 6 files changed, 400 insertions(+)
 create mode 100644 include/net/tc_act/tc_conndscp.h
 create mode 100644 include/uapi/linux/tc_act/tc_conndscp.h
 create mode 100644 net/sched/act_conndscp.c

-- 
2.17.2 (Apple Git-113)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/1] net: sched: Introduce conndscp action
  2019-03-19 19:49 [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-19 19:49 ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-21 17:58   ` kbuild test robot
  2019-03-22 14:09 ` [RFC PATCH 1/1 v2] " Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-19 19:49 UTC (permalink / raw)
  To: netdev; +Cc: jiri, xiyou.wangcong, jhs, Kevin 'ldir' Darbyshire-Bryant

Conndscp is a new tc filter action module.  It is designed to copy DSCPs
to conntrack marks and the reverse operation of conntrack mark contained
DSCPs to the diffserv field of suitable skbs.

The feature is intended for use and has been found useful for restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

conndscp understands the following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conndscp will
place the DSCP in conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before storing into the mark
field.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag - get
will only store the DSCP if the flag is unset.  set will only restore
the DSCP if the flag is set.  This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A mask
of zero disables the conditional behaviour.

mode - get/set/both - get stores the DSCP into the mark, set restores
the DSCP into the diffserv field from the mark, both 'gets' the mark and
then 'sets' it in that order.

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/net/tc_act/tc_conndscp.h          |  19 ++
 include/uapi/linux/tc_act/tc_conndscp.h   |  33 +++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conndscp.c                  | 333 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 6 files changed, 400 insertions(+)
 create mode 100644 include/net/tc_act/tc_conndscp.h
 create mode 100644 include/uapi/linux/tc_act/tc_conndscp.h
 create mode 100644 net/sched/act_conndscp.c

diff --git a/include/net/tc_act/tc_conndscp.h b/include/net/tc_act/tc_conndscp.h
new file mode 100644
index 000000000000..4cb328fc487d
--- /dev/null
+++ b/include/net/tc_act/tc_conndscp.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CONNDSCP_H
+#define __NET_TC_CONNDSCP_H
+
+#include <net/act_api.h>
+
+struct tcf_conndscp_info {
+	struct tc_action common;
+	struct net *net;
+	u16 zone;
+	u32 mask;
+	u32 statemask;
+	u8 mode;
+	u8 maskshift;
+};
+
+#define to_conndscp(a) ((struct tcf_conndscp_info *)a)
+
+#endif /* __NET_TC_CONNDSCP_H */
diff --git a/include/uapi/linux/tc_act/tc_conndscp.h b/include/uapi/linux/tc_act/tc_conndscp.h
new file mode 100644
index 000000000000..0897b5d6b0ce
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_conndscp.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CONNDSCP_H
+#define __UAPI_TC_CONNDSCP_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_CONNDSCP 27
+
+struct tc_conndscp {
+	tc_gen;
+	__u16 zone;
+	__u32 mask;
+	__u32 statemask;
+	__u8 mode;
+	__u8 maskshift;
+};
+
+enum {
+	TCA_CONNDSCP_UNSPEC,
+	TCA_CONNDSCP_PARMS,
+	TCA_CONNDSCP_TM,
+	TCA_CONNDSCP_PAD,
+	__TCA_CONNDSCP_MAX
+};
+#define TCA_CONNDSCP_MAX (__TCA_CONNDSCP_MAX - 1)
+
+enum {
+	CONNDSCP_FLAG_GETDSCP	= BIT(0),
+	CONNDSCP_FLAG_SETDSCP	= BIT(1)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 1b9afdee5ba9..f43788b9d332 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -865,6 +865,19 @@ config NET_ACT_BPF
 	  To compile this code as a module, choose M here: the
 	  module will be called act_bpf.
 
+config NET_ACT_CONNDSCP
+        tristate "DSCP to Netfilter Connection Mark Store/Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        ---help---
+	  Say Y here to allow storing of DSCP into conn mark
+	  and vice verca
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_connmark.
+
 config NET_ACT_CONNMARK
         tristate "Netfilter Connection Mark Retriever"
         depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..b78198944618 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
+obj-$(CONFIG_NET_ACT_CONNDSCP)	+= act_conndscp.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
diff --git a/net/sched/act_conndscp.c b/net/sched/act_conndscp.c
new file mode 100644
index 000000000000..8ee87e2ab814
--- /dev/null
+++ b/net/sched/act_conndscp.c
@@ -0,0 +1,333 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* net/sched/act_conndscp.c  netfilter conndscp dscp<->connmark action
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <uapi/linux/tc_act/tc_conndscp.h>
+#include <net/tc_act/tc_conndscp.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int conndscp_net_id;
+static struct tc_action_ops act_conndscp_ops;
+
+static void tcf_conndscp_get(struct nf_conn *ct, struct tcf_conndscp_info *ca,
+			     struct sk_buff *skb, int proto)
+{
+	u32 newmark;
+	u8 dscp;
+
+	/* mark does not contain DSCP so store DSCP bits into c->mark */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+		break;
+	default:
+		dscp = 0;
+		break;
+	}
+	newmark = ct->mark & ~(ca->mask | ca->statemask);
+	newmark |= (dscp << ca->maskshift) | ca->statemask;
+	if (ct->mark != newmark) {
+		/* using requeues stats to count how many connmark updates */
+		ca->tcf_qstats.requeues++;
+		ct->mark = newmark;
+		nf_conntrack_event_cache(IPCT_MARK, ct);
+	}
+}
+
+static void tcf_conndscp_set(struct nf_conn *ct, struct tcf_conndscp_info *ca,
+			     struct sk_buff *skb, int proto)
+{
+	u8 newdscp;
+
+	newdscp = (((ct->mark & ca->mask) >> ca->maskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	/* mark contains DSCP so restore DSCP bits from c->mark into diffserv */
+	/* using overlimits stats to count how many DSCP updates */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) !=
+		     newdscp) {
+			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK,
+					    newdscp);
+			ca->tcf_qstats.overlimits++;
+		}
+		break;
+	case NFPROTO_IPV6:
+		if ((ipv6_get_dsfield(ipv6_hdr(skb)) &
+		     ~INET_ECN_MASK) != newdscp) {
+			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK,
+					    newdscp);
+			ca->tcf_qstats.overlimits++;
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static int tcf_conndscp_act(struct sk_buff *skb, const struct tc_action *a,
+			    struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_conndscp_info *ca = to_conndscp(a);
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto;
+
+	spin_lock(&ca->tcf_lock);
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) {
+		if (ca->mode & CONNDSCP_FLAG_SETDSCP &&
+		    (!ca->statemask || (ct->mark & ca->statemask)))
+			tcf_conndscp_set(ct, ca, skb, proto);
+		else if (ca->mode & CONNDSCP_FLAG_GETDSCP &&
+			 (!ca->statemask || !(ct->mark & ca->statemask)))
+			tcf_conndscp_get(ct, ca, skb, proto);
+		goto out;
+	}
+
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+			       proto, ca->net, &tuple))
+		goto out;
+
+	zone.id = ca->zone;
+	zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+	thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
+	if (!thash)
+		goto out;
+
+	ct = nf_ct_tuplehash_to_ctrack(thash);
+	if (ca->mode & CONNDSCP_FLAG_SETDSCP &&
+	    (!ca->statemask || (ct->mark & ca->statemask)))
+		tcf_conndscp_set(ct, ca, skb, proto);
+	else if (ca->mode & CONNDSCP_FLAG_GETDSCP &&
+		 (!ca->statemask || !(ct->mark & ca->statemask)))
+		tcf_conndscp_get(ct, ca, skb, proto);
+	nf_ct_put(ct);
+
+out:
+	spin_unlock(&ca->tcf_lock);
+	return ca->tcf_action;
+}
+
+static const struct nla_policy conndscp_policy[TCA_CONNDSCP_MAX + 1] = {
+	[TCA_CONNDSCP_PARMS] = { .len = sizeof(struct tc_conndscp) },
+};
+
+static void conndscp_parmset(struct tcf_conndscp_info *ci,
+			     struct tc_conndscp *parm)
+{
+	ci->tcf_action = parm->action;
+	ci->zone = parm->zone;
+	ci->mask = parm->mask;
+	ci->maskshift = ci->mask ? __ffs(ci->mask) : 0;
+	ci->statemask = parm->statemask;
+	ci->mode = parm->mode;
+
+	/* let's not trust userspace entirely */
+	/* need at least contiguous 6 bit mask */
+	if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
+		ci->mode = 0;
+	if (ci->mask & ci->statemask)
+		ci->mode = 0;
+}
+
+static int tcf_conndscp_init(struct net *net, struct nlattr *nla,
+			     struct nlattr *est, struct tc_action **a,
+			     int ovr, int bind, bool rtnl_held,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+	struct nlattr *tb[TCA_CONNDSCP_MAX + 1];
+	struct tcf_conndscp_info *ci;
+	struct tc_conndscp *parm;
+	int ret = 0;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_CONNDSCP_MAX, nla, conndscp_policy,
+			       NULL);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[TCA_CONNDSCP_PARMS])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_CONNDSCP_PARMS]);
+
+	ret = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!ret) {
+		ret = tcf_idr_create(tn, parm->index, est, a,
+				     &act_conndscp_ops, bind, false);
+		if (ret)
+			return ret;
+
+		ci = to_conndscp(*a);
+		ci->net = net;
+		conndscp_parmset(ci, parm);
+
+		tcf_idr_insert(tn, *a);
+		ret = ACT_P_CREATED;
+	} else if (ret > 0) {
+		ci = to_conndscp(*a);
+		if (bind)
+			return 0;
+		tcf_idr_release(*a, bind);
+		if (!ovr)
+			return -EEXIST;
+		/* replacing action and zone */
+		spin_lock_bh(&ci->tcf_lock);
+		conndscp_parmset(ci, parm);
+		spin_unlock_bh(&ci->tcf_lock);
+		ret = 0;
+	}
+
+	return ret;
+}
+
+static inline int tcf_conndscp_dump(struct sk_buff *skb, struct tc_action *a,
+				    int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_conndscp_info *ci = to_conndscp(a);
+	struct tc_conndscp opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&ci->tcf_lock);
+	opt.action = ci->tcf_action;
+	opt.zone = ci->zone;
+	opt.mask = ci->mask;
+	opt.statemask = ci->statemask;
+	opt.mode = ci->mode;
+
+	if (nla_put(skb, TCA_CONNDSCP_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CONNDSCP_TM, sizeof(t), &t,
+			  TCA_CONNDSCP_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&ci->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_conndscp_walker(struct net *net, struct sk_buff *skb,
+			       struct netlink_callback *cb, int type,
+			       const struct tc_action_ops *ops,
+			       struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_conndscp_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_conndscp_ops = {
+	.kind		=	"conndscp",
+	.type		=	TCA_ACT_CONNDSCP,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_conndscp_act,
+	.dump		=	tcf_conndscp_dump,
+	.init		=	tcf_conndscp_init,
+	.walk		=	tcf_conndscp_walker,
+	.lookup		=	tcf_conndscp_search,
+	.size		=	sizeof(struct tcf_conndscp_info),
+};
+
+static __net_init int conndscp_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+
+	return tc_action_net_init(tn, &act_conndscp_ops);
+}
+
+static void __net_exit conndscp_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, conndscp_net_id);
+}
+
+static struct pernet_operations conndscp_net_ops = {
+	.init = conndscp_init_net,
+	.exit_batch = conndscp_exit_net,
+	.id   = &conndscp_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init conndscp_init_module(void)
+{
+	return tcf_register_action(&act_conndscp_ops, &conndscp_net_ops);
+}
+
+static void __exit conndscp_cleanup_module(void)
+{
+	tcf_unregister_action(&act_conndscp_ops, &conndscp_net_ops);
+}
+
+module_init(conndscp_init_module);
+module_exit(conndscp_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("DSCP to Connection tracking mark storing/restoring");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..9d1fddcfb887 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -37,6 +37,7 @@ CONFIG_NET_ACT_SKBEDIT=m
 CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
+CONFIG_NET_ACT_CONNDSCP=m
 CONFIG_NET_ACT_CONNMARK=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/1] net: sched: Introduce conndscp action
  2019-03-19 19:49 ` [PATCH 1/1] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-21 17:58   ` kbuild test robot
  0 siblings, 0 replies; 27+ messages in thread
From: kbuild test robot @ 2019-03-21 17:58 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: kbuild-all, netdev, jiri, xiyou.wangcong, jhs,
	Kevin 'ldir' Darbyshire-Bryant

[-- Attachment #1: Type: text/plain, Size: 1523 bytes --]

Hi Kevin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on v5.1-rc1 next-20190321]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kevin-ldir-Darbyshire-Bryant/net-sched-Introduce-conndscp-action/20190322-010455
config: mips-allmodconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=mips 

All errors (new ones prefixed by >>):

>> net//sched/act_conndscp.c:290:3: error: 'struct tc_action_ops' has no member named 'type'
     .type  = TCA_ACT_CONNDSCP,
      ^~~~

vim +290 net//sched/act_conndscp.c

   287	
   288	static struct tc_action_ops act_conndscp_ops = {
   289		.kind		=	"conndscp",
 > 290		.type		=	TCA_ACT_CONNDSCP,
   291		.owner		=	THIS_MODULE,
   292		.act		=	tcf_conndscp_act,
   293		.dump		=	tcf_conndscp_dump,
   294		.init		=	tcf_conndscp_init,
   295		.walk		=	tcf_conndscp_walker,
   296		.lookup		=	tcf_conndscp_search,
   297		.size		=	sizeof(struct tcf_conndscp_info),
   298	};
   299	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 59090 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-19 19:49 [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action Kevin 'ldir' Darbyshire-Bryant
  2019-03-19 19:49 ` [PATCH 1/1] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-22 14:09 ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-22 17:39   ` Cong Wang
  1 sibling, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-22 14:09 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev, xiyou.wangcong

Conndscp is a new tc filter action module.  It is designed to copy DSCPs
to conntrack marks and the reverse operation of conntrack mark contained
DSCPs to the diffserv field of suitable skbs.

The feature is intended for use and has been found useful for restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

conndscp understands the following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conndscp will
place the DSCP in conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before storing into the mark
field.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag - get
will only store the DSCP if the flag is unset.  set will only restore
the DSCP if the flag is set.  This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A mask
of zero disables the conditional behaviour.

mode - get/set/both - get stores the DSCP into the mark, set restores
the DSCP into the diffserv field from the mark, both 'gets' the mark and
then 'sets' it in that order.

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
v2 - rebase on latest net-next & tweak struct tcf_conndscp_info
     alignment.  Incorporate changes related to net/sched: prepare TC
     actions to properly validate the control action


 include/net/tc_act/tc_conndscp.h          |  19 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_conndscp.h   |  31 ++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conndscp.c                  | 354 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 420 insertions(+)
 create mode 100644 include/net/tc_act/tc_conndscp.h
 create mode 100644 include/uapi/linux/tc_act/tc_conndscp.h
 create mode 100644 net/sched/act_conndscp.c

diff --git a/include/net/tc_act/tc_conndscp.h b/include/net/tc_act/tc_conndscp.h
new file mode 100644
index 000000000000..732afb1a930a
--- /dev/null
+++ b/include/net/tc_act/tc_conndscp.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CONNDSCP_H
+#define __NET_TC_CONNDSCP_H
+
+#include <net/act_api.h>
+
+struct tcf_conndscp_info {
+	struct tc_action common;
+	struct net *net;
+	u32 mask;
+	u32 statemask;
+	u16 zone;
+	u8 mode;
+	u8 maskshift;
+};
+
+#define to_conndscp(a) ((struct tcf_conndscp_info *)a)
+
+#endif /* __NET_TC_CONNDSCP_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..59a20fbe53cb 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CONNDSCP,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_conndscp.h b/include/uapi/linux/tc_act/tc_conndscp.h
new file mode 100644
index 000000000000..be74a3191c68
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_conndscp.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CONNDSCP_H
+#define __UAPI_TC_CONNDSCP_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_conndscp {
+	tc_gen;
+	__u32 mask;
+	__u32 statemask;
+	__u16 zone;
+	__u8 mode;
+	__u8 maskshift;
+};
+
+enum {
+	TCA_CONNDSCP_UNSPEC,
+	TCA_CONNDSCP_PARMS,
+	TCA_CONNDSCP_TM,
+	TCA_CONNDSCP_PAD,
+	__TCA_CONNDSCP_MAX
+};
+#define TCA_CONNDSCP_MAX (__TCA_CONNDSCP_MAX - 1)
+
+enum {
+	CONNDSCP_FLAG_GETDSCP	= BIT(0),
+	CONNDSCP_FLAG_SETDSCP	= BIT(1)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 1b9afdee5ba9..f43788b9d332 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -865,6 +865,19 @@ config NET_ACT_BPF
 	  To compile this code as a module, choose M here: the
 	  module will be called act_bpf.
 
+config NET_ACT_CONNDSCP
+        tristate "DSCP to Netfilter Connection Mark Store/Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        ---help---
+	  Say Y here to allow storing of DSCP into conn mark
+	  and vice verca
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_connmark.
+
 config NET_ACT_CONNMARK
         tristate "Netfilter Connection Mark Retriever"
         depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..b78198944618 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
+obj-$(CONFIG_NET_ACT_CONNDSCP)	+= act_conndscp.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
diff --git a/net/sched/act_conndscp.c b/net/sched/act_conndscp.c
new file mode 100644
index 000000000000..ee2e993e1962
--- /dev/null
+++ b/net/sched/act_conndscp.c
@@ -0,0 +1,354 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* net/sched/act_conndscp.c  netfilter conndscp dscp<->connmark action
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_conndscp.h>
+#include <net/tc_act/tc_conndscp.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int conndscp_net_id;
+static struct tc_action_ops act_conndscp_ops;
+
+static void tcf_conndscp_get(struct nf_conn *ct, struct tcf_conndscp_info *ca,
+			     struct sk_buff *skb, int proto)
+{
+	u32 newmark;
+	u8 dscp;
+
+	/* mark does not contain DSCP so store DSCP bits into c->mark */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+		break;
+	default:
+		dscp = 0;
+		break;
+	}
+	newmark = ct->mark & ~(ca->mask | ca->statemask);
+	newmark |= (dscp << ca->maskshift) | ca->statemask;
+	if (ct->mark != newmark) {
+		/* using requeues stats to count how many connmark updates */
+		ca->tcf_qstats.requeues++;
+		ct->mark = newmark;
+		nf_conntrack_event_cache(IPCT_MARK, ct);
+	}
+}
+
+static void tcf_conndscp_set(struct nf_conn *ct, struct tcf_conndscp_info *ca,
+			     struct sk_buff *skb, int proto)
+{
+	u8 newdscp;
+
+	newdscp = (((ct->mark & ca->mask) >> ca->maskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	/* mark contains DSCP so restore DSCP bits from c->mark into diffserv */
+	/* using overlimits stats to count how many DSCP updates */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) !=
+		     newdscp) {
+			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK,
+					    newdscp);
+			ca->tcf_qstats.overlimits++;
+		}
+		break;
+	case NFPROTO_IPV6:
+		if ((ipv6_get_dsfield(ipv6_hdr(skb)) &
+		     ~INET_ECN_MASK) != newdscp) {
+			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK,
+					    newdscp);
+			ca->tcf_qstats.overlimits++;
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static int tcf_conndscp_act(struct sk_buff *skb, const struct tc_action *a,
+			    struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_conndscp_info *ca = to_conndscp(a);
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto;
+
+	spin_lock(&ca->tcf_lock);
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct) {
+		if (ca->mode & CONNDSCP_FLAG_SETDSCP &&
+		    (!ca->statemask || (ct->mark & ca->statemask)))
+			tcf_conndscp_set(ct, ca, skb, proto);
+		else if (ca->mode & CONNDSCP_FLAG_GETDSCP &&
+			 (!ca->statemask || !(ct->mark & ca->statemask)))
+			tcf_conndscp_get(ct, ca, skb, proto);
+		goto out;
+	}
+
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+			       proto, ca->net, &tuple))
+		goto out;
+
+	zone.id = ca->zone;
+	zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+	thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
+	if (!thash)
+		goto out;
+
+	ct = nf_ct_tuplehash_to_ctrack(thash);
+	if (ca->mode & CONNDSCP_FLAG_SETDSCP &&
+	    (!ca->statemask || (ct->mark & ca->statemask)))
+		tcf_conndscp_set(ct, ca, skb, proto);
+	else if (ca->mode & CONNDSCP_FLAG_GETDSCP &&
+		 (!ca->statemask || !(ct->mark & ca->statemask)))
+		tcf_conndscp_get(ct, ca, skb, proto);
+	nf_ct_put(ct);
+
+out:
+	spin_unlock(&ca->tcf_lock);
+	return ca->tcf_action;
+}
+
+static const struct nla_policy conndscp_policy[TCA_CONNDSCP_MAX + 1] = {
+	[TCA_CONNDSCP_PARMS] = { .len = sizeof(struct tc_conndscp) },
+};
+
+static void conndscp_parmset(struct tcf_conndscp_info *ci,
+			     struct tc_conndscp *parm)
+{
+/*	ci->tcf_action = parm->action;	*/
+	ci->zone = parm->zone;
+	ci->mask = parm->mask;
+	ci->maskshift = ci->mask ? __ffs(ci->mask) : 0;
+	ci->statemask = parm->statemask;
+	ci->mode = parm->mode;
+
+	/* let's not trust userspace entirely */
+	/* need at least contiguous 6 bit mask */
+	if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
+		ci->mode = 0;
+	if (ci->mask & ci->statemask)
+		ci->mode = 0;
+}
+
+static int tcf_conndscp_init(struct net *net, struct nlattr *nla,
+			     struct nlattr *est, struct tc_action **a,
+			     int ovr, int bind, bool rtnl_held,
+			     struct tcf_proto *tp,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+	struct nlattr *tb[TCA_CONNDSCP_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_conndscp_info *ci;
+	struct tc_conndscp *parm;
+	int ret = 0, err;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_CONNDSCP_MAX, nla, conndscp_policy,
+			       NULL);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[TCA_CONNDSCP_PARMS])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_CONNDSCP_PARMS]);
+
+	ret = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!ret) {
+		ret = tcf_idr_create(tn, parm->index, est, a,
+				     &act_conndscp_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
+			return ret;
+		}
+
+		ci = to_conndscp(*a);
+		err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch,
+					       extack);
+		if (err < 0)
+			goto release_idr;
+		tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+		ci->net = net;
+		conndscp_parmset(ci, parm);
+
+		tcf_idr_insert(tn, *a);
+		ret = ACT_P_CREATED;
+	} else if (ret > 0) {
+		ci = to_conndscp(*a);
+		if (bind)
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+		err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch,
+					       extack);
+		if (err < 0)
+			goto release_idr;
+		/* replacing action and zone */
+		spin_lock_bh(&ci->tcf_lock);
+		goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+		conndscp_parmset(ci, parm);
+		spin_unlock_bh(&ci->tcf_lock);
+		if (goto_ch)
+			tcf_chain_put_by_act(goto_ch);
+		ret = 0;
+	}
+
+	return ret;
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static inline int tcf_conndscp_dump(struct sk_buff *skb, struct tc_action *a,
+				    int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_conndscp_info *ci = to_conndscp(a);
+	struct tc_conndscp opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&ci->tcf_lock);
+	opt.action = ci->tcf_action;
+	opt.zone = ci->zone;
+	opt.mask = ci->mask;
+	opt.statemask = ci->statemask;
+	opt.mode = ci->mode;
+
+	if (nla_put(skb, TCA_CONNDSCP_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CONNDSCP_TM, sizeof(t), &t,
+			  TCA_CONNDSCP_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&ci->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_conndscp_walker(struct net *net, struct sk_buff *skb,
+			       struct netlink_callback *cb, int type,
+			       const struct tc_action_ops *ops,
+			       struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_conndscp_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_conndscp_ops = {
+	.kind		=	"conndscp",
+	.id		=	TCA_ID_CONNDSCP,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_conndscp_act,
+	.dump		=	tcf_conndscp_dump,
+	.init		=	tcf_conndscp_init,
+	.walk		=	tcf_conndscp_walker,
+	.lookup		=	tcf_conndscp_search,
+	.size		=	sizeof(struct tcf_conndscp_info),
+};
+
+static __net_init int conndscp_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, conndscp_net_id);
+
+	return tc_action_net_init(tn, &act_conndscp_ops);
+}
+
+static void __net_exit conndscp_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, conndscp_net_id);
+}
+
+static struct pernet_operations conndscp_net_ops = {
+	.init = conndscp_init_net,
+	.exit_batch = conndscp_exit_net,
+	.id   = &conndscp_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init conndscp_init_module(void)
+{
+	return tcf_register_action(&act_conndscp_ops, &conndscp_net_ops);
+}
+
+static void __exit conndscp_cleanup_module(void)
+{
+	tcf_unregister_action(&act_conndscp_ops, &conndscp_net_ops);
+}
+
+module_init(conndscp_init_module);
+module_exit(conndscp_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("DSCP to Connection tracking mark storing/restoring");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..9d1fddcfb887 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -37,6 +37,7 @@ CONFIG_NET_ACT_SKBEDIT=m
 CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
+CONFIG_NET_ACT_CONNDSCP=m
 CONFIG_NET_ACT_CONNMARK=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 14:09 ` [RFC PATCH 1/1 v2] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-22 17:39   ` Cong Wang
  2019-03-22 18:26     ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2019-03-22 17:39 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev

Hello,

On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
<ldir@darbyshire-bryant.me.uk> wrote:
>
> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
> to conntrack marks and the reverse operation of conntrack mark contained
> DSCPs to the diffserv field of suitable skbs.
>

Is it possible and feasible to integrate this into connmark?

Both are intended to retrieve information from conntrack and store
it into skb. I know the name "connmark" already says it is a mark,
while yours isn't, I still want to see if we can avoid code duplications.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 17:39   ` Cong Wang
@ 2019-03-22 18:26     ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-22 20:05       ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-22 18:26 UTC (permalink / raw)
  To: Cong Wang; +Cc: jhs, jiri, netdev

Hi Cong,

Thanks for your questions.

> On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> Hello,
> 
> On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
> <ldir@darbyshire-bryant.me.uk> wrote:
>> 
>> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
>> to conntrack marks and the reverse operation of conntrack mark contained
>> DSCPs to the diffserv field of suitable skbs.
>> 
> 
> Is it possible and feasible to integrate this into connmark?

I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.

> Both are intended to retrieve information from conntrack and store
> it into skb. I know the name "connmark" already says it is a mark,
> while yours isn't, I still want to see if we can avoid code duplications.

I understand your quest :-)  I think conndscp does a bit more than connmark.  Conndscp is two way diffserv<-->conntrack mark operation.  connmark is a single way conntrack mark->skb.mark operation.

Please let me know your thoughts and whether I’m talking out of my inexperienced backside, which direction I should go and I shall gladly take your advice…and improve my C some more :-)


Many thanks,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 18:26     ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-22 20:05       ` Cong Wang
  2019-03-22 20:50         ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2019-03-22 20:05 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev

On Fri, Mar 22, 2019 at 11:26 AM Kevin 'ldir' Darbyshire-Bryant
<ldir@darbyshire-bryant.me.uk> wrote:
>
> Hi Cong,
>
> Thanks for your questions.
>
> > On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > Hello,
> >
> > On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
> > <ldir@darbyshire-bryant.me.uk> wrote:
> >>
> >> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
> >> to conntrack marks and the reverse operation of conntrack mark contained
> >> DSCPs to the diffserv field of suitable skbs.
> >>
> >
> > Is it possible and feasible to integrate this into connmark?
>
> I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.

This sounds problematic, why a flag/parameter doesn't work?


>
> > Both are intended to retrieve information from conntrack and store
> > it into skb. I know the name "connmark" already says it is a mark,
> > while yours isn't, I still want to see if we can avoid code duplications.
>
> I understand your quest :-)  I think conndscp does a bit more than connmark.  Conndscp is two way diffserv<-->conntrack mark operation.  connmark is a single way conntrack mark->skb.mark operation.

I am not sure if it is a good idea to modify conntrack in TC,
as conntrack doesn't even belong to TC. Retrieving information
from conntrack and saving it to skb is fine, as we modify skb
in many different ways.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 20:05       ` Cong Wang
@ 2019-03-22 20:50         ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-22 21:31           ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-22 20:50 UTC (permalink / raw)
  To: Cong Wang; +Cc: jhs, jiri, netdev



> On 22 Mar 2019, at 20:05, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Fri, Mar 22, 2019 at 11:26 AM Kevin 'ldir' Darbyshire-Bryant
> <ldir@darbyshire-bryant.me.uk> wrote:
>> 
>> Hi Cong,
>> 
>> Thanks for your questions.
>> 
>>> On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
>>> <ldir@darbyshire-bryant.me.uk> wrote:
>>>> 
>>>> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
>>>> to conntrack marks and the reverse operation of conntrack mark contained
>>>> DSCPs to the diffserv field of suitable skbs.
>>>> 
>>> 
>>> Is it possible and feasible to integrate this into connmark?
>> 
>> I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.
> 
> This sounds problematic, why a flag/parameter doesn't work?
> 
I don’t understand the question?

> 
>> 
>>> Both are intended to retrieve information from conntrack and store
>>> it into skb. I know the name "connmark" already says it is a mark,
>>> while yours isn't, I still want to see if we can avoid code duplications.
>> 
>> I understand your quest :-)  I think conndscp does a bit more than connmark.  Conndscp is two way diffserv<-->conntrack mark operation.  connmark is a single way conntrack mark->skb.mark operation.
> 
> I am not sure if it is a good idea to modify conntrack in TC,
> as conntrack doesn't even belong to TC. Retrieving information
> from conntrack and saving it to skb is fine, as we modify skb
> in many different ways.

OK, this is why I wanted to ask as RFC before I went too far implementing stuff.  AFAIUI you’re saying it’s tc is okay to restore stuff from a connmark but not to set/change the conntrack mark.  So I need to find a legal place to store a DSCP into a conntrack mark.

Cheers,

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 20:50         ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-22 21:31           ` Cong Wang
  2019-03-22 22:06             ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2019-03-22 21:31 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev

On Fri, Mar 22, 2019 at 1:50 PM Kevin 'ldir' Darbyshire-Bryant
<ldir@darbyshire-bryant.me.uk> wrote:
>
>
>
> > On 22 Mar 2019, at 20:05, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Fri, Mar 22, 2019 at 11:26 AM Kevin 'ldir' Darbyshire-Bryant
> > <ldir@darbyshire-bryant.me.uk> wrote:
> >>
> >> Hi Cong,
> >>
> >> Thanks for your questions.
> >>
> >>> On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
> >>> <ldir@darbyshire-bryant.me.uk> wrote:
> >>>>
> >>>> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
> >>>> to conntrack marks and the reverse operation of conntrack mark contained
> >>>> DSCPs to the diffserv field of suitable skbs.
> >>>>
> >>>
> >>> Is it possible and feasible to integrate this into connmark?
> >>
> >> I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.
> >
> > This sounds problematic, why a flag/parameter doesn't work?
> >
> I don’t understand the question?

You said conndscp uses some stat to save some configuration
information, that is, DSCP->MARK->DSCP operations. But
configuration information is usually saved in a parameter struct
or some priviate flag. So, I have to ask why a flag/parameter doesn't
work for this case?

And, you also implied this is a barrier for you to reuse connmark
action.

Am I misunderstanding anything here?

>
> >
> >>
> >>> Both are intended to retrieve information from conntrack and store
> >>> it into skb. I know the name "connmark" already says it is a mark,
> >>> while yours isn't, I still want to see if we can avoid code duplications.
> >>
> >> I understand your quest :-)  I think conndscp does a bit more than connmark.  Conndscp is two way diffserv<-->conntrack mark operation.  connmark is a single way conntrack mark->skb.mark operation.
> >
> > I am not sure if it is a good idea to modify conntrack in TC,
> > as conntrack doesn't even belong to TC. Retrieving information
> > from conntrack and saving it to skb is fine, as we modify skb
> > in many different ways.
>
> OK, this is why I wanted to ask as RFC before I went too far implementing stuff.  AFAIUI you’re saying it’s tc is okay to restore stuff from a connmark but not to set/change the conntrack mark.  So I need to find a legal place to store a DSCP into a conntrack mark.

Yes.

I guess you should look into netfilter to modify any conntrack attribute,
it is at least where conntrack belongs to. :)

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 21:31           ` Cong Wang
@ 2019-03-22 22:06             ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-22 23:09               ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-22 22:06 UTC (permalink / raw)
  To: Cong Wang; +Cc: jhs, jiri, netdev



> On 22 Mar 2019, at 21:31, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Fri, Mar 22, 2019 at 1:50 PM Kevin 'ldir' Darbyshire-Bryant
> <ldir@darbyshire-bryant.me.uk> wrote:
>> 
>> 
>> 
>>> On 22 Mar 2019, at 20:05, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Fri, Mar 22, 2019 at 11:26 AM Kevin 'ldir' Darbyshire-Bryant
>>> <ldir@darbyshire-bryant.me.uk> wrote:
>>>> 
>>>> Hi Cong,
>>>> 
>>>> Thanks for your questions.
>>>> 
>>>>> On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
>>>>> <ldir@darbyshire-bryant.me.uk> wrote:
>>>>>> 
>>>>>> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
>>>>>> to conntrack marks and the reverse operation of conntrack mark contained
>>>>>> DSCPs to the diffserv field of suitable skbs.
>>>>>> 
>>>>> 
>>>>> Is it possible and feasible to integrate this into connmark?
>>>> 
>>>> I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.
>>> 
>>> This sounds problematic, why a flag/parameter doesn't work?
>>> 
>> I don’t understand the question?
> 
> You said conndscp uses some stat to save some configuration
> information, that is, DSCP->MARK->DSCP operations. But
> configuration information is usually saved in a parameter struct
> or some priviate flag. So, I have to ask why a flag/parameter doesn't
> work for this case?
> 
> And, you also implied this is a barrier for you to reuse connmark
> action.
> 
> Am I misunderstanding anything here?

Ahh!  I understand the question, apologies if I was not clear.  conndscp like connmark reports some status information back to tc via tcf_qstats structure.  connmark uses ‘overlimits’ to report the number of marks restored from conntrack->mark to skb->mark.  conndscp uses ‘overlimits’ and ‘requeues’ to report status about how many marks it has restored/set. e.g.

root@Router:~# tc -s filter show dev eth0
filter parent cacf: protocol all pref 10 u32 chain 0 
filter parent cacf: protocol all pref 10 u32 chain 0 fh 800: ht divisor 1 
filter parent cacf: protocol all pref 10 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 not_in_hw 
  match 00000000/00000000 at 0
	action order 1: conndscp zone 0 pipe
	 index 1 ref 1 bind 1 mask 0xfc000000 statemask 0x01000000 mode both installed 119695 sec used 0 sec
	Action statistics:
	Sent 944294567 bytes 4390248 pkt (dropped 0, overlimits 2366576 requeues 50157) <<— here
	backlog 0b 0p requeues 50157

I explained (badly) that merging ‘connmark’ and ‘conndscp’ would present an issue (to me) of how to report both types of statistics (connmark skb->mark restores & conndscp connmark->skb->ip-diffserv restores & skb->ipdiffserv->connmark->mark stores)


> 
>> 
>>> 
>>>> 
>>>>> Both are intended to retrieve information from conntrack and store
>>>>> it into skb. I know the name "connmark" already says it is a mark,
>>>>> while yours isn't, I still want to see if we can avoid code duplications.
>>>> 
>>>> I understand your quest :-)  I think conndscp does a bit more than connmark.  Conndscp is two way diffserv<-->conntrack mark operation.  connmark is a single way conntrack mark->skb.mark operation.
>>> 
>>> I am not sure if it is a good idea to modify conntrack in TC,
>>> as conntrack doesn't even belong to TC. Retrieving information
>>> from conntrack and saving it to skb is fine, as we modify skb
>>> in many different ways.
>> 
>> OK, this is why I wanted to ask as RFC before I went too far implementing stuff.  AFAIUI you’re saying it’s tc is okay to restore stuff from a connmark but not to set/change the conntrack mark.  So I need to find a legal place to store a DSCP into a conntrack mark.
> 
> Yes.
> 
> I guess you should look into netfilter to modify any conntrack attribute,
> it is at least where conntrack belongs to. :)

So I wonder if an XT_CONNMARK_SAVEDSCP option in xt_connmark would be more acceptable?

Your patience & advice appreciated.

Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 22:06             ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-22 23:09               ` Cong Wang
  2019-03-23 17:45                 ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2019-03-22 23:09 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev

On Fri, Mar 22, 2019 at 3:06 PM Kevin 'ldir' Darbyshire-Bryant
<ldir@darbyshire-bryant.me.uk> wrote:
>
>
>
> > On 22 Mar 2019, at 21:31, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Fri, Mar 22, 2019 at 1:50 PM Kevin 'ldir' Darbyshire-Bryant
> > <ldir@darbyshire-bryant.me.uk> wrote:
> >>
> >>
> >>
> >>> On 22 Mar 2019, at 20:05, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >>>
> >>> On Fri, Mar 22, 2019 at 11:26 AM Kevin 'ldir' Darbyshire-Bryant
> >>> <ldir@darbyshire-bryant.me.uk> wrote:
> >>>>
> >>>> Hi Cong,
> >>>>
> >>>> Thanks for your questions.
> >>>>
> >>>>> On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
> >>>>> <ldir@darbyshire-bryant.me.uk> wrote:
> >>>>>>
> >>>>>> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
> >>>>>> to conntrack marks and the reverse operation of conntrack mark contained
> >>>>>> DSCPs to the diffserv field of suitable skbs.
> >>>>>>
> >>>>>
> >>>>> Is it possible and feasible to integrate this into connmark?
> >>>>
> >>>> I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.
> >>>
> >>> This sounds problematic, why a flag/parameter doesn't work?
> >>>
> >> I don’t understand the question?
> >
> > You said conndscp uses some stat to save some configuration
> > information, that is, DSCP->MARK->DSCP operations. But
> > configuration information is usually saved in a parameter struct
> > or some priviate flag. So, I have to ask why a flag/parameter doesn't
> > work for this case?
> >
> > And, you also implied this is a barrier for you to reuse connmark
> > action.
> >
> > Am I misunderstanding anything here?
>
> Ahh!  I understand the question, apologies if I was not clear.  conndscp like connmark reports some status information back to tc via tcf_qstats structure.  connmark uses ‘overlimits’ to report the number of marks restored from conntrack->mark to skb->mark.  conndscp uses ‘overlimits’ and ‘requeues’ to report status about how many marks it has restored/set. e.g.

I see, I didn't know this, but it is not hard to add a connmark
specific stat for this, I don't know why it has to reuse 'overlimit',
perhaps just to save some memory space.

Unless you have legitimate reasons, you don't have to reuse
it. It is just confusing.


> >
> > I guess you should look into netfilter to modify any conntrack attribute,
> > it is at least where conntrack belongs to. :)
>
> So I wonder if an XT_CONNMARK_SAVEDSCP option in xt_connmark would be more acceptable?

I think so, but I have to say I am not a netfilter expert. You probably
want to check it with netfilter developers too.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-22 23:09               ` Cong Wang
@ 2019-03-23 17:45                 ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-25 19:17                   ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-23 17:45 UTC (permalink / raw)
  To: Cong Wang; +Cc: jhs, jiri, netdev

Hi Cong,

Thanks for your responses so far.

> On 22 Mar 2019, at 23:09, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Fri, Mar 22, 2019 at 3:06 PM Kevin 'ldir' Darbyshire-Bryant
> <ldir@darbyshire-bryant.me.uk> wrote:
>> 
>> 
>> 
>>> On 22 Mar 2019, at 21:31, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Fri, Mar 22, 2019 at 1:50 PM Kevin 'ldir' Darbyshire-Bryant
>>> <ldir@darbyshire-bryant.me.uk> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On 22 Mar 2019, at 20:05, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> 
>>>>> On Fri, Mar 22, 2019 at 11:26 AM Kevin 'ldir' Darbyshire-Bryant
>>>>> <ldir@darbyshire-bryant.me.uk> wrote:
>>>>>> 
>>>>>> Hi Cong,
>>>>>> 
>>>>>> Thanks for your questions.
>>>>>> 
>>>>>>> On 22 Mar 2019, at 17:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> On Fri, Mar 22, 2019 at 7:09 AM Kevin 'ldir' Darbyshire-Bryant
>>>>>>> <ldir@darbyshire-bryant.me.uk> wrote:
>>>>>>>> 
>>>>>>>> Conndscp is a new tc filter action module.  It is designed to copy DSCPs
>>>>>>>> to conntrack marks and the reverse operation of conntrack mark contained
>>>>>>>> DSCPs to the diffserv field of suitable skbs.
>>>>>>>> 
>>>>>>> 
>>>>>>> Is it possible and feasible to integrate this into connmark?
>>>>>> 
>>>>>> I started off coding it that way but quickly ran into my limitations with netlink messaging and became frustrated.  Aside from my own limitations, conndscp ab/uses tcf_qstats requeues & overlimits to indicate DSCP->MARK->DSCP operations and has been useful in proving DSCP/marking operations are occurring in the right times/places.  Integrating with connmark which itself uses overlimits to indicate conntrack mark to skb->mark restoration would lose that differentiation/confirmation/debug ability.  A possibility is to ab/use the drop count instead but I fear that would cause confusion.
>>>>> 
>>>>> This sounds problematic, why a flag/parameter doesn't work?
>>>>> 
>>>> I don’t understand the question?
>>> 
>>> You said conndscp uses some stat to save some configuration
>>> information, that is, DSCP->MARK->DSCP operations. But
>>> configuration information is usually saved in a parameter struct
>>> or some priviate flag. So, I have to ask why a flag/parameter doesn't
>>> work for this case?
>>> 
>>> And, you also implied this is a barrier for you to reuse connmark
>>> action.
>>> 
>>> Am I misunderstanding anything here?
>> 
>> Ahh!  I understand the question, apologies if I was not clear.  conndscp like connmark reports some status information back to tc via tcf_qstats structure.  connmark uses ‘overlimits’ to report the number of marks restored from conntrack->mark to skb->mark.  conndscp uses ‘overlimits’ and ‘requeues’ to report status about how many marks it has restored/set. e.g.
> 
> I see, I didn't know this, but it is not hard to add a connmark
> specific stat for this, I don't know why it has to reuse 'overlimit',
> perhaps just to save some memory space.
> 
> Unless you have legitimate reasons, you don't have to reuse
> it. It is just confusing.

I will remove the functionality from conndscp that changes the conntrack mark, so that it only restores the mark into the diffserv field.

So that I’m clear about which direction I should be headed:

Bearing in mind that conndscp writes to the skb’s iphdr diffserv field and *not* skb->fwmark, do you still desire to see the dscp restoration code done as part of connmark.  In other words NOT have a separate conndscp module?

> 
> 
>>> 
>>> I guess you should look into netfilter to modify any conntrack attribute,
>>> it is at least where conntrack belongs to. :)
>> 
>> So I wonder if an XT_CONNMARK_SAVEDSCP option in xt_connmark would be more acceptable?
> 
> I think so, but I have to say I am not a netfilter expert. You probably
> want to check it with netfilter developers too.

I have a working copy/paste abomination example that creates an XT_CONNMARK_SAVEDSCP type function and will consult the netfilter devs for feedback.


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-23 17:45                 ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-25 19:17                   ` Cong Wang
  2019-03-27 20:32                     ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2019-03-25 19:17 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev

On Sat, Mar 23, 2019 at 10:45 AM Kevin 'ldir' Darbyshire-Bryant
<ldir@darbyshire-bryant.me.uk> wrote:
>
> I will remove the functionality from conndscp that changes the conntrack mark, so that it only restores the mark into the diffserv field.
>
> So that I’m clear about which direction I should be headed:
>
> Bearing in mind that conndscp writes to the skb’s iphdr diffserv field and *not* skb->fwmark, do you still desire to see the dscp restoration code done as part of connmark.  In other words NOT have a separate conndscp module?
>

For me, the barrier is the name "connmark" is confusing if we put conndscp
into it. So, I think leaving conndscp alone is fine.

Perhaps we just need an action called "act_conntrack" which could retrieve
any meaningful information from conntrack to skb.

What do you think?

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/1 v2] net: sched: Introduce conndscp action
  2019-03-25 19:17                   ` Cong Wang
@ 2019-03-27 20:32                     ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-29 20:45                       ` [RFC net-next 0/1] net: sched: Introduce conntrack action Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-27 20:32 UTC (permalink / raw)
  To: Cong Wang; +Cc: jhs, jiri, netdev

Hi Cong,

Thanks for getting back to me once again and apologies for the delayed reply.

> On 25 Mar 2019, at 19:17, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Sat, Mar 23, 2019 at 10:45 AM Kevin 'ldir' Darbyshire-Bryant
> <ldir@darbyshire-bryant.me.uk> wrote:
>> 
>> I will remove the functionality from conndscp that changes the conntrack mark, so that it only restores the mark into the diffserv field.
>> 
>> So that I’m clear about which direction I should be headed:
>> 
>> Bearing in mind that conndscp writes to the skb’s iphdr diffserv field and *not* skb->fwmark, do you still desire to see the dscp restoration code done as part of connmark.  In other words NOT have a separate conndscp module?
>> 
> 
> For me, the barrier is the name "connmark" is confusing if we put conndscp
> into it. So, I think leaving conndscp alone is fine.
> 
> Perhaps we just need an action called "act_conntrack" which could retrieve
> any meaningful information from conntrack to skb.
> 
> What do you think?

Hmm.  A number of thoughts flash through the brain cell: 1) I obviously can’t think of anything else that would/could sensibly be restored from conntrack to skb, but then to my knowledge no one else has yet wanted to restore DSCP, so that’s hardly a good excuse.  2) laziness/fear/lack of skill, I’ve no idea how to go about coding something like that in an extendable, quick and efficient manner.  3) keeping it simple.

I’m working on a simplified conndscp, probably a few days yet for testing/checking - see if that ends up as something more suitable.


> 
> Thanks.


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-03-27 20:32                     ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-29 20:45                       ` Kevin 'ldir' Darbyshire-Bryant
  2019-03-29 20:45                         ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant
  2019-04-01 13:14                         ` [RFC net-next 0/1] " Marcelo Ricardo Leitner
  0 siblings, 2 replies; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-29 20:45 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev, xiyou.wangcong

Hi Cong,

OK, so I've renamed conndscp to conntrack and hopefully this are
flexible enough for future conntrack->skb operations to be added in the
future.  How does this one fly?

Cheers,

Kevin

Kevin Darbyshire-Bryant (1):
  net: sched: Introduce conntrack action

 include/net/tc_act/tc_conntrack.h         |  19 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_conntrack.h  |  30 ++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conntrack.c                 | 324 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 389 insertions(+)
 create mode 100644 include/net/tc_act/tc_conntrack.h
 create mode 100644 include/uapi/linux/tc_act/tc_conntrack.h
 create mode 100644 net/sched/act_conntrack.c

-- 
2.20.1 (Apple Git-117)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC net-next 1/1] net: sched: Introduce conntrack action
  2019-03-29 20:45                       ` [RFC net-next 0/1] net: sched: Introduce conntrack action Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-29 20:45                         ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-01 13:14                         ` [RFC net-next 0/1] " Marcelo Ricardo Leitner
  1 sibling, 0 replies; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-29 20:45 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: jhs, jiri, netdev, xiyou.wangcong

conntrack is a new tc filter action module.  It is designed to restore
DSCPs stored in conntrack marks

The feature is intended for use and has been found useful for restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

It is anticipated that userspace control from tc would support the
following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conntrack will
place the DSCP in conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before storing into the mark
field.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag hence
the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A statemask of zero disables the
conditional behaviour, the conntrack mark will be updated on every DSCP
change.

mode dscp - conntrack at present only understands one mode 'dscp'.

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/net/tc_act/tc_conntrack.h         |  19 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_conntrack.h  |  30 ++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conntrack.c                 | 324 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 389 insertions(+)
 create mode 100644 include/net/tc_act/tc_conntrack.h
 create mode 100644 include/uapi/linux/tc_act/tc_conntrack.h
 create mode 100644 net/sched/act_conntrack.c

diff --git a/include/net/tc_act/tc_conntrack.h b/include/net/tc_act/tc_conntrack.h
new file mode 100644
index 000000000000..45319e557f90
--- /dev/null
+++ b/include/net/tc_act/tc_conntrack.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CONNTRACK_H
+#define __NET_TC_CONNTRACK_H
+
+#include <net/act_api.h>
+
+struct tcf_conntrack_info {
+	struct tc_action common;
+	struct net *net;
+	u32 mask;
+	u32 statemask;
+	u16 zone;
+	u8 mode;
+	u8 maskshift;
+};
+
+#define to_conntrack(a) ((struct tcf_conntrack_info *)a)
+
+#endif /* __NET_TC_CONNTRACK_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..a5a6930084d7 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CONNTRACK,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_conntrack.h b/include/uapi/linux/tc_act/tc_conntrack.h
new file mode 100644
index 000000000000..99c4bfb29c21
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_conntrack.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CONNTRACK_H
+#define __UAPI_TC_CONNTRACK_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_conntrack {
+	tc_gen;
+	__u32 mask;
+	__u32 statemask;
+	__u16 zone;
+	__u8 mode;
+	__u8 maskshift;
+};
+
+enum {
+	TCA_CONNTRACK_UNSPEC,
+	TCA_CONNTRACK_PARMS,
+	TCA_CONNTRACK_TM,
+	TCA_CONNTRACK_PAD,
+	__TCA_CONNTRACK_MAX
+};
+#define TCA_CONNTRACK_MAX (__TCA_CONNTRACK_MAX - 1)
+
+enum {
+	CONNTRACK_FLAG_SETDSCP	= BIT(0)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 5c02ad97ef23..848dc1dd3be1 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -876,6 +876,19 @@ config NET_ACT_CONNMARK
 	  To compile this code as a module, choose M here: the
 	  module will be called act_connmark.
 
+config NET_ACT_CONNTRACK
+        tristate "Netfilter Connmark to DSCP Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        help
+	  Say Y here to allow transfer of a connmark stored DSCP into
+	  ipv4/v6 diffserv
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_conntrack.
+
 config NET_ACT_SKBMOD
         tristate "skb data modification action"
         depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..e962dd782046 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
+obj-$(CONFIG_NET_ACT_CONNTRACK)	+= act_conntrack.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
diff --git a/net/sched/act_conntrack.c b/net/sched/act_conntrack.c
new file mode 100644
index 000000000000..c08ff06aae60
--- /dev/null
+++ b/net/sched/act_conntrack.c
@@ -0,0 +1,324 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+/* net/sched/act_conntrack.c  netfilter conntrack connmark->DSCP action
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_conntrack.h>
+#include <net/tc_act/tc_conntrack.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int conntrack_net_id;
+static struct tc_action_ops act_conntrack_ops;
+
+static void tcf_conntrack_set(struct nf_conn *ct, struct tcf_conntrack_info *ca,
+			      struct sk_buff *skb, int proto)
+{
+	u8 newdscp;
+
+	newdscp = (((ct->mark & ca->mask) >> ca->maskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	/* mark contains DSCP so restore DSCP bits from c->mark into diffserv */
+	/* using overlimits stats to count how many DSCP updates */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) !=
+		     newdscp) {
+			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK,
+					    newdscp);
+			ca->tcf_qstats.overlimits++;
+		}
+		break;
+	case NFPROTO_IPV6:
+		if ((ipv6_get_dsfield(ipv6_hdr(skb)) &
+		     ~INET_ECN_MASK) != newdscp) {
+			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK,
+					    newdscp);
+			ca->tcf_qstats.overlimits++;
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static int tcf_conntrack_act(struct sk_buff *skb, const struct tc_action *a,
+			     struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash = NULL;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_conntrack_info *ca = to_conntrack(a);
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto;
+
+	spin_lock(&ca->tcf_lock);
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+
+	if (unlikely(!(ca->mode & CONNTRACK_FLAG_SETDSCP)))
+		goto out;
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct) { /* look harder */
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, ca->net, &tuple))
+			goto out;
+		zone.id = ca->zone;
+		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+		thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
+		if (!thash)
+			goto out;
+
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, ca->net, &tuple))
+			goto out;
+
+		ct = nf_ct_tuplehash_to_ctrack(thash);
+	}
+
+	if (!ca->statemask || (ct->mark & ca->statemask))
+		tcf_conntrack_set(ct, ca, skb, proto);
+
+	if (thash)
+		nf_ct_put(ct);
+
+out:
+	spin_unlock(&ca->tcf_lock);
+	return ca->tcf_action;
+}
+
+static const struct nla_policy conntrack_policy[TCA_CONNTRACK_MAX + 1] = {
+	[TCA_CONNTRACK_PARMS] = { .len = sizeof(struct tc_conntrack) },
+};
+
+static void conntrack_parmset(struct tcf_conntrack_info *ci,
+			      struct tc_conntrack *parm)
+{
+	ci->zone = parm->zone;
+	ci->mask = parm->mask;
+	ci->maskshift = ci->mask ? __ffs(ci->mask) : 0;
+	ci->statemask = parm->statemask;
+	ci->mode = parm->mode;
+
+	/* let's not trust userspace entirely */
+	/* need at least contiguous 6 bit mask */
+	if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
+		ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
+	/* mask & statemask must not overlap */
+	if (ci->mask & ci->statemask)
+		ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
+}
+
+static int tcf_conntrack_init(struct net *net, struct nlattr *nla,
+			      struct nlattr *est, struct tc_action **a,
+			      int ovr, int bind, bool rtnl_held,
+			      struct tcf_proto *tp,
+			      struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, conntrack_net_id);
+	struct nlattr *tb[TCA_CONNTRACK_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_conntrack_info *ci;
+	struct tc_conntrack *parm;
+	int ret = 0, err;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_CONNTRACK_MAX, nla, conntrack_policy,
+			       NULL);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[TCA_CONNTRACK_PARMS])
+		return -EINVAL;
+
+	parm = nla_data(tb[TCA_CONNTRACK_PARMS]);
+
+	ret = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!ret) {
+		ret = tcf_idr_create(tn, parm->index, est, a,
+				     &act_conntrack_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
+			return ret;
+		}
+
+		ci = to_conntrack(*a);
+		err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch,
+					       extack);
+		if (err < 0)
+			goto release_idr;
+		tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+		ci->net = net;
+		conntrack_parmset(ci, parm);
+
+		tcf_idr_insert(tn, *a);
+		ret = ACT_P_CREATED;
+	} else if (ret > 0) {
+		ci = to_conntrack(*a);
+		if (bind)
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+		err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch,
+					       extack);
+		if (err < 0)
+			goto release_idr;
+		/* replacing action and zone */
+		spin_lock_bh(&ci->tcf_lock);
+		goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+		conntrack_parmset(ci, parm);
+		spin_unlock_bh(&ci->tcf_lock);
+		if (goto_ch)
+			tcf_chain_put_by_act(goto_ch);
+		ret = 0;
+	}
+
+	return ret;
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static inline int tcf_conntrack_dump(struct sk_buff *skb, struct tc_action *a,
+				     int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_conntrack_info *ci = to_conntrack(a);
+	struct tc_conntrack opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&ci->tcf_lock);
+	opt.action = ci->tcf_action;
+	opt.zone = ci->zone;
+	opt.mask = ci->mask;
+	opt.statemask = ci->statemask;
+	opt.mode = ci->mode;
+
+	if (nla_put(skb, TCA_CONNTRACK_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CONNTRACK_TM, sizeof(t), &t,
+			  TCA_CONNTRACK_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&ci->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_conntrack_walker(struct net *net, struct sk_buff *skb,
+				struct netlink_callback *cb, int type,
+				const struct tc_action_ops *ops,
+				struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, conntrack_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_conntrack_search(struct net *net, struct tc_action **a,
+				u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, conntrack_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_conntrack_ops = {
+	.kind		=	"conntrack",
+	.id		=	TCA_ID_CONNTRACK,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_conntrack_act,
+	.dump		=	tcf_conntrack_dump,
+	.init		=	tcf_conntrack_init,
+	.walk		=	tcf_conntrack_walker,
+	.lookup		=	tcf_conntrack_search,
+	.size		=	sizeof(struct tcf_conntrack_info),
+};
+
+static __net_init int conntrack_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, conntrack_net_id);
+
+	return tc_action_net_init(tn, &act_conntrack_ops);
+}
+
+static void __net_exit conntrack_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, conntrack_net_id);
+}
+
+static struct pernet_operations conntrack_net_ops = {
+	.init = conntrack_init_net,
+	.exit_batch = conntrack_exit_net,
+	.id   = &conntrack_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init conntrack_init_module(void)
+{
+	return tcf_register_action(&act_conntrack_ops, &conntrack_net_ops);
+}
+
+static void __exit conntrack_cleanup_module(void)
+{
+	tcf_unregister_action(&act_conntrack_ops, &conntrack_net_ops);
+}
+
+module_init(conntrack_init_module);
+module_exit(conntrack_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("Conntrack mark to DSCP restoring");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..104256b4fa2a 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -38,6 +38,7 @@ CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
 CONFIG_NET_ACT_CONNMARK=m
+CONFIG_NET_ACT_CONNTRACK=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
 CONFIG_NET_ACT_TUNNEL_KEY=m
-- 
2.20.1 (Apple Git-117)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-03-29 20:45                       ` [RFC net-next 0/1] net: sched: Introduce conntrack action Kevin 'ldir' Darbyshire-Bryant
  2019-03-29 20:45                         ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-01 13:14                         ` Marcelo Ricardo Leitner
  2019-04-01 13:54                           ` Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 1 reply; 27+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-04-01 13:14 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: jhs, jiri, netdev, xiyou.wangcong, Paul Blakey, Oz Shlomo

Hi,

On Fri, Mar 29, 2019 at 08:45:06PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
> Hi Cong,
> 
> OK, so I've renamed conndscp to conntrack and hopefully this are
> flexible enough for future conntrack->skb operations to be added in the
> future.  How does this one fly?

This work sort of clashes with the work that Paul Blakey and I are
doing to integrate conntrack with tc and vice-versa.

Considering that in this patch the action is not RCU-ified, that it is
using a struct as netlink parameter and it is dealing only with the
dscp info, seems it's easier if we/you extend our code to support this
feature as well.  How does that sound to you?

The RFC I had posted is VERY outdated (message-id
cover.1548285996.git.mleitner@redhat.com), please don't use it as
reference.  Not sure if Paul can post a refreshed RFC already.

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-01 13:14                         ` [RFC net-next 0/1] " Marcelo Ricardo Leitner
@ 2019-04-01 13:54                           ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-01 14:22                             ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-01 13:54 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: jhs, jiri, netdev, xiyou.wangcong, Paul Blakey, Oz Shlomo



> On 1 Apr 2019, at 14:14, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> 
> Hi,
> 
> On Fri, Mar 29, 2019 at 08:45:06PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
>> Hi Cong,
>> 
>> OK, so I've renamed conndscp to conntrack and hopefully this are
>> flexible enough for future conntrack->skb operations to be added in the
>> future.  How does this one fly?
> 
> This work sort of clashes with the work that Paul Blakey and I are
> doing to integrate conntrack with tc and vice-versa.
> 
> Considering that in this patch the action is not RCU-ified, that it is
> using a struct as netlink parameter and it is dealing only with the
> dscp info, seems it's easier if we/you extend our code to support this
> feature as well.  How does that sound to you?
> 
> The RFC I had posted is VERY outdated (message-id
> cover.1548285996.git.mleitner@redhat.com), please don't use it as
> reference.  Not sure if Paul can post a refreshed RFC already.
> 
> Thanks,
> Marcelo

I think the reality is that I’m way out of my depth here.  The idea was to have something so simple that I could write(copy - see act_connmark) it/use it for my use case.  I looked at the email you suggested and have not a clue!  Sorry.

Maybe someone can see the idea and run with it.

Kevin



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-01 13:54                           ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-01 14:22                             ` Paul Blakey
  2019-04-01 21:06                               ` Cong Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2019-04-01 14:22 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant, Marcelo Ricardo Leitner
  Cc: Paul Blakey, jhs, jiri, netdev, xiyou.wangcong, Oz Shlomo



> 
> 
>> On 1 Apr 2019, at 14:14, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
>>
>> Hi,
>>
>> On Fri, Mar 29, 2019 at 08:45:06PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
>>> Hi Cong,
>>>
>>> OK, so I've renamed conndscp to conntrack and hopefully this are
>>> flexible enough for future conntrack->skb operations to be added in the
>>> future.  How does this one fly?
>>
>> This work sort of clashes with the work that Paul Blakey and I are
>> doing to integrate conntrack with tc and vice-versa.
>>
>> Considering that in this patch the action is not RCU-ified, that it is
>> using a struct as netlink parameter and it is dealing only with the
>> dscp info, seems it's easier if we/you extend our code to support this
>> feature as well.  How does that sound to you?
>>
>> The RFC I had posted is VERY outdated (message-id
>> cover.1548285996.git.mleitner@redhat.com), please don't use it as
>> reference.  Not sure if Paul can post a refreshed RFC already.
>>
>> Thanks,
>> Marcelo
> 
> I think the reality is that I’m way out of my depth here.  The idea was to have something so simple that I could write(copy - see act_connmark) it/use it for my use case.  I looked at the email you suggested and have not a clue!  Sorry.
> 
> Maybe someone can see the idea and run with it.
> 
> Kevin
> 
> 

Hi,

We are working on act_ct (basically a act_conntrack) which will be an
action to send packets to conntrack for connection tracking. This two
modes of operation are so different I don't think they need merging.

This would probably be better off with the previous name act_conndscp.


Thanks,
Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-01 14:22                             ` Paul Blakey
@ 2019-04-01 21:06                               ` Cong Wang
  2019-04-02  9:24                                 ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Cong Wang @ 2019-04-01 21:06 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Kevin 'ldir' Darbyshire-Bryant, Marcelo Ricardo Leitner,
	jhs, jiri, netdev, Oz Shlomo

On Mon, Apr 1, 2019 at 7:22 AM Paul Blakey <paulb@mellanox.com> wrote:
>
>
>
> >
> >
> >> On 1 Apr 2019, at 14:14, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Mar 29, 2019 at 08:45:06PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
> >>> Hi Cong,
> >>>
> >>> OK, so I've renamed conndscp to conntrack and hopefully this are
> >>> flexible enough for future conntrack->skb operations to be added in the
> >>> future.  How does this one fly?
> >>
> >> This work sort of clashes with the work that Paul Blakey and I are
> >> doing to integrate conntrack with tc and vice-versa.
> >>
> >> Considering that in this patch the action is not RCU-ified, that it is
> >> using a struct as netlink parameter and it is dealing only with the
> >> dscp info, seems it's easier if we/you extend our code to support this
> >> feature as well.  How does that sound to you?
> >>
> >> The RFC I had posted is VERY outdated (message-id
> >> cover.1548285996.git.mleitner@redhat.com), please don't use it as
> >> reference.  Not sure if Paul can post a refreshed RFC already.
> >>
> >> Thanks,
> >> Marcelo
> >
> > I think the reality is that I’m way out of my depth here.  The idea was to have something so simple that I could write(copy - see act_connmark) it/use it for my use case.  I looked at the email you suggested and have not a clue!  Sorry.
> >
> > Maybe someone can see the idea and run with it.
> >
> > Kevin
> >
> >
>
> Hi,
>
> We are working on act_ct (basically a act_conntrack) which will be an
> action to send packets to conntrack for connection tracking. This two
> modes of operation are so different I don't think they need merging.

What do you mean by "send packets to conntrack"? It is kinda confusing
as TC is L2 and conntrack is L3, you want to re-inject the packets to L3??


>
> This would probably be better off with the previous name act_conndscp.


If naming is the only concern here, then it is not hard to find
a name we are all satisfied with, like act_ctinfo (if we still want more
than just DSCP).

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-01 21:06                               ` Cong Wang
@ 2019-04-02  9:24                                 ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-03  7:47                                   ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-02  9:24 UTC (permalink / raw)
  To: Cong Wang
  Cc: Paul Blakey, Marcelo Ricardo Leitner, jhs, jiri, netdev, Oz Shlomo

Hi Cong & all,

> On 1 Apr 2019, at 22:06, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Mon, Apr 1, 2019 at 7:22 AM Paul Blakey <paulb@mellanox.com> wrote:
>> 
> 
<snip some context>
> 
>> 
>> This would probably be better off with the previous name act_conndscp.
> 
> 
> If naming is the only concern here, then it is not hard to find
> a name we are all satisfied with, like act_ctinfo (if we still want more
> than just DSCP).

Personally I don’t *need* anything more than restoring the DSCP from conntrack mark, however my own narrow viewpoint doesn’t preclude some future desire to restore something else.  For that reason I changed the name from act_conndscp to act_conntrack and hope that the latest suggested name change ‘act_ctinfo’ be the last.  Incidentally I like the name ‘act_ctinfo’ especially if it is intended there be a ’send to conntrack’ type action of act_ct/act_conntrack.

Moving on from naming the darn thing and assuming ‘act_ctinfo’ sticks, is there anything fundamentally wrong with the code?  Nitpicks (or bigpicks) in approach or style?  I’m naively hoping the first real submission results in “That’s wonderful code, we’d be mad not to accept it” instead of “you’ve done that wrong, and that, and that, and that….” :-)


> 
> Thanks.


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-02  9:24                                 ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-03  7:47                                   ` Paul Blakey
  2019-04-03  8:23                                     ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2019-04-03  7:47 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant, Cong Wang
  Cc: Paul Blakey, Marcelo Ricardo Leitner, jhs, jiri, netdev, Oz Shlomo



Kevin 'ldir' Darbyshire-Bryant wrote on 02/04/2019 12:24:
> Hi Cong & all,
> 
>> On 1 Apr 2019, at 22:06, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> On Mon, Apr 1, 2019 at 7:22 AM Paul Blakey <paulb@mellanox.com> wrote:
>>>
>>
> <snip some context>
>>
>>>
>>> This would probably be better off with the previous name act_conndscp.
>>
>>
>> If naming is the only concern here, then it is not hard to find
>> a name we are all satisfied with, like act_ctinfo (if we still want more
>> than just DSCP).
> 
> Personally I don’t *need* anything more than restoring the DSCP from conntrack mark, however my own narrow viewpoint doesn’t preclude some future desire to restore something else.  For that reason I changed the name from act_conndscp to act_conntrack and hope that the latest suggested name change ‘act_ctinfo’ be the last.  Incidentally I like the name ‘act_ctinfo’ especially if it is intended there be a ’send to conntrack’ type action of act_ct/act_conntrack.
> 
> Moving on from naming the darn thing and assuming ‘act_ctinfo’ sticks, is there anything fundamentally wrong with the code?  Nitpicks (or bigpicks) in approach or style?  I’m naively hoping the first real submission results in “That’s wonderful code, we’d be mad not to accept it” instead of “you’ve done that wrong, and that, and that, and that….” :-)
> 
> 
>>
>> Thanks.
> 
> 
> Cheers,
> 
> Kevin D-B
> 
> gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
> 


Hi Kevin,
If you'd like, You can rebase it on our upcoming act_ct (our
act_conntrack) once we're done (hopefully in a couple of weeks). If you
going with the a standalone action, it's fine with me as well.


> +       /* let's not trust userspace entirely */
> +       /* need at least contiguous 6 bit mask */
> +       if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
> +       /* mask & statemask must not overlap */
> +       if (ci->mask & ci->statemask)
> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
> +

Some nitpicks, you check if the user provides sane input in
conntrack_parmset, but instead of returning an error, you just disable
the only action the user provided and the module supports, so it won't
do nothing, yet the command succeeds.

As marcelo said, this module isn't RCUified... see the design pattern in
act_vlan, act_tunnel_key, act_csum, or what this commits changed:

commit 4c5b9d9642c859f7369338fc42c0f62f4151bef3
Author: Manish Kurup <kurup.manish@gmail.com>
act_vlan: VLAN action rewrite to use RCU lock/unlock and update

commit 9c5f69bbd75a7db80578782b037629c5f1e59dce
Author: Davide Caratti <dcaratti@redhat.com>
net/sched: act_csum: don't use spinlock in the fast path


And regarding the
> +       ct = nf_ct_get(skb, &ctinfo);
> +       if (!ct) { /* look harder */


The packet didn't pass connection tracking yet right (!ct) but you're
setting the DSCP early?


> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +                                      proto, ca->net, &tuple))
> +                       goto out;
> +               zone.id = ca->zone;
> +               zone.dir = NF_CT_DEFAULT_ZONE_DIR;
> +
> +               thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
> +               if (!thash)
> +                       goto out;
> +
> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +                                      proto, ca->net, &tuple))
> +                       goto out;
> +

Aren't searching for the same tuple twice?

Thanks,
Paul.







^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-03  7:47                                   ` Paul Blakey
@ 2019-04-03  8:23                                     ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-03 11:56                                       ` Paul Blakey
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-03  8:23 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Cong Wang, Marcelo Ricardo Leitner, jhs, jiri, netdev, Oz Shlomo



> On 3 Apr 2019, at 08:47, Paul Blakey <paulb@mellanox.com> wrote:
> 
> 
> 
> Kevin 'ldir' Darbyshire-Bryant wrote on 02/04/2019 12:24:
>> Hi Cong & all,
>> 
>>> On 1 Apr 2019, at 22:06, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Mon, Apr 1, 2019 at 7:22 AM Paul Blakey <paulb@mellanox.com> wrote:
>>>> 
>>> 
>> <snip some context>
> 
> 
> Hi Kevin,
> If you'd like, You can rebase it on our upcoming act_ct (our
> act_conntrack) once we're done (hopefully in a couple of weeks). If you
> going with the a standalone action, it's fine with me as well.
> 
> 
>> +       /* let's not trust userspace entirely */
>> +       /* need at least contiguous 6 bit mask */
>> +       if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
>> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
>> +       /* mask & statemask must not overlap */
>> +       if (ci->mask & ci->statemask)
>> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
>> +
> 
> Some nitpicks, you check if the user provides sane input in
> conntrack_parmset, but instead of returning an error, you just disable
> the only action the user provided and the module supports, so it won't
> do nothing, yet the command succeeds.

Ok, I’ll return an -E something.  I guess I’m still stuck between this
generic ‘act_ctinfo’ potentially does more than one thing vs ‘act_conndscp’
doing a single thing.

> 
> As marcelo said, this module isn't RCUified... see the design pattern in
> act_vlan, act_tunnel_key, act_csum, or what this commits changed:
> 
> commit 4c5b9d9642c859f7369338fc42c0f62f4151bef3
> Author: Manish Kurup <kurup.manish@gmail.com>
> act_vlan: VLAN action rewrite to use RCU lock/unlock and update
> 
> commit 9c5f69bbd75a7db80578782b037629c5f1e59dce
> Author: Davide Caratti <dcaratti@redhat.com>
> net/sched: act_csum: don't use spinlock in the fast path

Ok, will take a look.  Note current act_connmark on which this is a
shameless copy has the same problem.  Is that an oversight and also
needs fixing?

> 
> 
> And regarding the
>> +       ct = nf_ct_get(skb, &ctinfo);
>> +       if (!ct) { /* look harder */
> 
> 
> The packet didn't pass connection tracking yet right (!ct) but you're
> setting the DSCP early?
> 
> 
>> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>> +                                      proto, ca->net, &tuple))
>> +                       goto out;
>> +               zone.id = ca->zone;
>> +               zone.dir = NF_CT_DEFAULT_ZONE_DIR;
>> +
>> +               thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
>> +               if (!thash)
>> +                       goto out;
>> +
>> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>> +                                      proto, ca->net, &tuple))
>> +                       goto out;
>> +
> 
> Aren't searching for the same tuple twice?

Again, I’m not doing anything that act_connmark (conntrack mark to
skb mark setting) isn’t doing already, so is act_connmark also wrong?
As you can almost certainly tell I’m working in areas that I don’t
understand, I only (think I) know the result I wish to achieve and so far
it is working.  More luck than judgement?! :-)

> 
> Thanks,
> Paul.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-03  8:23                                     ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-03 11:56                                       ` Paul Blakey
  2019-04-03 12:35                                         ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Blakey @ 2019-04-03 11:56 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: Paul Blakey, Cong Wang, Marcelo Ricardo Leitner, jhs, jiri,
	netdev, Oz Shlomo



Kevin 'ldir' Darbyshire-Bryant wrote on 03/04/2019 11:23:
> 
> 
>> On 3 Apr 2019, at 08:47, Paul Blakey <paulb@mellanox.com> wrote:
>>
>>
>>
>> Kevin 'ldir' Darbyshire-Bryant wrote on 02/04/2019 12:24:
>>> Hi Cong & all,
>>>
>>>> On 1 Apr 2019, at 22:06, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>
>>>> On Mon, Apr 1, 2019 at 7:22 AM Paul Blakey <paulb@mellanox.com> wrote:
>>>>>
>>>>
>>> <snip some context>
>>
>>
>> Hi Kevin,
>> If you'd like, You can rebase it on our upcoming act_ct (our
>> act_conntrack) once we're done (hopefully in a couple of weeks). If you
>> going with the a standalone action, it's fine with me as well.
>>
>>
>>> +       /* let's not trust userspace entirely */
>>> +       /* need at least contiguous 6 bit mask */
>>> +       if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
>>> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
>>> +       /* mask & statemask must not overlap */
>>> +       if (ci->mask & ci->statemask)
>>> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
>>> +
>>
>> Some nitpicks, you check if the user provides sane input in
>> conntrack_parmset, but instead of returning an error, you just disable
>> the only action the user provided and the module supports, so it won't
>> do nothing, yet the command succeeds.
> 
> Ok, I’ll return an -E something.  I guess I’m still stuck between this
> generic ‘act_ctinfo’ potentially does more than one thing vs ‘act_conndscp’
> doing a single thing.
> 
>>
>> As marcelo said, this module isn't RCUified... see the design pattern in
>> act_vlan, act_tunnel_key, act_csum, or what this commits changed:
>>
>> commit 4c5b9d9642c859f7369338fc42c0f62f4151bef3
>> Author: Manish Kurup <kurup.manish@gmail.com>
>> act_vlan: VLAN action rewrite to use RCU lock/unlock and update
>>
>> commit 9c5f69bbd75a7db80578782b037629c5f1e59dce
>> Author: Davide Caratti <dcaratti@redhat.com>
>> net/sched: act_csum: don't use spinlock in the fast path
> 
> Ok, will take a look.  Note current act_connmark on which this is a
> shameless copy has the same problem.  Is that an oversight and also
> needs fixing?

I think it's a performance consideration (and not an error) for now, and
there were yet to be updated.

> 
>>
>>
>> And regarding the
>>> +       ct = nf_ct_get(skb, &ctinfo);
>>> +       if (!ct) { /* look harder */
>>
>>
>> The packet didn't pass connection tracking yet right (!ct) but you're
>> setting the DSCP early?
>>
>>
>>> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>>> +                                      proto, ca->net, &tuple))
>>> +                       goto out;
>>> +               zone.id = ca->zone;
>>> +               zone.dir = NF_CT_DEFAULT_ZONE_DIR;
>>> +
>>> +               thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
>>> +               if (!thash)
>>> +                       goto out;
>>> +
>>> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>>> +                                      proto, ca->net, &tuple))
>>> +                       goto out;
>>> +
>>
>> Aren't searching for the same tuple twice?
> 
> Again, I’m not doing anything that act_connmark (conntrack mark to
> skb mark setting) isn’t doing already, so is act_connmark also wrong?
> As you can almost certainly tell I’m working in areas that I don’t
> understand, I only (think I) know the result I wish to achieve and so far
> it is working.  More luck than judgement?! :-)
> 

if I recall correctly, act_connmark doesn't call nf_ct_get_tuplepr twice.


>>
>> Thanks,
>> Paul.
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC net-next 0/1] net: sched: Introduce conntrack action
  2019-04-03 11:56                                       ` Paul Blakey
@ 2019-04-03 12:35                                         ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-09 11:33                                           ` [RFC net-next 0/1] net: sched: Introduce act_ctinfo action Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-03 12:35 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Cong Wang, Marcelo Ricardo Leitner, jhs, jiri, netdev, Oz Shlomo



> On 3 Apr 2019, at 12:56, Paul Blakey <paulb@mellanox.com> wrote:
> 
> 
> 
> Kevin 'ldir' Darbyshire-Bryant wrote on 03/04/2019 11:23:
>> 
>> 
>>> On 3 Apr 2019, at 08:47, Paul Blakey <paulb@mellanox.com> wrote:
>>> 
>>> 
>>> 
>>> Kevin 'ldir' Darbyshire-Bryant wrote on 02/04/2019 12:24:
>>>> Hi Cong & all,
>>>> 
>>>>> On 1 Apr 2019, at 22:06, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> 
>>>>> On Mon, Apr 1, 2019 at 7:22 AM Paul Blakey <paulb@mellanox.com> wrote:
>>>>>> 
>>>>> 
>>>> <snip some context>
>>> 
>>> 
>>> Hi Kevin,
>>> If you'd like, You can rebase it on our upcoming act_ct (our
>>> act_conntrack) once we're done (hopefully in a couple of weeks). If you
>>> going with the a standalone action, it's fine with me as well.
>>> 
>>> 
>>>> +       /* let's not trust userspace entirely */
>>>> +       /* need at least contiguous 6 bit mask */
>>>> +       if ((0x3f & (ci->mask >> ci->maskshift)) != 0x3f)
>>>> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
>>>> +       /* mask & statemask must not overlap */
>>>> +       if (ci->mask & ci->statemask)
>>>> +               ci->mode &= ~CONNTRACK_FLAG_SETDSCP;
>>>> +
>>> 
>>> Some nitpicks, you check if the user provides sane input in
>>> conntrack_parmset, but instead of returning an error, you just disable
>>> the only action the user provided and the module supports, so it won't
>>> do nothing, yet the command succeeds.
>> 
>> Ok, I’ll return an -E something.  I guess I’m still stuck between this
>> generic ‘act_ctinfo’ potentially does more than one thing vs ‘act_conndscp’
>> doing a single thing.
>> 
>>> 
>>> As marcelo said, this module isn't RCUified... see the design pattern in
>>> act_vlan, act_tunnel_key, act_csum, or what this commits changed:
>>> 
>>> commit 4c5b9d9642c859f7369338fc42c0f62f4151bef3
>>> Author: Manish Kurup <kurup.manish@gmail.com>
>>> act_vlan: VLAN action rewrite to use RCU lock/unlock and update
>>> 
>>> commit 9c5f69bbd75a7db80578782b037629c5f1e59dce
>>> Author: Davide Caratti <dcaratti@redhat.com>
>>> net/sched: act_csum: don't use spinlock in the fast path
>> 
>> Ok, will take a look.  Note current act_connmark on which this is a
>> shameless copy has the same problem.  Is that an oversight and also
>> needs fixing?
> 
> I think it's a performance consideration (and not an error) for now, and
> there were yet to be updated.
> 
>> 
>>> 
>>> 
>>> And regarding the
>>>> +       ct = nf_ct_get(skb, &ctinfo);
>>>> +       if (!ct) { /* look harder */
>>> 
>>> 
>>> The packet didn't pass connection tracking yet right (!ct) but you're
>>> setting the DSCP early?
>>> 
>>> 
>>>> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>>>> +                                      proto, ca->net, &tuple))
>>>> +                       goto out;
>>>> +               zone.id = ca->zone;
>>>> +               zone.dir = NF_CT_DEFAULT_ZONE_DIR;
>>>> +
>>>> +               thash = nf_conntrack_find_get(ca->net, &zone, &tuple);
>>>> +               if (!thash)
>>>> +                       goto out;
>>>> +
>>>> +               if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>>>> +                                      proto, ca->net, &tuple))
>>>> +                       goto out;
>>>> +
>>> 
>>> Aren't searching for the same tuple twice?
>> 
>> Again, I’m not doing anything that act_connmark (conntrack mark to
>> skb mark setting) isn’t doing already, so is act_connmark also wrong?
>> As you can almost certainly tell I’m working in areas that I don’t
>> understand, I only (think I) know the result I wish to achieve and so far
>> it is working.  More luck than judgement?! :-)
>> 
> 
> if I recall correctly, act_connmark doesn't call nf_ct_get_tuplepr twice.
> 

Whoops!  Hangs head in shame - copy/paste merge error at some point - will fix…
along with all the other stuff.


> 
>>> 
>>> Thanks,
>>> Paul.


Cheers,

Kevin D-B


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC net-next 0/1] net: sched: Introduce act_ctinfo action
  2019-04-03 12:35                                         ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-09 11:33                                           ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-09 11:33                                             ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-09 11:33 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: jhs, jiri, marcelo.leitner, netdev, ozsh, paulb, xiyou.wangcong

Hi Cong & everyone.

Another rename & some more tweaks.  I'm not sure quite what version
we're up to but on the basis of the name change (again) I'm assuming v1.

Changes:

Changed the netlink parameter passing to mostly indepedent
flags/structures to hopefully facilitate easier future extension of this
action/module.

Sanity checks on userspace passed parameters now return with -EINVAL
instead of silently nulling them out.

RCU'ified the action update based on structure seen in act_csum.  I hope
to $DEITY I've done that right :-)

Closer to perfection?

Kevin Darbyshire-Bryant (1):
  net: sched: Introduce act_ctinfo action

 include/net/tc_act/tc_ctinfo.h            |  24 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h     |  33 ++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_ctinfo.c                    | 372 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 445 insertions(+)
 create mode 100644 include/net/tc_act/tc_ctinfo.h
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 net/sched/act_ctinfo.c

-- 
2.20.1 (Apple Git-117)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC net-next 1/1] net: sched: Introduce act_ctinfo action
  2019-04-09 11:33                                           ` [RFC net-next 0/1] net: sched: Introduce act_ctinfo action Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-09 11:33                                             ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 0 replies; 27+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-09 11:33 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: jhs, jiri, marcelo.leitner, netdev, ozsh, paulb, xiyou.wangcong

ctinfo is a new tc filter action module.  It is designed to restore DSCPs
stored in conntrack marks into the ipv4/v6 diffserv field.

The feature is intended for use and has been found useful for restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to indicate
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

ctinfo understands the following parameters:

dscp dscpmask[/statemask]

dscpmask - a 32 bit mask of at least 6 contiguous bits and indicates
where ctinfo will find the DSCP bits stored in the conntrack mark.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour ie. the conntrack mark DSCP bits are always
restored to the ip diffserv field (assuming the conntrack entry is found
& the skb is an ipv4/ipv6 type)

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

e.g. dscp 0xfc000000/0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/net/tc_act/tc_ctinfo.h            |  24 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h     |  33 ++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_ctinfo.c                    | 372 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 445 insertions(+)
 create mode 100644 include/net/tc_act/tc_ctinfo.h
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 net/sched/act_ctinfo.c

diff --git a/include/net/tc_act/tc_ctinfo.h b/include/net/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..bb33e66d3ea5
--- /dev/null
+++ b/include/net/tc_act/tc_ctinfo.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CTINFO_H
+#define __NET_TC_CTINFO_H
+
+#include <net/act_api.h>
+
+struct tcf_ctinfo_params {
+	struct net *net;
+	u32 dscpmask;
+	u32 dscpstatemask;
+	u16 zone;
+	u8 mode;
+	u8 dscpmaskshift;
+	struct rcu_head rcu;
+};
+
+struct tcf_ctinfo {
+	struct tc_action common;
+	struct tcf_ctinfo_params __rcu *params;
+};
+
+#define to_ctinfo(a) ((struct tcf_ctinfo *)a)
+
+#endif /* __NET_TC_CTINFO_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..a93680fc4bfa 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CTINFO,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..b84902b5e3b1
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_ctinfo {
+	tc_gen;
+};
+
+struct tc_ctinfo_dscp {
+	__u32 mask;
+	__u32 statemask;
+};
+
+enum {
+	TCA_CTINFO_UNSPEC,
+	TCA_CTINFO_ACT,
+	TCA_CTINFO_ZONE,
+	TCA_CTINFO_DSCP_PARMS,
+	TCA_CTINFO_MODE_DSCP,
+	TCA_CTINFO_TM,
+	TCA_CTINFO_PAD,
+	__TCA_CTINFO_MAX
+};
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+	CTINFO_MODE_SETDSCP	= BIT(0)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 5c02ad97ef23..5ac01c5ebae9 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -876,6 +876,19 @@ config NET_ACT_CONNMARK
 	  To compile this code as a module, choose M here: the
 	  module will be called act_connmark.
 
+config NET_ACT_CTINFO
+        tristate "Netfilter Connmark to DSCP Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        help
+	  Say Y here to allow transfer of a connmark stored DSCP into
+	  ipv4/v6 diffserv
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_ctinfo.
+
 config NET_ACT_SKBMOD
         tristate "skb data modification action"
         depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..d54bfcbd7981 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
+obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
new file mode 100644
index 000000000000..7307eb62eba5
--- /dev/null
+++ b/net/sched/act_ctinfo.c
@@ -0,0 +1,372 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* net/sched/act_ctinfo.c  netfilter ctinfo connmark->DSCP action
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_ctinfo.h>
+#include <net/tc_act/tc_ctinfo.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int ctinfo_net_id;
+static struct tc_action_ops act_ctinfo_ops;
+
+static void tcf_ctinfo_dscp_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb, int wlen, int proto)
+{
+	u8 dscp,newdscp;
+
+	newdscp = (((ct->mark & cp->dscpmask) >> cp->dscpmaskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	/* mark contains DSCP so restore DSCP bits from c->mark into diffserv */
+	/* using overlimits stats to count how many DSCP updates */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (!skb_try_make_writable(skb, wlen)) {
+				ipv4_change_dsfield(ip_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->tcf_qstats.overlimits++;
+			} else {
+				ca->tcf_qstats.drops++;
+			}
+		}
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (!skb_try_make_writable(skb, wlen)) {
+				ipv6_change_dsfield(ipv6_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->tcf_qstats.overlimits++;
+			} else {
+				ca->tcf_qstats.drops++;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
+			  struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash = NULL;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_ctinfo *ca = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto, wlen;
+	int action;
+
+	cp = rcu_dereference_bh(ca->params);
+
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+	action = READ_ONCE(ca->tcf_action);
+
+	/* currently the only mode we know but in future...*/
+	if (unlikely(!(cp->mode & CTINFO_MODE_SETDSCP)))
+		goto out;
+
+	wlen = skb_network_offset(skb);
+	if (tc_skb_protocol(skb) == htons(ETH_P_IP)) {
+		wlen += sizeof(struct iphdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (tc_skb_protocol(skb) == htons(ETH_P_IPV6)) {
+		wlen += sizeof(struct ipv6hdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct) { /* look harder usually ingress */
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, cp->net, &tuple))
+			goto out;
+		zone.id = cp->zone;
+		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+		thash = nf_conntrack_find_get(cp->net, &zone, &tuple);
+		if (!thash)
+			goto out;
+
+		ct = nf_ct_tuplehash_to_ctrack(thash);
+	}
+
+	if (!cp->dscpstatemask || (ct->mark & cp->dscpstatemask))
+		tcf_ctinfo_dscp_set(ct, ca, cp, skb, wlen, proto);
+
+	if (thash)
+		nf_ct_put(ct);
+out:
+	return action;
+}
+
+static const struct nla_policy ctinfo_policy[TCA_CTINFO_MAX + 1] = {
+	[TCA_CTINFO_ACT] = { .len = sizeof(struct tc_ctinfo) },
+	[TCA_CTINFO_ZONE] = { .type = NLA_U16 },
+	[TCA_CTINFO_MODE_DSCP] = { .type = NLA_FLAG },
+	[TCA_CTINFO_DSCP_PARMS] = { .len = sizeof(struct tc_ctinfo_dscp) },
+};
+
+static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
+			     struct nlattr *est, struct tc_action **a,
+			     int ovr, int bind, bool rtnl_held,
+			     struct tcf_proto *tp,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+	struct tcf_ctinfo_params *cp_new;
+	struct nlattr *tb[TCA_CTINFO_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_ctinfo *ci;
+	struct tc_ctinfo *actparm;
+	struct tc_ctinfo_dscp *dscpparm;
+	int ret = 0, err, i;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CTINFO_MAX, nla, ctinfo_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CTINFO_ACT])
+		return -EINVAL;
+
+	if (tb[TCA_CTINFO_MODE_DSCP] && !tb[TCA_CTINFO_DSCP_PARMS])
+		return -EINVAL;
+
+	actparm = nla_data(tb[TCA_CTINFO_ACT]);
+	dscpparm = nla_data(tb[TCA_CTINFO_DSCP_PARMS]);
+
+	if (dscpparm) {
+		/* need at least contiguous 6 bit mask */
+		i = dscpparm->mask ? __ffs(dscpparm->mask) : 0;
+		if ((0x3f & (dscpparm->mask >> i)) != 0x3f)
+			return -EINVAL;
+		/* mask & statemask must not overlap */
+		if (dscpparm->mask & dscpparm->statemask)
+			return -EINVAL;
+	}
+//done the validation:now to the actual action allocation
+	err = tcf_idr_check_alloc(tn, &actparm->index, a, bind);
+	if (!err) {
+		ret = tcf_idr_create(tn, actparm->index, est, a,
+				     &act_ctinfo_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, actparm->index);
+			return ret;
+		}
+	} else if (err > 0) {
+		if (bind) /* don't override defaults */
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	} else {
+		return err;
+	}
+
+	err = tcf_action_check_ctrlact(actparm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	ci = to_ctinfo(*a);
+
+	cp_new = kzalloc(sizeof(*cp_new), GFP_KERNEL);
+	if (unlikely(!cp_new)) {
+		err = -ENOMEM;
+		goto put_chain;
+	}
+
+	cp_new->net = net;
+	cp_new->zone = tb[TCA_CTINFO_ZONE] ? nla_get_u16(tb[TCA_CTINFO_ZONE]) : 0;
+	if (dscpparm) {
+		cp_new->dscpmask = dscpparm->mask;
+		cp_new->dscpmaskshift = cp_new->dscpmask ? __ffs(cp_new->dscpmask) : 0;
+		cp_new->dscpstatemask = dscpparm->statemask;
+	}
+
+	if (tb[TCA_CTINFO_MODE_DSCP])
+		cp_new->mode |= CTINFO_MODE_SETDSCP;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETDSCP;
+
+	spin_lock_bh(&ci->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
+	rcu_swap_protected(ci->params, cp_new,
+				lockdep_is_held(&ci->tcf_lock));
+	spin_unlock_bh(&ci->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (cp_new)
+		kfree_rcu(cp_new, rcu);
+
+	if (ret == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return ret;
+
+put_chain:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static inline int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
+				    int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ctinfo *ci = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct tc_ctinfo opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+	struct tc_ctinfo_dscp dscpparm;
+
+	spin_lock_bh(&ci->tcf_lock);
+	cp = rcu_dereference_protected(ci->params,
+				       lockdep_is_held(&ci->tcf_lock));
+	opt.action = ci->tcf_action;
+
+	if (nla_put(skb, TCA_CTINFO_ACT, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (cp->mode & CTINFO_MODE_SETDSCP) {
+		dscpparm.mask = cp->dscpmask;
+		dscpparm.statemask = cp->dscpstatemask;
+		if (nla_put(skb, TCA_CTINFO_DSCP_PARMS, sizeof(dscpparm), &dscpparm))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_DSCP))
+			goto nla_put_failure;
+	}
+
+	if (cp->zone) {
+		if (nla_put_u16(skb, TCA_CTINFO_ZONE, cp->zone))
+			goto nla_put_failure;
+	}
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CTINFO_TM, sizeof(t), &t,
+			  TCA_CTINFO_PAD))
+		goto nla_put_failure;
+
+	spin_unlock_bh(&ci->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ctinfo_walker(struct net *net, struct sk_buff *skb,
+			       struct netlink_callback *cb, int type,
+			       const struct tc_action_ops *ops,
+			       struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ctinfo_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_ctinfo_ops = {
+	.kind		=	"ctinfo",
+	.id		=	TCA_ID_CTINFO,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ctinfo_act,
+	.dump		=	tcf_ctinfo_dump,
+	.init		=	tcf_ctinfo_init,
+	.walk		=	tcf_ctinfo_walker,
+	.lookup		=	tcf_ctinfo_search,
+	.size		=	sizeof(struct tcf_ctinfo),
+};
+
+static __net_init int ctinfo_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tc_action_net_init(tn, &act_ctinfo_ops);
+}
+
+static void __net_exit ctinfo_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, ctinfo_net_id);
+}
+
+static struct pernet_operations ctinfo_net_ops = {
+	.init = ctinfo_init_net,
+	.exit_batch = ctinfo_exit_net,
+	.id   = &ctinfo_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init ctinfo_init_module(void)
+{
+	return tcf_register_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+static void __exit ctinfo_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+module_init(ctinfo_init_module);
+module_exit(ctinfo_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("Conntrack mark to DSCP restoring");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..9d1fddcfb887 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -37,6 +37,7 @@ CONFIG_NET_ACT_SKBEDIT=m
 CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
+CONFIG_NET_ACT_CONNDSCP=m
 CONFIG_NET_ACT_CONNMARK=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
-- 
2.20.1 (Apple Git-117)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-04-09 11:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-19 19:49 [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action Kevin 'ldir' Darbyshire-Bryant
2019-03-19 19:49 ` [PATCH 1/1] " Kevin 'ldir' Darbyshire-Bryant
2019-03-21 17:58   ` kbuild test robot
2019-03-22 14:09 ` [RFC PATCH 1/1 v2] " Kevin 'ldir' Darbyshire-Bryant
2019-03-22 17:39   ` Cong Wang
2019-03-22 18:26     ` Kevin 'ldir' Darbyshire-Bryant
2019-03-22 20:05       ` Cong Wang
2019-03-22 20:50         ` Kevin 'ldir' Darbyshire-Bryant
2019-03-22 21:31           ` Cong Wang
2019-03-22 22:06             ` Kevin 'ldir' Darbyshire-Bryant
2019-03-22 23:09               ` Cong Wang
2019-03-23 17:45                 ` Kevin 'ldir' Darbyshire-Bryant
2019-03-25 19:17                   ` Cong Wang
2019-03-27 20:32                     ` Kevin 'ldir' Darbyshire-Bryant
2019-03-29 20:45                       ` [RFC net-next 0/1] net: sched: Introduce conntrack action Kevin 'ldir' Darbyshire-Bryant
2019-03-29 20:45                         ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant
2019-04-01 13:14                         ` [RFC net-next 0/1] " Marcelo Ricardo Leitner
2019-04-01 13:54                           ` Kevin 'ldir' Darbyshire-Bryant
2019-04-01 14:22                             ` Paul Blakey
2019-04-01 21:06                               ` Cong Wang
2019-04-02  9:24                                 ` Kevin 'ldir' Darbyshire-Bryant
2019-04-03  7:47                                   ` Paul Blakey
2019-04-03  8:23                                     ` Kevin 'ldir' Darbyshire-Bryant
2019-04-03 11:56                                       ` Paul Blakey
2019-04-03 12:35                                         ` Kevin 'ldir' Darbyshire-Bryant
2019-04-09 11:33                                           ` [RFC net-next 0/1] net: sched: Introduce act_ctinfo action Kevin 'ldir' Darbyshire-Bryant
2019-04-09 11:33                                             ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.