netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/1] netfilter: xt_connmark: add savedscp-mark action
@ 2019-03-24 14:23 Kevin 'ldir' Darbyshire-Bryant
  2019-03-24 14:23 ` [PATCH 1/1] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
  2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
  0 siblings, 2 replies; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-24 14:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Kevin 'ldir' Darbyshire-Bryant

What-ho Chaps!

With nervousness and trepidation I'm submitting the attached RFC patch
for an extension to 'connmark'.

savedscp-mark is designed to copy DSCPs to conntrack marks and is
intended to be used with a tc filter based companion action currently in
the process of being submitted called 'conndscp'.  The TC people are ok
with extracting information from conntrack connmarks but are much less
keen on TC changing those marks.  Hence the requirement to split the
functionality across two subsystems 'tc' & 'netfilter'.

At this stage I'm primarily interested in whether an extension to
xt_connmark that writes DSCP to conntrack marks for late use/pickup by a
tc filter is acceptable in terms of concept.

The feature is intended for use and has been found useful for restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway, hence the efforts in trying to
restore a classification on ingress based on a classification made on
egress.


x_tables CONNMARK with the new savedscp action solves the problem of
storing the DSCP to the conntrack mark.
    
It accepts 2 parameters.  The mark is a 32bit value with usually one
bit set.  This bit is set when savedscp saves the DSCP to the mark.
This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A mark
of zero disables the setting of a status bit/s.
    
The mask is a 32bit value of at least 6 contiguous bits and represents
the area where the DSCP will be stored.
    
e.g.
    
iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark \
0x01000000/0xfc000000


A typical example of using savedscp-mark to store DSCP values for use
with a tc conndscp & hence qdisc (e.g. CAKE) is shown below, using top 6
bits to store the DSCP and the bottom bit of top byte as the state flag.

#iptables rules using the statemask flag to only do it once

iptables -t mangle -N QOS_MARK_eth0

iptables -t mangle -A QOS_MARK_eth0 -m set --match-set Bulk4  dst -j DSCP \
--set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
#add more rules similar to above as required
#save the DSCP into conntrack mark & set the 'marked' bit
iptables -t mangle -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark 0x01000000/0xfc000000

# send unmarked packets to the marking chain 
iptables -t mangle -A POSTROUTING -o eth0 -m connmark --mark 0x00000000/0x01000000 -g QOS_MARK_eth0



I am not a full time programmer in any language, xt_connmark &
tc/conndscp represent something of the order of a 3 week struggle, my C
is awful, kernel & network knowledge worse, though I like to think
improving.  There are no doubt issues with this patch/feature but I hope
constructive feedback, quite possibly in very short words for my simple
brain, will knock it into shape.

More importantly is the 'connmark --savedscp-mark' concept fatally
flawed in some way that it cannot be made acceptable?

Your time, comments & guidance appreciated.


Kevin Darbyshire-Bryant (1):
  netfilter: connmark: introduce savedscp

 include/uapi/linux/netfilter/xt_connmark.h |  3 ++-
 net/netfilter/xt_connmark.c                | 30 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

-- 
2.17.2 (Apple Git-113)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/1] netfilter: connmark: introduce savedscp
  2019-03-24 14:23 [RFC PATCH 0/1] netfilter: xt_connmark: add savedscp-mark action Kevin 'ldir' Darbyshire-Bryant
@ 2019-03-24 14:23 ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-08 22:39   ` Pablo Neira Ayuso
  2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-24 14:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Kevin 'ldir' Darbyshire-Bryant

savedscp is a method of storing the DSCP of an ip packet into conntrack
mark.  In combination with a suitable tc filter action (conndscp but may
end up being integrated into connmark) DSCP values are able to be stored
on egress and restored on ingress across links that otherwise alter or
bleach DSCP.

This is useful for qdiscs such as CAKE which are able to shape according
to policies based on DSCP.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

The ingress problem is solved by the tc filter, but the tc people didn't
like the idea of tc setting conntrack mark values, though they are ok
with reading conntrack values and hence restoring DSCP from conntrack
marks.

x_tables CONNMARK with the new savedscp action solves the problem of
storing the DSCP to the conntrack mark.

It accepts 2 parameters.  The mark is a 32bit value with usually one
bit set.  This bit is set when savedscp saves the DSCP to the mark.
This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A mark
of zero disables the setting of a status bit/s.

The mask is a 32bit value of at least 6 contiguous bits and represents
the area where the DSCP will be stored.

e.g.

iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark 0x01000000/0xfc000000

Would store the DSCP in the top 6 bits of the 32bit mark field, and use
the LSB of the top byte as the 'DSCP has been stored' marker.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/uapi/linux/netfilter/xt_connmark.h |  3 ++-
 net/netfilter/xt_connmark.c                | 30 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/xt_connmark.h b/include/uapi/linux/netfilter/xt_connmark.h
index 1aa5c955ee1e..24272cac2d37 100644
--- a/include/uapi/linux/netfilter/xt_connmark.h
+++ b/include/uapi/linux/netfilter/xt_connmark.h
@@ -16,7 +16,8 @@
 enum {
 	XT_CONNMARK_SET = 0,
 	XT_CONNMARK_SAVE,
-	XT_CONNMARK_RESTORE
+	XT_CONNMARK_RESTORE,
+	XT_CONNMARK_SAVEDSCP
 };
 
 enum {
diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
index 29c38aa7f726..de8dc707ac5b 100644
--- a/net/netfilter/xt_connmark.c
+++ b/net/netfilter/xt_connmark.c
@@ -42,6 +42,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 	u_int32_t new_targetmark;
 	struct nf_conn *ct;
 	u_int32_t newmark;
+	u8 dscp;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if (ct == NULL)
@@ -74,6 +75,34 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 			nf_conntrack_event_cache(IPCT_MARK, ct);
 		}
 		break;
+	case XT_CONNMARK_SAVEDSCP:
+		if (!info->ctmask)
+			goto out;
+
+		if (skb->protocol == htons(ETH_P_IP)) {
+			if (skb->len < sizeof(struct iphdr))
+				goto out;
+
+			dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+
+		} else if (skb->protocol == htons(ETH_P_IPV6)) {
+			if (skb->len < sizeof(struct ipv6hdr))
+				goto out;
+
+			dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+
+		} else { /* protocol doesn't have diffserv - get out! */
+			goto out;
+		}
+
+		newmark = (ct->mark & ~info->ctmask) ^
+			  (info->ctmark | (dscp << info->shift_bits));
+
+		if (ct->mark != newmark) {
+			ct->mark = newmark;
+			nf_conntrack_event_cache(IPCT_MARK, ct);
+		}
+		break;
 	case XT_CONNMARK_RESTORE:
 		new_targetmark = (ct->mark & info->ctmask);
 		if (info->shift_dir == D_SHIFT_RIGHT)
@@ -86,6 +115,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 		skb->mark = newmark;
 		break;
 	}
+out:
 	return XT_CONTINUE;
 }
 
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] netfilter: connmark: introduce savedscp
  2019-03-24 14:23 ` [PATCH 1/1] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-08 22:39   ` Pablo Neira Ayuso
  2019-04-08 23:16     ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-04-08 22:39 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: netfilter-devel

Hi Kevin,

On Sun, Mar 24, 2019 at 02:23:45PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
> savedscp is a method of storing the DSCP of an ip packet into conntrack
> mark.  In combination with a suitable tc filter action (conndscp but may
> end up being integrated into connmark) DSCP values are able to be stored
> on egress and restored on ingress across links that otherwise alter or
> bleach DSCP.
> 
> This is useful for qdiscs such as CAKE which are able to shape according
> to policies based on DSCP.
> 
> Ingress classification is traditionally a challenging task since
> iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
> lookups, hence are unable to see internal IPv4 addresses as used on the
> typical home masquerading gateway.
> 
> The ingress problem is solved by the tc filter, but the tc people didn't
> like the idea of tc setting conntrack mark values, though they are ok
> with reading conntrack values and hence restoring DSCP from conntrack
> marks.
> 
> x_tables CONNMARK with the new savedscp action solves the problem of
> storing the DSCP to the conntrack mark.
> 
> It accepts 2 parameters.  The mark is a 32bit value with usually one
> bit set.  This bit is set when savedscp saves the DSCP to the mark.
> This is useful to implement a 'one shot'
> iptables based classification where the 'complicated' iptables rules are
> only run once to classify the connection on initial (egress) packet and
> subsequent packets are all marked/restored with the same DSCP.  A mark
> of zero disables the setting of a status bit/s.
> 
> The mask is a 32bit value of at least 6 contiguous bits and represents
> the area where the DSCP will be stored.
> 
> e.g.
> 
> iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark 0x01000000/0xfc000000
> 
> Would store the DSCP in the top 6 bits of the 32bit mark field, and use
> the LSB of the top byte as the 'DSCP has been stored' marker.

I'd prefer we explore how to express this in nftables, I think the
main problem here is that we do not support for datatype compatibility
/ casting yet, hence dscp cannot be assign to the mark. But we only
need to update userspace code to achieve this.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] netfilter: connmark: introduce savedscp
  2019-04-08 22:39   ` Pablo Neira Ayuso
@ 2019-04-08 23:16     ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-09 14:23       ` [RFC nf-next v2 0/2] xt_connmark: add savedscp-mark action Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-08 23:16 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel



> On 8 Apr 2019, at 23:39, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> Hi Kevin,
> 
> On Sun, Mar 24, 2019 at 02:23:45PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
>> savedscp is a method of storing the DSCP of an ip packet into conntrack
>> mark.  In combination with a suitable tc filter action (conndscp but may
>> end up being integrated into connmark) DSCP values are able to be stored
>> on egress and restored on ingress across links that otherwise alter or
>> bleach DSCP.
>> 
>> This is useful for qdiscs such as CAKE which are able to shape according
>> to policies based on DSCP.
>> 
>> Ingress classification is traditionally a challenging task since
>> iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
>> lookups, hence are unable to see internal IPv4 addresses as used on the
>> typical home masquerading gateway.
>> 
>> The ingress problem is solved by the tc filter, but the tc people didn't
>> like the idea of tc setting conntrack mark values, though they are ok
>> with reading conntrack values and hence restoring DSCP from conntrack
>> marks.
>> 
>> x_tables CONNMARK with the new savedscp action solves the problem of
>> storing the DSCP to the conntrack mark.
>> 
>> It accepts 2 parameters.  The mark is a 32bit value with usually one
>> bit set.  This bit is set when savedscp saves the DSCP to the mark.
>> This is useful to implement a 'one shot'
>> iptables based classification where the 'complicated' iptables rules are
>> only run once to classify the connection on initial (egress) packet and
>> subsequent packets are all marked/restored with the same DSCP.  A mark
>> of zero disables the setting of a status bit/s.
>> 
>> The mask is a 32bit value of at least 6 contiguous bits and represents
>> the area where the DSCP will be stored.
>> 
>> e.g.
>> 
>> iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark 0x01000000/0xfc000000
>> 
>> Would store the DSCP in the top 6 bits of the 32bit mark field, and use
>> the LSB of the top byte as the 'DSCP has been stored' marker.
> 
> I'd prefer we explore how to express this in nftables, I think the
> main problem here is that we do not support for datatype compatibility
> / casting yet, hence dscp cannot be assign to the mark. But we only
> need to update userspace code to achieve this.
> 
> Thanks.

Hi Pablo,

Thanks for getting back to me.  I know nothing of nftables so this will be
‘interesting’.  I have a v2 of the patch with a very minor tweak that I will
send tomorrow and I also have a userspace ‘iptables’ patch that shows the
user side of things if it will help.

For avoidance of doubt, the two values passed by —savedscp-mark are both
32 bit bit masks.  One must be a contiguous 6 bit field and specifies where
in the connmark field to store that packet’s DSCP value.  It does not specify
the DSCP value itself.

The second value is again (usually) a single bit mask and specifies a bit in
the connmark field that is SET by the —savedscp-mark action.

nftables userspace must already have a method of passing 32bit bitmasks?

e.g. a ‘dscpmask’ of 0xfc000000 specifies to store the packet’s dscp field in
the top 6 bits of the conntrack connmark:

———0xFC—————————0000000-—
|TOP 6 bits | ~~~~~~~~~~~~~
———————————————————

    ^
    | shifted as required
    ————
        |
-diffserv field -
| 6 bits DSCP   |
-----------------

Hopefully I have not misunderstood your points?!


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC nf-next v2 0/2] xt_connmark: add savedscp-mark action
  2019-04-08 23:16     ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-09 14:23       ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-09 14:23         ` [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
  2019-04-09 14:23         ` [RFC nf-next 2/2] iptables: connmark - add savedscp option Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 2 replies; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-09 14:23 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: netfilter-devel, pablo

Hi Pablo,

As promised here's the v2 patch along with a patch to iptables user
space land to make it work.

Cheers,

Kevin

Kevin Darbyshire-Bryant (1):
  netfilter: connmark: introduce savedscp

 include/uapi/linux/netfilter/xt_connmark.h |  3 ++-
 net/netfilter/xt_connmark.c                | 30 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

-- 
2.20.1 (Apple Git-117)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp
  2019-04-09 14:23       ` [RFC nf-next v2 0/2] xt_connmark: add savedscp-mark action Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-09 14:23         ` Kevin 'ldir' Darbyshire-Bryant
  2019-04-30 12:29           ` Pablo Neira Ayuso
  2019-04-09 14:23         ` [RFC nf-next 2/2] iptables: connmark - add savedscp option Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 1 reply; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-09 14:23 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: netfilter-devel, pablo

savedscp is a method of storing the DSCP of an ip packet into conntrack
mark.  In combination with a suitable tc filter action (act_ctinfo) DSCP
values are able to be stored in the mark on egress and restored on
ingress across links that otherwise alter or bleach DSCP.

This is useful for qdiscs such as CAKE which are able to shape according
to policies based on DSCP.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

x_tables CONNMARK savedscp action solves the problem of storing the DSCP
to the conntrack mark in a way suitable for the new act_ctinfo tc action
to restore.

The savedsp option accepts 2 parameters, a 32bit 'dscpmask' and a 32bit
'statemask'.  The dscp mask must be a minimum of 6 contiguous bits and
represents the area where the DSCP will be stored in the connmark.  The
state mask is a minimum 1 bit length mask that must not overlap with the
dscpmask.  It represents a flag which is set when the DSCP has been
stored in the conntrack mark. This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A state
mask of zero disables the setting of a status bit/s.

example syntax with a suitably modified iptables user space application:

iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark 0xfc000000/0x01000000

Would store the DSCP in the top 6 bits of the 32bit mark field, and use
the LSB of the top byte as the 'DSCP has been stored' marker.

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      ^                   ^
      |                   |
      ---|             Conditional flag
         |             set this when dscp
|-ip diffserv-|        stored in mark
| 6 bits      |
|-------------|

an identically configured tc action to restore looks like:

tc filter show dev eth0 ingress
filter parent ffff: protocol all pref 10 u32 chain 0
filter parent ffff: protocol all pref 10 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 10 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 not_in_hw
  match 00000000/00000000 at 0
	action order 1: ctinfo zone 0 pipe
	 index 2 ref 1 bind 1 dscp 0xfc000000/0x1000000

	action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
	index 1 ref 1 bind 1

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
v2 - reword commit message
     swap the dscpmask/statemask order so it matches the tc action order
     this also matches the value[/mask] convention in that dscpmask is
     non optional whereas the statemask is optional.

 include/uapi/linux/netfilter/xt_connmark.h |  3 ++-
 net/netfilter/xt_connmark.c                | 30 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/xt_connmark.h b/include/uapi/linux/netfilter/xt_connmark.h
index 1aa5c955ee1e..24272cac2d37 100644
--- a/include/uapi/linux/netfilter/xt_connmark.h
+++ b/include/uapi/linux/netfilter/xt_connmark.h
@@ -16,7 +16,8 @@
 enum {
 	XT_CONNMARK_SET = 0,
 	XT_CONNMARK_SAVE,
-	XT_CONNMARK_RESTORE
+	XT_CONNMARK_RESTORE,
+	XT_CONNMARK_SAVEDSCP
 };
 
 enum {
diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
index 29c38aa7f726..6c63cf476342 100644
--- a/net/netfilter/xt_connmark.c
+++ b/net/netfilter/xt_connmark.c
@@ -42,6 +42,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 	u_int32_t new_targetmark;
 	struct nf_conn *ct;
 	u_int32_t newmark;
+	u8 dscp;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if (ct == NULL)
@@ -74,6 +75,34 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 			nf_conntrack_event_cache(IPCT_MARK, ct);
 		}
 		break;
+	case XT_CONNMARK_SAVEDSCP:
+		if (!info->ctmark)
+			goto out;
+
+		if (skb->protocol == htons(ETH_P_IP)) {
+			if (skb->len < sizeof(struct iphdr))
+				goto out;
+
+			dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+
+		} else if (skb->protocol == htons(ETH_P_IPV6)) {
+			if (skb->len < sizeof(struct ipv6hdr))
+				goto out;
+
+			dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+
+		} else { /* protocol doesn't have diffserv - get out! */
+			goto out;
+		}
+
+		newmark = (ct->mark & ~info->ctmark) ^
+			  (info->ctmask | (dscp << info->shift_bits));
+
+		if (ct->mark != newmark) {
+			ct->mark = newmark;
+			nf_conntrack_event_cache(IPCT_MARK, ct);
+		}
+		break;
 	case XT_CONNMARK_RESTORE:
 		new_targetmark = (ct->mark & info->ctmask);
 		if (info->shift_dir == D_SHIFT_RIGHT)
@@ -86,6 +115,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 		skb->mark = newmark;
 		break;
 	}
+out:
 	return XT_CONTINUE;
 }
 
-- 
2.20.1 (Apple Git-117)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC nf-next 2/2] iptables: connmark - add savedscp option
  2019-04-09 14:23       ` [RFC nf-next v2 0/2] xt_connmark: add savedscp-mark action Kevin 'ldir' Darbyshire-Bryant
  2019-04-09 14:23         ` [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-09 14:23         ` Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 0 replies; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-09 14:23 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: netfilter-devel, pablo

Naive user space front end to xt_connmark 'savedscp' option.

e.g.

iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --savedscp-mark 0xfc000000/0x01000000

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 extensions/libxt_CONNMARK.c           | 68 ++++++++++++++++++++++++++-
 include/linux/netfilter/xt_connmark.h |  3 +-
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/extensions/libxt_CONNMARK.c b/extensions/libxt_CONNMARK.c
index 21e10913..2dbd7a4a 100644
--- a/extensions/libxt_CONNMARK.c
+++ b/extensions/libxt_CONNMARK.c
@@ -49,6 +49,7 @@ enum {
 	O_CTMASK,
 	O_NFMASK,
 	O_MASK,
+	O_SAVEDSCP_MARK,
 	F_SET_MARK         = 1 << O_SET_MARK,
 	F_SAVE_MARK        = 1 << O_SAVE_MARK,
 	F_RESTORE_MARK     = 1 << O_RESTORE_MARK,
@@ -61,8 +62,10 @@ enum {
 	F_CTMASK           = 1 << O_CTMASK,
 	F_NFMASK           = 1 << O_NFMASK,
 	F_MASK             = 1 << O_MASK,
+	F_SAVEDSCP_MARK	   = 1 << O_SAVEDSCP_MARK,
 	F_OP_ANY           = F_SET_MARK | F_SAVE_MARK | F_RESTORE_MARK |
-	                     F_AND_MARK | F_OR_MARK | F_XOR_MARK | F_SET_XMARK,
+	                     F_AND_MARK | F_OR_MARK | F_XOR_MARK | F_SET_XMARK |
+			     F_SAVEDSCP_MARK,
 };
 
 static const char *const xt_connmark_shift_ops[] = {
@@ -75,6 +78,7 @@ static void CONNMARK_help(void)
 	printf(
 "CONNMARK target options:\n"
 "  --set-mark value[/mask]       Set conntrack mark value\n"
+"  --savedscp-mark dscpmask/statemask    Save DSCP to conntrack mark value\n"
 "  --save-mark [--mask mask]     Save the packet nfmark in the connection\n"
 "  --restore-mark [--mask mask]  Restore saved nfmark value\n");
 }
@@ -83,6 +87,8 @@ static void CONNMARK_help(void)
 static const struct xt_option_entry CONNMARK_opts[] = {
 	{.name = "set-mark", .id = O_SET_MARK, .type = XTTYPE_MARKMASK32,
 	 .excl = F_OP_ANY},
+	{.name = "savedscp-mark", .id = O_SAVEDSCP_MARK, .type = XTTYPE_MARKMASK32,
+	 .excl = F_OP_ANY},
 	{.name = "save-mark", .id = O_SAVE_MARK, .type = XTTYPE_NONE,
 	 .excl = F_OP_ANY},
 	{.name = "restore-mark", .id = O_RESTORE_MARK, .type = XTTYPE_NONE,
@@ -98,6 +104,8 @@ static const struct xt_option_entry connmark_tg_opts[] = {
 	 .excl = F_OP_ANY},
 	{.name = "set-mark", .id = O_SET_MARK, .type = XTTYPE_MARKMASK32,
 	 .excl = F_OP_ANY},
+	{.name = "savedscp-mark", .id = O_SAVEDSCP_MARK, .type = XTTYPE_MARKMASK32,
+	 .excl = F_OP_ANY},
 	{.name = "and-mark", .id = O_AND_MARK, .type = XTTYPE_UINT32,
 	 .excl = F_OP_ANY},
 	{.name = "or-mark", .id = O_OR_MARK, .type = XTTYPE_UINT32,
@@ -124,6 +132,8 @@ static const struct xt_option_entry connmark_tg_opts_v2[] = {
 	 .excl = F_OP_ANY},
 	{.name = "set-mark", .id = O_SET_MARK, .type = XTTYPE_MARKMASK32,
 	 .excl = F_OP_ANY},
+	{.name = "savedscp-mark", .id = O_SAVEDSCP_MARK, .type = XTTYPE_MARKMASK32,
+	 .excl = F_OP_ANY},
 	{.name = "and-mark", .id = O_AND_MARK, .type = XTTYPE_UINT32,
 	 .excl = F_OP_ANY},
 	{.name = "or-mark", .id = O_OR_MARK, .type = XTTYPE_UINT32,
@@ -158,6 +168,7 @@ static void connmark_tg_help(void)
 "  --restore-mark [--ctmask mask] [--nfmask mask]\n"
 "                                Copy nfmark to ctmark using masks\n"
 "  --set-mark value[/mask]       Set conntrack mark value\n"
+"  --savedscp-mark value/mask    Save DSCP to conntrack mark value\n"
 "  --save-mark [--mask mask]     Save the packet nfmark in the connection\n"
 "  --restore-mark [--mask mask]  Restore saved nfmark value\n"
 "  --and-mark value              Binary AND the ctmark with bits\n"
@@ -210,6 +221,11 @@ static void CONNMARK_parse(struct xt_option_call *cb)
 		markinfo->mark = cb->val.mark;
 		markinfo->mask = cb->val.mask;
 		break;
+	case O_SAVEDSCP_MARK:
+		markinfo->mode = XT_CONNMARK_SAVEDSCP;
+		markinfo->mark = cb->val.mark;
+		markinfo->mask = cb->val.mask;
+		break;
 	case O_SAVE_MARK:
 		markinfo->mode = XT_CONNMARK_SAVE;
 		break;
@@ -238,6 +254,19 @@ static void connmark_tg_parse(struct xt_option_call *cb)
 		info->ctmark = cb->val.mark;
 		info->ctmask = cb->val.mark | cb->val.mask;
 		break;
+	case O_SAVEDSCP_MARK:
+		info->mode   = XT_CONNMARK_SAVEDSCP;
+		info->ctmark = cb->val.mark;
+		info->ctmask = cb->val.mask;
+		info->nfmask = info->ctmark ? ffs(info->ctmark) - 1 : 0;
+		/* need 6 contiguous bits */
+		if ((0x3f & (info->ctmark >> info->nfmask)) != 0x3f)
+			xtables_error(PARAMETER_PROBLEM,
+				"CONNMARK savedscp: insufficient contiguous dscpmask bits");
+		if (info->ctmark & info->ctmask)
+			xtables_error(PARAMETER_PROBLEM,
+				"CONNMARK savedscp: dscpmask/statemask bits overlap");
+		break;
 	case O_AND_MARK:
 		info->mode   = XT_CONNMARK_SET;
 		info->ctmark = 0;
@@ -283,6 +312,19 @@ static void connmark_tg_parse_v2(struct xt_option_call *cb)
 		info->ctmark = cb->val.mark;
 		info->ctmask = cb->val.mark | cb->val.mask;
 		break;
+	case O_SAVEDSCP_MARK:
+		info->mode   = XT_CONNMARK_SAVEDSCP;
+		info->ctmark = cb->val.mark;
+		info->ctmask = cb->val.mask;
+		info->shift_bits = info->ctmark ? ffs(info->ctmark) - 1 : 0;
+		/* need 6 contiguous bits */
+		if ((0x3f & (info->ctmark >> info->shift_bits)) != 0x3f)
+			xtables_error(PARAMETER_PROBLEM,
+				"CONNMARK savedscp: insufficient contiguous dscpmask bits");
+		if (info->ctmark & info->ctmask)
+			xtables_error(PARAMETER_PROBLEM,
+				"CONNMARK savedscp: dscpmask/statemask bits overlap");
+		break;
 	case O_AND_MARK:
 		info->mode   = XT_CONNMARK_SET;
 		info->ctmark = 0;
@@ -351,6 +393,11 @@ static void CONNMARK_print(const void *ip,
 	    print_mark(markinfo->mark);
 	    print_mask("/", markinfo->mask);
 	    break;
+	case XT_CONNMARK_SAVEDSCP:
+	    printf(" CONNMARK savedscp ");
+	    print_mark(markinfo->mark);
+	    print_mask("/", markinfo->mask);
+	    break;
 	case XT_CONNMARK_SAVE:
 	    printf(" CONNMARK save ");
 	    print_mask("mask ", markinfo->mask);
@@ -386,6 +433,10 @@ connmark_tg_print(const void *ip, const struct xt_entry_target *target,
 			printf(" CONNMARK xset 0x%x/0x%x",
 			       info->ctmark, info->ctmask);
 		break;
+	case XT_CONNMARK_SAVEDSCP:
+		printf(" CONNMARK DSCP set 0x%x/0x%x",
+		       info->ctmark, info->ctmask);
+		break;
 	case XT_CONNMARK_SAVE:
 		if (info->nfmask == UINT32_MAX && info->ctmask == UINT32_MAX)
 			printf(" CONNMARK save");
@@ -433,6 +484,10 @@ connmark_tg_print_v2(const void *ip, const struct xt_entry_target *target,
 			printf(" CONNMARK xset 0x%x/0x%x",
 			       info->ctmark, info->ctmask);
 		break;
+	case XT_CONNMARK_SAVEDSCP:
+		printf(" CONNMARK DSCP xset 0x%x/0x%x",
+		       info->ctmark, info->ctmask);
+		break;
 	case XT_CONNMARK_SAVE:
 		if (info->nfmask == UINT32_MAX && info->ctmask == UINT32_MAX)
 			printf(" CONNMARK save");
@@ -474,6 +529,11 @@ static void CONNMARK_save(const void *ip, const struct xt_entry_target *target)
 	    print_mark(markinfo->mark);
 	    print_mask("/", markinfo->mask);
 	    break;
+	case XT_CONNMARK_SAVEDSCP:
+	    printf(" --savedscp-mark ");
+	    print_mark(markinfo->mark);
+	    print_mask("/", markinfo->mask);
+	    break;
 	case XT_CONNMARK_SAVE:
 	    printf(" --save-mark ");
 	    print_mask("--mask ", markinfo->mask);
@@ -505,6 +565,9 @@ connmark_tg_save(const void *ip, const struct xt_entry_target *target)
 	case XT_CONNMARK_SET:
 		printf(" --set-xmark 0x%x/0x%x", info->ctmark, info->ctmask);
 		break;
+	case XT_CONNMARK_SAVEDSCP:
+		printf(" --savedscp-mark 0x%x/0x%x", info->ctmark, info->ctmask);
+		break;
 	case XT_CONNMARK_SAVE:
 		printf(" --save-mark --nfmask 0x%x --ctmask 0x%x",
 		       info->nfmask, info->ctmask);
@@ -529,6 +592,9 @@ connmark_tg_save_v2(const void *ip, const struct xt_entry_target *target)
 	case XT_CONNMARK_SET:
 		printf(" --set-xmark 0x%x/0x%x", info->ctmark, info->ctmask);
 		break;
+	case XT_CONNMARK_SAVEDSCP:
+		printf(" --savedscp-mark 0x%x/0x%x", info->ctmark, info->ctmask);
+		break;
 	case XT_CONNMARK_SAVE:
 		printf(" --save-mark --nfmask 0x%x --ctmask 0x%x",
 		       info->nfmask, info->ctmask);
diff --git a/include/linux/netfilter/xt_connmark.h b/include/linux/netfilter/xt_connmark.h
index bbf2acc9..cf526101 100644
--- a/include/linux/netfilter/xt_connmark.h
+++ b/include/linux/netfilter/xt_connmark.h
@@ -15,7 +15,8 @@
 enum {
 	XT_CONNMARK_SET = 0,
 	XT_CONNMARK_SAVE,
-	XT_CONNMARK_RESTORE
+	XT_CONNMARK_RESTORE,
+	XT_CONNMARK_SAVEDSCP
 };
 
 struct xt_connmark_tginfo1 {
-- 
2.20.1 (Apple Git-117)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp
  2019-04-09 14:23         ` [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-30 12:29           ` Pablo Neira Ayuso
  2019-04-30 20:40             ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2019-04-30 12:29 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: netfilter-devel

On Tue, Apr 09, 2019 at 02:23:46PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
> diff --git a/include/uapi/linux/netfilter/xt_connmark.h b/include/uapi/linux/netfilter/xt_connmark.h
> index 1aa5c955ee1e..24272cac2d37 100644
> --- a/include/uapi/linux/netfilter/xt_connmark.h
> +++ b/include/uapi/linux/netfilter/xt_connmark.h
> @@ -16,7 +16,8 @@
>  enum {
>  	XT_CONNMARK_SET = 0,
>  	XT_CONNMARK_SAVE,
> -	XT_CONNMARK_RESTORE
> +	XT_CONNMARK_RESTORE,
> +	XT_CONNMARK_SAVEDSCP

I'd prefer you implement this in nftables, more comments below.

>  };
>  
>  enum {
> diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
> index 29c38aa7f726..6c63cf476342 100644
> --- a/net/netfilter/xt_connmark.c
> +++ b/net/netfilter/xt_connmark.c
> @@ -42,6 +42,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
>  	u_int32_t new_targetmark;
>  	struct nf_conn *ct;
>  	u_int32_t newmark;
> +	u8 dscp;
>  
>  	ct = nf_ct_get(skb, &ctinfo);
>  	if (ct == NULL)
> @@ -74,6 +75,34 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
>  			nf_conntrack_event_cache(IPCT_MARK, ct);
>  		}
>  		break;
> +	case XT_CONNMARK_SAVEDSCP:

Could you add a new revision and a new flag field for this? so it just
adds the dscp to the mark as you need.

> +		if (!info->ctmark)
> +			goto out;
> +
> +		if (skb->protocol == htons(ETH_P_IP)) {
> +			if (skb->len < sizeof(struct iphdr))
> +				goto out;
> +
> +			dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> +
> +		} else if (skb->protocol == htons(ETH_P_IPV6)) {
> +			if (skb->len < sizeof(struct ipv6hdr))
> +				goto out;

This is already guaranteed to have a valid IP header in place, no need
for this check.

> +
> +			dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> +
> +		} else { /* protocol doesn't have diffserv - get out! */
> +			goto out;
> +		}
> +
> +		newmark = (ct->mark & ~info->ctmark) ^
> +			  (info->ctmask | (dscp << info->shift_bits));
> +
> +		if (ct->mark != newmark) {
> +			ct->mark = newmark;
> +			nf_conntrack_event_cache(IPCT_MARK, ct);
> +		}
> +		break;
>  	case XT_CONNMARK_RESTORE:
>  		new_targetmark = (ct->mark & info->ctmask);
>  		if (info->shift_dir == D_SHIFT_RIGHT)
> @@ -86,6 +115,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
>  		skb->mark = newmark;
>  		break;
>  	}
> +out:
>  	return XT_CONTINUE;
>  }
>  
> -- 
> 2.20.1 (Apple Git-117)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp
  2019-04-30 12:29           ` Pablo Neira Ayuso
@ 2019-04-30 20:40             ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 0 replies; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-30 20:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Hi Pablo,

Thanks for review.  Some answers inline

> On 30 Apr 2019, at 13:29, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> On Tue, Apr 09, 2019 at 02:23:46PM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
>> diff --git a/include/uapi/linux/netfilter/xt_connmark.h b/include/uapi/linux/netfilter/xt_connmark.h
>> index 1aa5c955ee1e..24272cac2d37 100644
>> --- a/include/uapi/linux/netfilter/xt_connmark.h
>> +++ b/include/uapi/linux/netfilter/xt_connmark.h
>> @@ -16,7 +16,8 @@
>> enum {
>> 	XT_CONNMARK_SET = 0,
>> 	XT_CONNMARK_SAVE,
>> -	XT_CONNMARK_RESTORE
>> +	XT_CONNMARK_RESTORE,
>> +	XT_CONNMARK_SAVEDSCP
> 
> I'd prefer you implement this in nftables, more comments below.

I will look into this. nftables is new to me.

> 
>> };
>> 
>> enum {
>> diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
>> index 29c38aa7f726..6c63cf476342 100644
>> --- a/net/netfilter/xt_connmark.c
>> +++ b/net/netfilter/xt_connmark.c
>> @@ -42,6 +42,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
>> 	u_int32_t new_targetmark;
>> 	struct nf_conn *ct;
>> 	u_int32_t newmark;
>> +	u8 dscp;
>> 
>> 	ct = nf_ct_get(skb, &ctinfo);
>> 	if (ct == NULL)
>> @@ -74,6 +75,34 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
>> 			nf_conntrack_event_cache(IPCT_MARK, ct);
>> 		}
>> 		break;
>> +	case XT_CONNMARK_SAVEDSCP:
> 
> Could you add a new revision and a new flag field for this? so it just
> adds the dscp to the mark as you need.

Not sure I understand this.  Do you mean make it part of XT_CONNMARK_SAVE
and have something like ‘info->dscpmask’?  If (!info->dscpmask) do dscp
stuff else do the original routine?

> 
>> +		if (!info->ctmark)
>> +			goto out;
>> +
>> +		if (skb->protocol == htons(ETH_P_IP)) {
>> +			if (skb->len < sizeof(struct iphdr))
>> +				goto out;
>> +
>> +			dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
>> +
>> +		} else if (skb->protocol == htons(ETH_P_IPV6)) {
>> +			if (skb->len < sizeof(struct ipv6hdr))
>> +				goto out;
> 
> This is already guaranteed to have a valid IP header in place, no need
> for this check.

Ok, thanks - removing code is easy (and faster) :-)

Once again thanks for your review & time.

> 
>> +
>> +			dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
>> +
>> +		} else { /* protocol doesn't have diffserv - get out! */
>> +			goto out;
>> +		}
>> +
>> +		newmark = (ct->mark & ~info->ctmark) ^
>> +			  (info->ctmask | (dscp << info->shift_bits));
>> +
>> +		if (ct->mark != newmark) {
>> +			ct->mark = newmark;
>> +			nf_conntrack_event_cache(IPCT_MARK, ct);
>> +		}
>> +		break;
>> 	case XT_CONNMARK_RESTORE:
>> 		new_targetmark = (ct->mark & info->ctmask);
>> 		if (info->shift_dir == D_SHIFT_RIGHT)
>> @@ -86,6 +115,7 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
>> 		skb->mark = newmark;
>> 		break;
>> 	}
>> +out:
>> 	return XT_CONTINUE;
>> }
>> 
>> -- 
>> 2.20.1 (Apple Git-117)
>> 


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/1] netfilter: connmark: introduce set-dscpmark
@ 2019-12-03 16:06 ` Kevin Darbyshire-Bryant
  2019-12-03 16:06   ` [PATCH 1/1] " Kevin Darbyshire-Bryant
                     ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-12-03 16:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Kevin Darbyshire-Bryant

Greetings.  The following patch is similar to one I submitted as an RFC
quite a while back (April).  Since then I've realised that the option
should have been in the 'set mark' family as opposed to 'save mark'
because 'set' is about setting the ct mark directly, whereas 'save' is
about copying a packet's mark to the ct mark.

Similarly I've been made aware of the revision infrastructure and now
that I understand that a little more have made use of it for this
change.  Hopefully this addresses one of Pablo's concerns.

I've not been able to address the 'I'd like an nftables version'.  Quite
simply it is beyond my knowledge and ability.  I am willing to
contribute financially if someone wishes to step up to the nftables
plate...yes I'd like to see the functionality implemented *that* much.

Kevin Darbyshire-Bryant (1):
  netfilter: connmark: introduce set-dscpmark

 include/uapi/linux/netfilter/xt_connmark.h | 10 ++++
 net/netfilter/xt_connmark.c                | 57 ++++++++++++++++++----
 2 files changed, 58 insertions(+), 9 deletions(-)

-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/1] netfilter: connmark: introduce set-dscpmark
  2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
@ 2019-12-03 16:06   ` Kevin Darbyshire-Bryant
  2019-12-09 23:57     ` Kevin 'ldir' Darbyshire-Bryant
  2019-12-05  8:56   ` [PATCH 0/1] " Jeremy Sowden
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-12-03 16:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Kevin Darbyshire-Bryant

set-dscpmark is a method of storing the DSCP of an ip packet into
conntrack mark.  In combination with a suitable tc filter action
(act_ctinfo) DSCP values are able to be stored in the mark on egress and
restored on ingress across links that otherwise alter or bleach DSCP.

This is useful for qdiscs such as CAKE which are able to shape according
to policies based on DSCP.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

x_tables CONNMARK set-dscpmark target solves the problem of storing the
DSCP to the conntrack mark in a way suitable for the new act_ctinfo tc
action to restore.

The set-dscpmark option accepts 2 parameters, a 32bit 'dscpmask' and a
32bit 'statemask'.  The dscp mask must be 6 contiguous bits and
represents the area where the DSCP will be stored in the connmark.  The
state mask is a minimum 1 bit length mask that must not overlap with the
dscpmask.  It represents a flag which is set when the DSCP has been
stored in the conntrack mark. This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A state
mask of zero disables the setting of a status bit/s.

example syntax with a suitably modified iptables user space application:

iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --set-dscpmark 0xfc000000/0x01000000

Would store the DSCP in the top 6 bits of the 32bit mark field, and use
the LSB of the top byte as the 'DSCP has been stored' marker.

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      ^                   ^
      |                   |
      ---|             Conditional flag
         |             set this when dscp
|-ip diffserv-|        stored in mark
| 6 bits      |
|-------------|

an identically configured tc action to restore looks like:

tc filter show dev eth0 ingress
filter parent ffff: protocol all pref 10 u32 chain 0
filter parent ffff: protocol all pref 10 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 10 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1: not_in_hw
  match 00000000/00000000 at 0
	action order 1: ctinfo zone 0 pipe
	 index 2 ref 1 bind 1 dscp 0xfc000000/0x1000000

	action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
	index 1 ref 1 bind 1

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/uapi/linux/netfilter/xt_connmark.h | 10 ++++
 net/netfilter/xt_connmark.c                | 57 ++++++++++++++++++----
 2 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_connmark.h b/include/uapi/linux/netfilter/xt_connmark.h
index 1aa5c955ee1e..a1585e7ecbe1 100644
--- a/include/uapi/linux/netfilter/xt_connmark.h
+++ b/include/uapi/linux/netfilter/xt_connmark.h
@@ -19,6 +19,11 @@ enum {
 	XT_CONNMARK_RESTORE
 };
 
+enum {
+	XT_CONNMARK_VALUE = BIT(0),
+	XT_CONNMARK_DSCP = BIT(1)
+};
+
 enum {
 	D_SHIFT_LEFT = 0,
 	D_SHIFT_RIGHT,
@@ -34,6 +39,11 @@ struct xt_connmark_tginfo2 {
 	__u8 shift_dir, shift_bits, mode;
 };
 
+struct xt_connmark_tginfo3 {
+	__u32 ctmark, ctmask, nfmask;
+	__u8 shift_dir, shift_bits, mode, func;
+};
+
 struct xt_connmark_mtinfo1 {
 	__u32 mark, mask;
 	__u8 invert;
diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
index eec2f3a88d73..188fd2495121 100644
--- a/net/netfilter/xt_connmark.c
+++ b/net/netfilter/xt_connmark.c
@@ -24,12 +24,13 @@ MODULE_ALIAS("ipt_connmark");
 MODULE_ALIAS("ip6t_connmark");
 
 static unsigned int
-connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
+connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo3 *info)
 {
 	enum ip_conntrack_info ctinfo;
 	u_int32_t new_targetmark;
 	struct nf_conn *ct;
 	u_int32_t newmark;
+	u_int8_t dscp;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if (ct == NULL)
@@ -37,12 +38,24 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 
 	switch (info->mode) {
 	case XT_CONNMARK_SET:
-		newmark = (ct->mark & ~info->ctmask) ^ info->ctmark;
-		if (info->shift_dir == D_SHIFT_RIGHT)
-			newmark >>= info->shift_bits;
-		else
-			newmark <<= info->shift_bits;
-
+		newmark = ct->mark;
+		if (info->func & XT_CONNMARK_VALUE) {
+			newmark = (newmark & ~info->ctmask) ^ info->ctmark;
+			if (info->shift_dir == D_SHIFT_RIGHT)
+				newmark >>= info->shift_bits;
+			else
+				newmark <<= info->shift_bits;
+		} else if (info->func & XT_CONNMARK_DSCP) {
+			if (skb->protocol == htons(ETH_P_IP))
+				dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+			else if (skb->protocol == htons(ETH_P_IPV6))
+				dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+			else	/* protocol doesn't have diffserv */
+				break;
+
+			newmark = (newmark & ~info->ctmark) |
+				  (info->ctmask | (dscp << info->shift_bits));
+		}
 		if (ct->mark != newmark) {
 			ct->mark = newmark;
 			nf_conntrack_event_cache(IPCT_MARK, ct);
@@ -81,20 +94,36 @@ static unsigned int
 connmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_connmark_tginfo1 *info = par->targinfo;
-	const struct xt_connmark_tginfo2 info2 = {
+	const struct xt_connmark_tginfo3 info3 = {
 		.ctmark	= info->ctmark,
 		.ctmask	= info->ctmask,
 		.nfmask	= info->nfmask,
 		.mode	= info->mode,
+		.func	= XT_CONNMARK_VALUE
 	};
 
-	return connmark_tg_shift(skb, &info2);
+	return connmark_tg_shift(skb, &info3);
 }
 
 static unsigned int
 connmark_tg_v2(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_connmark_tginfo2 *info = par->targinfo;
+	const struct xt_connmark_tginfo3 info3 = {
+		.ctmark	= info->ctmark,
+		.ctmask	= info->ctmask,
+		.nfmask	= info->nfmask,
+		.mode	= info->mode,
+		.func	= XT_CONNMARK_VALUE
+	};
+
+	return connmark_tg_shift(skb, &info3);
+}
+
+static unsigned int
+connmark_tg_v3(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_connmark_tginfo3 *info = par->targinfo;
 
 	return connmark_tg_shift(skb, info);
 }
@@ -165,6 +194,16 @@ static struct xt_target connmark_tg_reg[] __read_mostly = {
 		.targetsize     = sizeof(struct xt_connmark_tginfo2),
 		.destroy        = connmark_tg_destroy,
 		.me             = THIS_MODULE,
+	},
+	{
+		.name           = "CONNMARK",
+		.revision       = 3,
+		.family         = NFPROTO_UNSPEC,
+		.checkentry     = connmark_tg_check,
+		.target         = connmark_tg_v3,
+		.targetsize     = sizeof(struct xt_connmark_tginfo3),
+		.destroy        = connmark_tg_destroy,
+		.me             = THIS_MODULE,
 	}
 };
 
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/1] netfilter: connmark: introduce set-dscpmark
  2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
  2019-12-03 16:06   ` [PATCH 1/1] " Kevin Darbyshire-Bryant
@ 2019-12-05  8:56   ` Jeremy Sowden
  2019-12-05  9:46     ` Kevin 'ldir' Darbyshire-Bryant
  2019-12-05 10:49     ` Florian Westphal
  2019-12-09 21:42   ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Jeremy Sowden
  2019-12-11 13:01   ` [PATCH nf-next v2] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
  3 siblings, 2 replies; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-05  8:56 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant, Pablo Neira Ayuso; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]

On 2019-12-03, at 16:06:52 +0000, Kevin Darbyshire-Bryant wrote:
> Greetings.  The following patch is similar to one I submitted as an
> RFC quite a while back (April).  Since then I've realised that the
> option should have been in the 'set mark' family as opposed to 'save
> mark' because 'set' is about setting the ct mark directly, whereas
> 'save' is about copying a packet's mark to the ct mark.
>
> Similarly I've been made aware of the revision infrastructure and now
> that I understand that a little more have made use of it for this
> change.  Hopefully this addresses one of Pablo's concerns.
>
> I've not been able to address the 'I'd like an nftables version'.
> Quite simply it is beyond my knowledge and ability.  I am willing to
> contribute financially if someone wishes to step up to the nftables
> plate...yes I'd like to see the functionality implemented *that* much.

I'll do it (no financial contribution required :)). There is one thing I
want to find out before I get started.

Pablo, comparing the x_tables and nftables connmark implementations I
see that nftables doesn't support all the bit-twiddling that x_tables
does.  Why is this?  Was it not wanted or has it just not been imple-
mented?

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/1] netfilter: connmark: introduce set-dscpmark
  2019-12-05  8:56   ` [PATCH 0/1] " Jeremy Sowden
@ 2019-12-05  9:46     ` Kevin 'ldir' Darbyshire-Bryant
  2019-12-06  8:54       ` Jeremy Sowden
  2019-12-05 10:49     ` Florian Westphal
  1 sibling, 1 reply; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-12-05  9:46 UTC (permalink / raw)
  To: Jeremy Sowden; +Cc: Pablo Neira Ayuso, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]



> On 5 Dec 2019, at 08:56, Jeremy Sowden <jeremy@azazel.net> wrote:
> 
> On 2019-12-03, at 16:06:52 +0000, Kevin Darbyshire-Bryant wrote:
>> Greetings.  The following patch is similar to one I submitted as an
>> RFC quite a while back (April).  Since then I've realised that the
>> option should have been in the 'set mark' family as opposed to 'save
>> mark' because 'set' is about setting the ct mark directly, whereas
>> 'save' is about copying a packet's mark to the ct mark.
>> 
>> Similarly I've been made aware of the revision infrastructure and now
>> that I understand that a little more have made use of it for this
>> change.  Hopefully this addresses one of Pablo's concerns.
>> 
>> I've not been able to address the 'I'd like an nftables version'.
>> Quite simply it is beyond my knowledge and ability.  I am willing to
>> contribute financially if someone wishes to step up to the nftables
>> plate...yes I'd like to see the functionality implemented *that* much.
> 
> I'll do it (no financial contribution required :)). There is one thing I
> want to find out before I get started.

Hi Jeremy,

You’ll permit me to make a donation in appreciation of your efforts though?

I’m not totally convinced that what I’ve submitted for x_tables is the
‘perfect’ way of implementing the function so it’s a plea for guidance as
much as anything :-)

> Pablo, comparing the x_tables and nftables connmark implementations I
> see that nftables doesn't support all the bit-twiddling that x_tables
> does.  Why is this?  Was it not wanted or has it just not been imple-
> mented?
> 
> J.

Thanks,

Kevin

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/1] netfilter: connmark: introduce set-dscpmark
  2019-12-05  8:56   ` [PATCH 0/1] " Jeremy Sowden
  2019-12-05  9:46     ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-12-05 10:49     ` Florian Westphal
  2019-12-05 22:00       ` Jeremy Sowden
  1 sibling, 1 reply; 26+ messages in thread
From: Florian Westphal @ 2019-12-05 10:49 UTC (permalink / raw)
  To: Jeremy Sowden; +Cc: Kevin Darbyshire-Bryant, Pablo Neira Ayuso, netfilter-devel

Jeremy Sowden <jeremy@azazel.net> wrote:
> Pablo, comparing the x_tables and nftables connmark implementations I
> see that nftables doesn't support all the bit-twiddling that x_tables
> does.  Why is this?  Was it not wanted or has it just not been imple-
> mented?

The latter.  It would be needed to extend nft_bitwise.c to accept
a second register value and extend nft userspace to accept non-immediate
values as second operand.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/1] netfilter: connmark: introduce set-dscpmark
  2019-12-05 10:49     ` Florian Westphal
@ 2019-12-05 22:00       ` Jeremy Sowden
  0 siblings, 0 replies; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-05 22:00 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kevin Darbyshire-Bryant, Pablo Neira Ayuso, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 541 bytes --]

On 2019-12-05, at 11:49:59 +0100, Florian Westphal wrote:
> Jeremy Sowden <jeremy@azazel.net> wrote:
> > Pablo, comparing the x_tables and nftables connmark implementations
> > I see that nftables doesn't support all the bit-twiddling that
> > x_tables does.  Why is this?  Was it not wanted or has it just not
> > been imple- mented?
>
> The latter.  It would be needed to extend nft_bitwise.c to accept a
> second register value and extend nft userspace to accept non-immediate
> values as second operand.

Thanks for the explanation.

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/1] netfilter: connmark: introduce set-dscpmark
  2019-12-05  9:46     ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-12-06  8:54       ` Jeremy Sowden
  0 siblings, 0 replies; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-06  8:54 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant; +Cc: Netfilter Devel

[-- Attachment #1: Type: text/plain, Size: 1233 bytes --]

On 2019-12-05, at 09:46:38 +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
> On 5 Dec 2019, at 08:56, Jeremy Sowden <jeremy@azazel.net> wrote:
> > On 2019-12-03, at 16:06:52 +0000, Kevin Darbyshire-Bryant wrote:
> > > Greetings.  The following patch is similar to one I submitted as
> > > an RFC quite a while back (April).  Since then I've realised that
> > > the option should have been in the 'set mark' family as opposed to
> > > 'save mark' because 'set' is about setting the ct mark directly,
> > > whereas 'save' is about copying a packet's mark to the ct mark.
> > >
> > > Similarly I've been made aware of the revision infrastructure and
> > > now that I understand that a little more have made use of it for
> > > this change.  Hopefully this addresses one of Pablo's concerns.
> > >
> > > I've not been able to address the 'I'd like an nftables version'.
> > > Quite simply it is beyond my knowledge and ability.
> >
> > I'll do it [...].
>
> [...]
>
> I'm not totally convinced that what I've submitted for x_tables is the
> 'perfect' way of implementing the function so it's a plea for guidance
> as much as anything :-)

Understood. :) I'll port it to nft as a starting-point and then we can
see what feedback we get.

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
  2019-12-03 16:06   ` [PATCH 1/1] " Kevin Darbyshire-Bryant
  2019-12-05  8:56   ` [PATCH 0/1] " Jeremy Sowden
@ 2019-12-09 21:42   ` Jeremy Sowden
  2019-12-09 21:42     ` [RFC PATCH nftables] Add "ct dscpmark" conntrack statement Jeremy Sowden
  2019-12-09 22:47     ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Florian Westphal
  2019-12-11 13:01   ` [PATCH nf-next v2] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
  3 siblings, 2 replies; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-09 21:42 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Netfilter Devel, Kevin Darbyshire-Bryant

"ct dscpmark" is a method of storing the DSCP of an ip packet into the
conntrack mark.  In combination with a suitable tc filter action
(act_ctinfo) DSCP values are able to be stored in the mark on egress and
restored on ingress across links that otherwise alter or bleach DSCP.

This is useful for qdiscs such as CAKE which are able to shape according
to policies based on DSCP.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

The "ct dscpmark" conntrack statement solves the problem of storing the
DSCP to the conntrack mark in a way suitable for the new act_ctinfo tc
action to restore.

The "ct dscpmark" statement accepts 2 parameters, a 32bit 'dscpmask' and a
32bit 'statemask'.  The dscp mask must be 6 contiguous bits and
represents the area where the DSCP will be stored in the connmark.  The
state mask is a minimum 1 bit length mask that must not overlap with the
dscpmask.  It represents a flag which is set when the DSCP has been
stored in the conntrack mark. This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A state
mask of zero disables the setting of a status bit/s.

For example,

  table ip t {
    chain c {
      type filter hook postrouting priority mangle; policy accept;
      oif eth0 ct dscpmark set 0xfc000000/0x01000000
    }
  }

would store the DSCP in the top 6 bits of the 32bit mark field, and use
the LSB of the top byte as the 'DSCP has been stored' marker.

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      ^                   ^
      |                   |
      ---|             Conditional flag
         |             set this when dscp
|-ip diffserv-|        stored in mark
| 6 bits      |
|-------------|

an identically configured tc action to restore looks like:

  tc filter show dev eth0 ingress
  filter parent ffff: protocol all pref 10 u32 chain 0
  filter parent ffff: protocol all pref 10 u32 chain 0 fh 800: ht divisor 1
  filter parent ffff: protocol all pref 10 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1: not_in_hw
    match 00000000/00000000 at 0
          action order 1: ctinfo zone 0 pipe
           index 2 ref 1 bind 1 dscp 0xfc000000/0x1000000

          action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
          index 1 ref 1 bind 1

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

Suggested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
---
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/Kconfig                    |  7 ++++++
 net/netfilter/nft_ct.c                   | 32 ++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index bb9b049310df..bc69fb904783 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -984,6 +984,7 @@ enum nft_socket_keys {
  * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
  * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
  * @NFT_CT_ID: conntrack id
+ * @NFT_CT_DSCPMARK: conntrack DSCP mark
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -1010,6 +1011,7 @@ enum nft_ct_keys {
 	NFT_CT_SRC_IP6,
 	NFT_CT_DST_IP6,
 	NFT_CT_ID,
+	NFT_CT_DSCPMARK,
 	__NFT_CT_MAX
 };
 #define NFT_CT_MAX		(__NFT_CT_MAX - 1)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 91efae88e8c2..d17c87aab667 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -91,6 +91,13 @@ config NF_CONNTRACK_MARK
 	  of packets, but this mark value is kept in the conntrack session
 	  instead of the individual packets.
 
+config NF_CONNTRACK_DSCPMARK
+	bool  'DSCP connection mark tracking support'
+	depends on NF_CONNTRACK_MARK
+	help
+	  This option enables support for copying the DiffServ code-point
+	  from a packet's IP header and storing it as the connection mark.
+
 config NF_CONNTRACK_SECMARK
 	bool  'Connection tracking security mark support'
 	depends on NETWORK_SECMARK
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 46ca8bcca1bd..bdd32307462c 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -301,6 +301,31 @@ static void nft_ct_set_eval(const struct nft_expr *expr,
 		}
 		break;
 #endif
+#ifdef CONFIG_NF_CONNTRACK_DSCPMARK
+	case NFT_CT_DSCPMARK: {
+		u64 ct_dscpmark;
+		u32 dscpshift, statemask, dscp;
+
+		ct_dscpmark = nft_reg_load64(&regs->data[priv->sreg]);
+
+		dscpshift = ct_dscpmark >> 32;
+		statemask = ct_dscpmark & 0xffffffff;
+
+		if (skb->protocol == htons(ETH_P_IP))
+			dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+		else if (skb->protocol == htons(ETH_P_IPV6))
+			dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+		else    /* protocol doesn't have diffserv */
+			break;
+
+		value = (dscp << dscpshift) | statemask;
+		if (ct->mark != value) {
+			ct->mark = value;
+			nf_conntrack_event_cache(IPCT_MARK, ct);
+		}
+		break;
+	}
+#endif
 #ifdef CONFIG_NF_CONNTRACK_SECMARK
 	case NFT_CT_SECMARK:
 		if (ct->secmark != value) {
@@ -554,6 +579,13 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
 		len = FIELD_SIZEOF(struct nf_conn, mark);
 		break;
 #endif
+#ifdef CONFIG_NF_CONNTRACK_DSCPMARK
+	case NFT_CT_DSCPMARK:
+		if (tb[NFTA_CT_DIRECTION])
+			return -EINVAL;
+		len = FIELD_SIZEOF(struct nf_conn, mark) * 2;
+		break;
+#endif
 #ifdef CONFIG_NF_CONNTRACK_LABELS
 	case NFT_CT_LABELS:
 		if (tb[NFTA_CT_DIRECTION])
-- 
2.24.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH nftables] Add "ct dscpmark" conntrack statement.
  2019-12-09 21:42   ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Jeremy Sowden
@ 2019-12-09 21:42     ` Jeremy Sowden
  2019-12-09 22:47     ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Florian Westphal
  1 sibling, 0 replies; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-09 21:42 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Netfilter Devel, Kevin Darbyshire-Bryant

"ct dscpmark" is a method of storing the DSCP of an ip packet into the
conntrack mark.  In combination with a suitable tc filter action
(act_ctinfo) DSCP values are able to be stored in the mark on egress and
restored on ingress across links that otherwise alter or bleach DSCP.

This is useful for qdiscs such as CAKE which are able to shape according
to policies based on DSCP.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

The "ct dscpmark" conntrack statement solves the problem of storing the
DSCP to the conntrack mark in a way suitable for the new act_ctinfo tc
action to restore.

The "ct dscpmark" statement accepts 2 parameters, a 32bit 'dscpmask' and a
32bit 'statemask'.  The dscp mask must be 6 contiguous bits and
represents the area where the DSCP will be stored in the connmark.  The
state mask is a minimum 1 bit length mask that must not overlap with the
dscpmask.  It represents a flag which is set when the DSCP has been
stored in the conntrack mark. This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A state
mask of zero disables the setting of a status bit/s.

For example,

  table ip t {
    chain c {
      type filter hook postrouting priority mangle; policy accept;
      oif eth0 ct dscpmark set 0xfc000000/0x01000000
    }
  }

would store the DSCP in the top 6 bits of the 32bit mark field, and use
the LSB of the top byte as the 'DSCP has been stored' marker.

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      ^                   ^
      |                   |
      ---|             Conditional flag
         |             set this when dscp
|-ip diffserv-|        stored in mark
| 6 bits      |
|-------------|

an identically configured tc action to restore looks like:

  tc filter show dev eth0 ingress
  filter parent ffff: protocol all pref 10 u32 chain 0
  filter parent ffff: protocol all pref 10 u32 chain 0 fh 800: ht divisor 1
  filter parent ffff: protocol all pref 10 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1: not_in_hw
    match 00000000/00000000 at 0
          action order 1: ctinfo zone 0 pipe
           index 2 ref 1 bind 1 dscp 0xfc000000/0x1000000

          action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
          index 1 ref 1 bind 1

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

Suggested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
---
 doc/statements.txt                            | 12 ++++-
 include/ct.h                                  |  1 +
 include/datatype.h                            |  2 +
 include/json.h                                |  2 +
 include/linux/netfilter/nf_tables.h           |  2 +
 src/ct.c                                      | 25 +++++++++
 src/datatype.c                                |  1 +
 src/json.c                                    |  9 ++++
 src/parser_bison.y                            | 53 ++++++++++++++++++-
 src/scanner.l                                 |  1 +
 .../shell/testcases/chains/0040ct_dscpmark_0  | 21 ++++++++
 .../shell/testcases/chains/0040ct_dscpmark_1  | 14 +++++
 .../shell/testcases/chains/0040ct_dscpmark_2  | 14 +++++
 .../shell/testcases/chains/0040ct_dscpmark_3  | 14 +++++
 14 files changed, 167 insertions(+), 4 deletions(-)
 create mode 100755 tests/shell/testcases/chains/0040ct_dscpmark_0
 create mode 100755 tests/shell/testcases/chains/0040ct_dscpmark_1
 create mode 100755 tests/shell/testcases/chains/0040ct_dscpmark_2
 create mode 100755 tests/shell/testcases/chains/0040ct_dscpmark_3

diff --git a/doc/statements.txt b/doc/statements.txt
index ced311cb8d17..c3e8a208ec65 100644
--- a/doc/statements.txt
+++ b/doc/statements.txt
@@ -211,9 +211,9 @@ CONNTRACK STATEMENT
 The conntrack statement can be used to set the conntrack mark and conntrack labels.
 
 [verse]
-*ct* {*mark* | *event* | *label* | *zone*} *set* 'value'
+*ct* {*mark* | *dscpmark* | *event* | *label* | *zone*} *set* 'value'
 
-The ct statement sets meta data associated with a connection. The zone id
+The ct statement sets metadata associated with a connection. The zone id
 has to be assigned before a conntrack lookup takes place, i.e. this has to be
 done in prerouting and possibly output (if locally generated packets need to be
 placed in a distinct zone), with a hook priority of -300.
@@ -231,6 +231,9 @@ quoted string
 |mark|
 Connection tracking mark |
 mark
+|dscpmark|
+A pair of bitmasks indicating that the DiffServ code-point should be used as the ct mark |
+dscp-mask/state-mask
 |label|
 Connection tracking label|
 label
@@ -263,6 +266,11 @@ table inet raw {
 ct event set new,related,destroy
 --------------------------------------
 
+.use the DSCP shifted to the top six bits ORed with 0x101 as the ct mark:
+-------------------------------------------------------------------------
+ct dscpmark 0xfc000000/0x00000101
+-------------------------------------------------------------------------
+
 META STATEMENT
 ~~~~~~~~~~~~~~
 A meta statement sets the value of a meta expression. The existing meta fields
diff --git a/include/ct.h b/include/ct.h
index efb2d4185543..0f1562819cd7 100644
--- a/include/ct.h
+++ b/include/ct.h
@@ -39,5 +39,6 @@ extern const char *ct_label2str(const struct symbol_table *tbl,
 extern const struct datatype ct_dir_type;
 extern const struct datatype ct_state_type;
 extern const struct datatype ct_status_type;
+extern const struct datatype ct_dscpmark_type;
 
 #endif /* NFTABLES_CT_H */
diff --git a/include/datatype.h b/include/datatype.h
index 49b8f608aa1d..38cab83f6560 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -48,6 +48,7 @@
  * @TYPE_TIME_DATA	Date type (integer subtype)
  * @TYPE_TIME_HOUR	Hour type (integer subtype)
  * @TYPE_TIME_DAY	Day type (integer subtype)
+ * @TYPE_CT_DSCPMARK	conntrack DSCP mark type (integer subtype)
  */
 enum datatypes {
 	TYPE_INVALID,
@@ -96,6 +97,7 @@ enum datatypes {
 	TYPE_TIME_DATE,
 	TYPE_TIME_HOUR,
 	TYPE_TIME_DAY,
+	TYPE_CT_DSCPMARK,
 	__TYPE_MAX
 };
 #define TYPE_MAX		(__TYPE_MAX - 1)
diff --git a/include/json.h b/include/json.h
index 20d6c2a4a8e7..4d20940c01cf 100644
--- a/include/json.h
+++ b/include/json.h
@@ -65,6 +65,7 @@ json_t *ct_label_type_json(const struct expr *expr, struct output_ctx *octx);
 json_t *time_type_json(const struct expr *expr, struct output_ctx *octx);
 json_t *uid_type_json(const struct expr *expr, struct output_ctx *octx);
 json_t *gid_type_json(const struct expr *expr, struct output_ctx *octx);
+json_t *ct_dscpmark_type_json(const struct expr *expr, struct output_ctx *octx);
 
 json_t *expr_stmt_json(const struct stmt *stmt, struct output_ctx *octx);
 json_t *payload_stmt_json(const struct stmt *stmt, struct output_ctx *octx);
@@ -162,6 +163,7 @@ EXPR_PRINT_STUB(ct_label_type)
 EXPR_PRINT_STUB(time_type)
 EXPR_PRINT_STUB(uid_type)
 EXPR_PRINT_STUB(gid_type)
+EXPR_PRINT_STUB(ct_dscpmark_type)
 
 STMT_PRINT_STUB(expr)
 STMT_PRINT_STUB(payload)
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index ed8881ad18ed..8afd22516a67 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -982,6 +982,7 @@ enum nft_socket_keys {
  * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
  * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
  * @NFT_CT_ID: conntrack id
+ * @NFT_CT_DSCPMARK: conntrack DSCP mark
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -1008,6 +1009,7 @@ enum nft_ct_keys {
 	NFT_CT_SRC_IP6,
 	NFT_CT_DST_IP6,
 	NFT_CT_ID,
+	NFT_CT_DSCPMARK,
 	__NFT_CT_MAX
 };
 #define NFT_CT_MAX		(__NFT_CT_MAX - 1)
diff --git a/src/ct.c b/src/ct.c
index 9e6a8351ffb2..efee7bd66589 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -141,6 +141,28 @@ static const struct datatype ct_event_type = {
 	.sym_tbl	= &ct_events_tbl,
 };
 
+static void ct_dscpmark_type_print(const struct expr *expr,
+				   struct output_ctx *octx)
+{
+	uint64_t ct_dscpmark = mpz_get_uint64(expr->value);
+	uint32_t dscpshift   = ct_dscpmark >> 32;
+	uint32_t statemask   = ct_dscpmark & 0xffffffff;
+
+	nft_print(octx, "0x%08" PRIx32 "/0x%08" PRIx32,
+		  0x3f << dscpshift, statemask);
+}
+
+const struct datatype ct_dscpmark_type = {
+	.type		= TYPE_CT_DSCPMARK,
+	.name		= "ct_dscpmark",
+	.desc		= "conntrack dscp mark",
+	.byteorder	= BYTEORDER_HOST_ENDIAN,
+	.size		= 8 * BITS_PER_BYTE,
+	.basetype	= &integer_type,
+	.print		= ct_dscpmark_type_print,
+	.json		= ct_dscpmark_type_json,
+};
+
 #define CT_LABEL_BIT_SIZE 128
 
 const char *ct_label2str(const struct symbol_table *ct_label_tbl,
@@ -301,6 +323,9 @@ const struct ct_template ct_templates[__NFT_CT_MAX] = {
 					      BYTEORDER_BIG_ENDIAN, 128),
 	[NFT_CT_SECMARK]	= CT_TEMPLATE("secmark", &integer_type,
 					      BYTEORDER_HOST_ENDIAN, 32),
+	[NFT_CT_DSCPMARK]	= CT_TEMPLATE("dscpmark", &ct_dscpmark_type,
+					      BYTEORDER_HOST_ENDIAN,
+					      8 * BITS_PER_BYTE),
 };
 
 static void ct_print(enum nft_ct_keys key, int8_t dir, uint8_t nfproto,
diff --git a/src/datatype.c b/src/datatype.c
index b9e167e03765..bd4be5f3a556 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -74,6 +74,7 @@ static const struct datatype *datatypes[TYPE_MAX + 1] = {
 	[TYPE_TIME_DATE]	= &date_type,
 	[TYPE_TIME_HOUR]	= &hour_type,
 	[TYPE_TIME_DAY]		= &day_type,
+	[TYPE_CT_DSCPMARK]      = &ct_dscpmark_type,
 };
 
 const struct datatype *datatype_lookup(enum datatypes type)
diff --git a/src/json.c b/src/json.c
index 3498e24db363..b8714891f501 100644
--- a/src/json.c
+++ b/src/json.c
@@ -1093,6 +1093,15 @@ json_t *gid_type_json(const struct expr *expr, struct output_ctx *octx)
 	return json_integer(gid);
 }
 
+json_t *ct_dscpmark_type_json(const struct expr *expr, struct output_ctx *octx)
+{
+	uint64_t ct_dscpmark = mpz_get_uint64(expr->value);
+	uint32_t dscpshift   = ct_dscpmark >> 32;
+	uint32_t statemask   = ct_dscpmark & 0xffffffff;
+
+	return json_pack("[ii]", 0x3f << dscpshift, statemask);
+}
+
 json_t *expr_stmt_json(const struct stmt *stmt, struct output_ctx *octx)
 {
 	return expr_print_json(stmt->expr, octx);
diff --git a/src/parser_bison.y b/src/parser_bison.y
index 707f46716ed3..e7a2e222f88d 100644
--- a/src/parser_bison.y
+++ b/src/parser_bison.y
@@ -448,6 +448,7 @@ int nft_lex(void *, void *, void *);
 %token STATUS			"status"
 %token ORIGINAL			"original"
 %token REPLY			"reply"
+%token DSCPMARK			"dscpmark"
 
 %token COUNTER			"counter"
 %token NAME			"name"
@@ -658,8 +659,8 @@ int nft_lex(void *, void *, void *);
 
 %type <expr>			multiton_stmt_expr
 %destructor { expr_free($$); }	multiton_stmt_expr
-%type <expr>			prefix_stmt_expr range_stmt_expr wildcard_expr
-%destructor { expr_free($$); }	prefix_stmt_expr range_stmt_expr wildcard_expr
+%type <expr>			ct_dscpmark_stmt_expr prefix_stmt_expr range_stmt_expr wildcard_expr
+%destructor { expr_free($$); }	ct_dscpmark_stmt_expr prefix_stmt_expr range_stmt_expr wildcard_expr
 
 %type <expr>			primary_stmt_expr basic_stmt_expr
 %destructor { expr_free($$); }	primary_stmt_expr basic_stmt_expr
@@ -3020,6 +3021,50 @@ map_stmt_expr		:	concat_stmt_expr	MAP	map_stmt_expr_set
 			|	concat_stmt_expr	{ $$ = $1; }
 			;
 
+ct_dscpmark_stmt_expr	:	NUM	SLASH	NUM
+			{
+				uint32_t dscpmask  = $1;
+				uint32_t statemask = $3;
+				uint32_t dscpshift = 0;
+				uint64_t ct_dscpmark;
+
+				if (dscpmask & statemask) {
+					erec_queue(error(&@$,
+							 "State mask overlaps with DSCP mask"),
+						   state->msgs);
+					YYERROR;
+				}
+
+				for (uint32_t m = 0x3f;
+				     m != (m & dscpmask) && dscpshift <= 26;
+				     m <<= 1, dscpshift++)
+					;
+
+				if (dscpshift > 26) {
+					erec_queue(error(&@1,
+							 "DSCP mask not found"),
+						   state->msgs);
+					YYERROR;
+				}
+
+				if (~(0x3f << dscpshift) & dscpmask) {
+					erec_queue(error(&@1,
+							 "Invalid DSCP mask"),
+						   state->msgs);
+					YYERROR;
+				}
+
+				ct_dscpmark = ((uint64_t) dscpshift << 32) | statemask;
+
+				$$ = constant_expr_alloc(&@$,
+							 &ct_dscpmark_type,
+							 BYTEORDER_HOST_ENDIAN,
+							 sizeof(ct_dscpmark) *
+							 BITS_PER_BYTE,
+							 &ct_dscpmark);
+			}
+			;
+
 prefix_stmt_expr	:	basic_stmt_expr	SLASH	NUM
 			{
 				$$ = prefix_expr_alloc(&@$, $1, $3);
@@ -4468,6 +4513,10 @@ ct_stmt			:	CT	ct_key		SET	stmt_expr
 			{
 				$$ = ct_stmt_alloc(&@$, $3, $2, $5);
 			}
+			|	CT	DSCPMARK	SET	ct_dscpmark_stmt_expr
+			{
+				$$ = ct_stmt_alloc(&@$, NFT_CT_DSCPMARK, -1, $4);
+			}
 			;
 
 payload_stmt		:	payload_expr		SET	stmt_expr
diff --git a/src/scanner.l b/src/scanner.l
index d32adf4897ae..23a5c06bf267 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -544,6 +544,7 @@ addrstring	({macaddr}|{ip4addr}|{ip6addr})
 "label"			{ return LABEL; }
 "state"			{ return STATE; }
 "status"		{ return STATUS; }
+"dscpmark"		{ return DSCPMARK; }
 
 "numgen"		{ return NUMGEN; }
 "inc"			{ return INC; }
diff --git a/tests/shell/testcases/chains/0040ct_dscpmark_0 b/tests/shell/testcases/chains/0040ct_dscpmark_0
new file mode 100755
index 000000000000..44d7afcf5de3
--- /dev/null
+++ b/tests/shell/testcases/chains/0040ct_dscpmark_0
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+set -e
+
+RULESET="
+  add table t
+  add chain t c { type filter hook output priority mangle; }
+  add rule t c oif lo tcp dport ssh ct dscpmark set 0xfc000000/0x00010000
+"
+
+OUTPUT='table ip t {
+	chain c {
+		type filter hook output priority mangle; policy accept;
+		oif "lo" tcp dport 22 ct dscpmark set 0xfc000000/0x00010000
+	}
+}'
+
+$NFT -f - <<< "$RULESET"
+
+test "$OUTPUT" = "$($NFT list ruleset)"
+
diff --git a/tests/shell/testcases/chains/0040ct_dscpmark_1 b/tests/shell/testcases/chains/0040ct_dscpmark_1
new file mode 100755
index 000000000000..b5d7a2b3f494
--- /dev/null
+++ b/tests/shell/testcases/chains/0040ct_dscpmark_1
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+RULESET="
+  add table t
+  add chain t c { type filter hook output priority mangle; }
+  add rule t c oif lo tcp dport ssh ct dscpmark set 0xfc000000/0x10000000
+"
+
+if $NFT -f - <<< "$RULESET" 2>/dev/null; then
+	echo "E: accepted overlapping ct dscpmark fields" 1>&2
+	exit 1
+fi
+exit 0
+
diff --git a/tests/shell/testcases/chains/0040ct_dscpmark_2 b/tests/shell/testcases/chains/0040ct_dscpmark_2
new file mode 100755
index 000000000000..054c5f78cba4
--- /dev/null
+++ b/tests/shell/testcases/chains/0040ct_dscpmark_2
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+RULESET="
+  add table t
+  add chain t c { type filter hook output priority mangle; }
+  add rule t c oif lo tcp dport ssh ct dscpmark set 0xf0000000/0x00000000
+"
+
+if $NFT -f - <<< "$RULESET" 2>/dev/null; then
+	echo "E: accepted missing ct dscpmark dscpmask field" 1>&2
+	exit 1
+fi
+exit 0
+
diff --git a/tests/shell/testcases/chains/0040ct_dscpmark_3 b/tests/shell/testcases/chains/0040ct_dscpmark_3
new file mode 100755
index 000000000000..f6fb40863acd
--- /dev/null
+++ b/tests/shell/testcases/chains/0040ct_dscpmark_3
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+RULESET="
+  add table t
+  add chain t c { type filter hook output priority mangle; }
+  add rule t c oif lo tcp dport ssh ct dscpmark set 0xfc000001/0x01000000
+"
+
+if $NFT -f - <<< "$RULESET" 2>/dev/null; then
+	echo "E: accepted invalid ct dscpmark dscpmask field" 1>&2
+	exit 1
+fi
+exit 0
+
-- 
2.24.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-09 21:42   ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Jeremy Sowden
  2019-12-09 21:42     ` [RFC PATCH nftables] Add "ct dscpmark" conntrack statement Jeremy Sowden
@ 2019-12-09 22:47     ` Florian Westphal
  2019-12-09 23:23       ` Jeremy Sowden
  1 sibling, 1 reply; 26+ messages in thread
From: Florian Westphal @ 2019-12-09 22:47 UTC (permalink / raw)
  To: Jeremy Sowden; +Cc: Pablo Neira Ayuso, Netfilter Devel, Kevin Darbyshire-Bryant

Jeremy Sowden <jeremy@azazel.net> wrote:
> "ct dscpmark" is a method of storing the DSCP of an ip packet into the
> conntrack mark.  In combination with a suitable tc filter action
> (act_ctinfo) DSCP values are able to be stored in the mark on egress and
> restored on ingress across links that otherwise alter or bleach DSCP.
> 
> This is useful for qdiscs such as CAKE which are able to shape according
> to policies based on DSCP.
> 
> Ingress classification is traditionally a challenging task since
> iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
> lookups, hence are unable to see internal IPv4 addresses as used on the
> typical home masquerading gateway.
> 
> The "ct dscpmark" conntrack statement solves the problem of storing the
> DSCP to the conntrack mark in a way suitable for the new act_ctinfo tc
> action to restore.

Yes, but if someone else wants to store ip saddr or udp port or ifindex
or whatever we need to extend this again.

nft should be able to support:

nft add rule inet filter forward ct mark set ip dscp

(nft will reject this because types are different).

Same for

nft add rule inet filter forward ct mark set ip dscp << 16

(nft will claim the shift is unsupported for a 8 bit type).

We need a cast operator for this.  Something like

nft add rule inet filter forward ct mark set typeof(ct mark) ip dscp

or anything else that tells the parser that we really want the diffserv
value to be assigned to a mark type.

As far as I can see, no kernel changes would be reqired for this.

A cheap starting point would be to try to get rid of the sanity test
and make nft just accept the right-hand-side of 'ct mark set',
then see how to best add an 'do this anyway' override in the grammar.

I have older patches that adds a 'typeof' keyword for set definitions,
maybe it could be used for this casting too.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-09 22:47     ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Florian Westphal
@ 2019-12-09 23:23       ` Jeremy Sowden
  2019-12-10  1:25         ` Florian Westphal
  0 siblings, 1 reply; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-09 23:23 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Netfilter Devel, Kevin Darbyshire-Bryant

[-- Attachment #1: Type: text/plain, Size: 2077 bytes --]

On 2019-12-09, at 23:47:10 +0100, Florian Westphal wrote:
> Jeremy Sowden wrote:
> > "ct dscpmark" is a method of storing the DSCP of an ip packet into
> > the conntrack mark.  In combination with a suitable tc filter action
> > (act_ctinfo) DSCP values are able to be stored in the mark on egress
> > and restored on ingress across links that otherwise alter or bleach
> > DSCP.
> >
> > This is useful for qdiscs such as CAKE which are able to shape
> > according to policies based on DSCP.
> >
> > Ingress classification is traditionally a challenging task since
> > iptables rules haven't yet run and tc filter/eBPF programs are
> > pre-NAT lookups, hence are unable to see internal IPv4 addresses as
> > used on the typical home masquerading gateway.
> >
> > The "ct dscpmark" conntrack statement solves the problem of storing
> > the DSCP to the conntrack mark in a way suitable for the new
> > act_ctinfo tc action to restore.
>
> Yes, but if someone else wants to store ip saddr or udp port or
> ifindex or whatever we need to extend this again.
>
> nft should be able to support:
>
> nft add rule inet filter forward ct mark set ip dscp
>
> (nft will reject this because types are different).
>
> Same for
>
> nft add rule inet filter forward ct mark set ip dscp << 16
>
> (nft will claim the shift is unsupported for a 8 bit type).
>
> We need a cast operator for this.  Something like
>
> nft add rule inet filter forward ct mark set typeof(ct mark) ip dscp
>
> or anything else that tells the parser that we really want the
> diffserv value to be assigned to a mark type.
>
> As far as I can see, no kernel changes would be reqired for this.
>
> A cheap starting point would be to try to get rid of the sanity test
> and make nft just accept the right-hand-side of 'ct mark set', then
> see how to best add an 'do this anyway' override in the grammar.
>
> I have older patches that adds a 'typeof' keyword for set definitions,
> maybe it could be used for this casting too.

These?

  https://lore.kernel.org/netfilter-devel/20190816144241.11469-1-fw@strlen.de/

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/1] netfilter: connmark: introduce set-dscpmark
  2019-12-03 16:06   ` [PATCH 1/1] " Kevin Darbyshire-Bryant
@ 2019-12-09 23:57     ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 0 replies; 26+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-12-09 23:57 UTC (permalink / raw)
  To: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 737 bytes --]



> On 3 Dec 2019, at 16:06, Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> wrote:
> 
> set-dscpmark is a method of storing the DSCP of an ip packet into
> conntrack mark.  In combination with a suitable tc filter action
> (act_ctinfo) DSCP values are able to be stored in the mark on egress and
> restored on ingress across links that otherwise alter or bleach DSCP.
> 
<snip>

Well no one has yet screamed.  I do think I’ve done this wrong:

> 
> +enum {
> +	XT_CONNMARK_VALUE = BIT(0),
> +	XT_CONNMARK_DSCP = BIT(1)
> +};

and they should be = (1 << bit_number) instead.

I’ll send a v2 with that change (to nf-next) but anything else? or is it all too horrid
to contemplate :-)

Cheers,

Kevin

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-09 23:23       ` Jeremy Sowden
@ 2019-12-10  1:25         ` Florian Westphal
  2019-12-10 11:01           ` Jeremy Sowden
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Westphal @ 2019-12-10  1:25 UTC (permalink / raw)
  To: Jeremy Sowden
  Cc: Florian Westphal, Pablo Neira Ayuso, Netfilter Devel,
	Kevin Darbyshire-Bryant

Jeremy Sowden <jeremy@azazel.net> wrote:
> > I have older patches that adds a 'typeof' keyword for set definitions,
> > maybe it could be used for this casting too.
> 
> These?
> 
>   https://lore.kernel.org/netfilter-devel/20190816144241.11469-1-fw@strlen.de/

Yes, still did not yet have time to catch up and implement what Pablo
suggested though.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-10  1:25         ` Florian Westphal
@ 2019-12-10 11:01           ` Jeremy Sowden
  2019-12-10 11:32             ` Florian Westphal
  0 siblings, 1 reply; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-10 11:01 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Netfilter Devel, Kevin Darbyshire-Bryant

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

On 2019-12-10, at 02:25:42 +0100, Florian Westphal wrote:
> Jeremy Sowden <jeremy@azazel.net> wrote:
> > > I have older patches that adds a 'typeof' keyword for set
> > > definitions, maybe it could be used for this casting too.
> >
> > These?
> >
> >   https://lore.kernel.org/netfilter-devel/20190816144241.11469-1-fw@strlen.de/
>
> Yes, still did not yet have time to catch up and implement what Pablo
> suggested though.

I'll take a look.

Thanks for the pointers, Florian.

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-10 11:01           ` Jeremy Sowden
@ 2019-12-10 11:32             ` Florian Westphal
  2019-12-10 19:52               ` Jeremy Sowden
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Westphal @ 2019-12-10 11:32 UTC (permalink / raw)
  To: Jeremy Sowden
  Cc: Florian Westphal, Pablo Neira Ayuso, Netfilter Devel,
	Kevin Darbyshire-Bryant

Jeremy Sowden <jeremy@azazel.net> wrote:
> On 2019-12-10, at 02:25:42 +0100, Florian Westphal wrote:
> > Jeremy Sowden <jeremy@azazel.net> wrote:
> > > > I have older patches that adds a 'typeof' keyword for set
> > > > definitions, maybe it could be used for this casting too.
> > >
> > > These?
> > >
> > >   https://lore.kernel.org/netfilter-devel/20190816144241.11469-1-fw@strlen.de/
> >
> > Yes, still did not yet have time to catch up and implement what Pablo
> > suggested though.
> 
> I'll take a look.

No need, I plan to resurrect this work soon.
If you really want to have a stab at it, let me know and I will rebase
what I have locally and push it out to a scratch repo for you.

Its not related to the 'ct mark' issue.  On second thought, reusing the
typeof keyword doesn't look like a good idea either.

We have, in most simple cases:

ct mark set 1
tcp dport set 42
ip daddr set 10.1.2.3

i.e. type on right side matches type of the left-hand expression.

tcp dport set 65536

would throw an error, as the number is out of range for the expected
port.

I thought that we could reuse typeof keyword:

tcp dport set typeof tcp dport 65536

But I'm not sure, it looks redundant, and I can't think of a
use-case/reason where one would need an 'intermediate type'
different from what is on the left-hand side.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark.
  2019-12-10 11:32             ` Florian Westphal
@ 2019-12-10 19:52               ` Jeremy Sowden
  0 siblings, 0 replies; 26+ messages in thread
From: Jeremy Sowden @ 2019-12-10 19:52 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Netfilter Devel, Kevin Darbyshire-Bryant

[-- Attachment #1: Type: text/plain, Size: 827 bytes --]

On 2019-12-10, at 12:32:34 +0100, Florian Westphal wrote:
> Jeremy Sowden <jeremy@azazel.net> wrote:
> > On 2019-12-10, at 02:25:42 +0100, Florian Westphal wrote:
> > > Jeremy Sowden <jeremy@azazel.net> wrote:
> > > > > I have older patches that adds a 'typeof' keyword for set
> > > > > definitions, maybe it could be used for this casting too.
> > > >
> > > > These?
> > > >
> > > >   https://lore.kernel.org/netfilter-devel/20190816144241.11469-1-fw@strlen.de/
> > >
> > > Yes, still did not yet have time to catch up and implement what Pablo
> > > suggested though.
> >
> > I'll take a look.
>
> No need, I plan to resurrect this work soon.
> If you really want to have a stab at it, let me know and I will rebase
> what I have locally and push it out to a scratch repo for you.

I'll concentrate on the ct mark stuff.

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH nf-next v2] netfilter: connmark: introduce set-dscpmark
  2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
                     ` (2 preceding siblings ...)
  2019-12-09 21:42   ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Jeremy Sowden
@ 2019-12-11 13:01   ` Kevin Darbyshire-Bryant
  3 siblings, 0 replies; 26+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-12-11 13:01 UTC (permalink / raw)
  To: ldir; +Cc: netfilter-devel

set-dscpmark is a method of storing the DSCP of an ip packet into
conntrack mark.  In combination with a suitable tc filter action
(act_ctinfo) DSCP values are able to be stored in the mark on egress and
restored on ingress across links that otherwise alter or bleach DSCP.

This is useful for qdiscs such as CAKE which are able to shape according
to policies based on DSCP.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.

x_tables CONNMARK set-dscpmark target solves the problem of storing the
DSCP to the conntrack mark in a way suitable for the new act_ctinfo tc
action to restore.

The set-dscpmark option accepts 2 parameters, a 32bit 'dscpmask' and a
32bit 'statemask'.  The dscp mask must be 6 contiguous bits and
represents the area where the DSCP will be stored in the connmark.  The
state mask is a minimum 1 bit length mask that must not overlap with the
dscpmask.  It represents a flag which is set when the DSCP has been
stored in the conntrack mark. This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A state
mask of zero disables the setting of a status bit/s.

example syntax with a suitably modified iptables user space application:

iptables -A QOS_MARK_eth0 -t mangle -j CONNMARK --set-dscpmark 0xfc000000/0x01000000

Would store the DSCP in the top 6 bits of the 32bit mark field, and use
the LSB of the top byte as the 'DSCP has been stored' marker.

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      ^                   ^
      |                   |
      ---|             Conditional flag
         |             set this when dscp
|-ip diffserv-|        stored in mark
| 6 bits      |
|-------------|

an identically configured tc action to restore looks like:

tc filter show dev eth0 ingress
filter parent ffff: protocol all pref 10 u32 chain 0
filter parent ffff: protocol all pref 10 u32 chain 0 fh 800: ht divisor 1
filter parent ffff: protocol all pref 10 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1: not_in_hw
  match 00000000/00000000 at 0
	action order 1: ctinfo zone 0 pipe
	 index 2 ref 1 bind 1 dscp 0xfc000000/0x1000000

	action order 2: mirred (Egress Redirect to device ifb4eth0) stolen
	index 1 ref 1 bind 1

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
v2 - change BIT(n) to (1 << n) in xt_connmark header

 include/uapi/linux/netfilter/xt_connmark.h | 10 ++++
 net/netfilter/xt_connmark.c                | 57 ++++++++++++++++++----
 2 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_connmark.h b/include/uapi/linux/netfilter/xt_connmark.h
index 1aa5c955ee1e..44ca3aaabb96 100644
--- a/include/uapi/linux/netfilter/xt_connmark.h
+++ b/include/uapi/linux/netfilter/xt_connmark.h
@@ -19,6 +19,11 @@ enum {
 	XT_CONNMARK_RESTORE
 };
 
+enum {
+	XT_CONNMARK_VALUE	= (1 << 0),
+	XT_CONNMARK_DSCP	= (1 << 1)
+};
+
 enum {
 	D_SHIFT_LEFT = 0,
 	D_SHIFT_RIGHT,
@@ -34,6 +39,11 @@ struct xt_connmark_tginfo2 {
 	__u8 shift_dir, shift_bits, mode;
 };
 
+struct xt_connmark_tginfo3 {
+	__u32 ctmark, ctmask, nfmask;
+	__u8 shift_dir, shift_bits, mode, func;
+};
+
 struct xt_connmark_mtinfo1 {
 	__u32 mark, mask;
 	__u8 invert;
diff --git a/net/netfilter/xt_connmark.c b/net/netfilter/xt_connmark.c
index eec2f3a88d73..188fd2495121 100644
--- a/net/netfilter/xt_connmark.c
+++ b/net/netfilter/xt_connmark.c
@@ -24,12 +24,13 @@ MODULE_ALIAS("ipt_connmark");
 MODULE_ALIAS("ip6t_connmark");
 
 static unsigned int
-connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
+connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo3 *info)
 {
 	enum ip_conntrack_info ctinfo;
 	u_int32_t new_targetmark;
 	struct nf_conn *ct;
 	u_int32_t newmark;
+	u_int8_t dscp;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if (ct == NULL)
@@ -37,12 +38,24 @@ connmark_tg_shift(struct sk_buff *skb, const struct xt_connmark_tginfo2 *info)
 
 	switch (info->mode) {
 	case XT_CONNMARK_SET:
-		newmark = (ct->mark & ~info->ctmask) ^ info->ctmark;
-		if (info->shift_dir == D_SHIFT_RIGHT)
-			newmark >>= info->shift_bits;
-		else
-			newmark <<= info->shift_bits;
-
+		newmark = ct->mark;
+		if (info->func & XT_CONNMARK_VALUE) {
+			newmark = (newmark & ~info->ctmask) ^ info->ctmark;
+			if (info->shift_dir == D_SHIFT_RIGHT)
+				newmark >>= info->shift_bits;
+			else
+				newmark <<= info->shift_bits;
+		} else if (info->func & XT_CONNMARK_DSCP) {
+			if (skb->protocol == htons(ETH_P_IP))
+				dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+			else if (skb->protocol == htons(ETH_P_IPV6))
+				dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+			else	/* protocol doesn't have diffserv */
+				break;
+
+			newmark = (newmark & ~info->ctmark) |
+				  (info->ctmask | (dscp << info->shift_bits));
+		}
 		if (ct->mark != newmark) {
 			ct->mark = newmark;
 			nf_conntrack_event_cache(IPCT_MARK, ct);
@@ -81,20 +94,36 @@ static unsigned int
 connmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_connmark_tginfo1 *info = par->targinfo;
-	const struct xt_connmark_tginfo2 info2 = {
+	const struct xt_connmark_tginfo3 info3 = {
 		.ctmark	= info->ctmark,
 		.ctmask	= info->ctmask,
 		.nfmask	= info->nfmask,
 		.mode	= info->mode,
+		.func	= XT_CONNMARK_VALUE
 	};
 
-	return connmark_tg_shift(skb, &info2);
+	return connmark_tg_shift(skb, &info3);
 }
 
 static unsigned int
 connmark_tg_v2(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_connmark_tginfo2 *info = par->targinfo;
+	const struct xt_connmark_tginfo3 info3 = {
+		.ctmark	= info->ctmark,
+		.ctmask	= info->ctmask,
+		.nfmask	= info->nfmask,
+		.mode	= info->mode,
+		.func	= XT_CONNMARK_VALUE
+	};
+
+	return connmark_tg_shift(skb, &info3);
+}
+
+static unsigned int
+connmark_tg_v3(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_connmark_tginfo3 *info = par->targinfo;
 
 	return connmark_tg_shift(skb, info);
 }
@@ -165,6 +194,16 @@ static struct xt_target connmark_tg_reg[] __read_mostly = {
 		.targetsize     = sizeof(struct xt_connmark_tginfo2),
 		.destroy        = connmark_tg_destroy,
 		.me             = THIS_MODULE,
+	},
+	{
+		.name           = "CONNMARK",
+		.revision       = 3,
+		.family         = NFPROTO_UNSPEC,
+		.checkentry     = connmark_tg_check,
+		.target         = connmark_tg_v3,
+		.targetsize     = sizeof(struct xt_connmark_tginfo3),
+		.destroy        = connmark_tg_destroy,
+		.me             = THIS_MODULE,
 	}
 };
 
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-12-11 13:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-24 14:23 [RFC PATCH 0/1] netfilter: xt_connmark: add savedscp-mark action Kevin 'ldir' Darbyshire-Bryant
2019-03-24 14:23 ` [PATCH 1/1] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
2019-04-08 22:39   ` Pablo Neira Ayuso
2019-04-08 23:16     ` Kevin 'ldir' Darbyshire-Bryant
2019-04-09 14:23       ` [RFC nf-next v2 0/2] xt_connmark: add savedscp-mark action Kevin 'ldir' Darbyshire-Bryant
2019-04-09 14:23         ` [RFC nf-next v2 1/2] netfilter: connmark: introduce savedscp Kevin 'ldir' Darbyshire-Bryant
2019-04-30 12:29           ` Pablo Neira Ayuso
2019-04-30 20:40             ` Kevin 'ldir' Darbyshire-Bryant
2019-04-09 14:23         ` [RFC nf-next 2/2] iptables: connmark - add savedscp option Kevin 'ldir' Darbyshire-Bryant
2019-12-03 16:06 ` [PATCH 0/1] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant
2019-12-03 16:06   ` [PATCH 1/1] " Kevin Darbyshire-Bryant
2019-12-09 23:57     ` Kevin 'ldir' Darbyshire-Bryant
2019-12-05  8:56   ` [PATCH 0/1] " Jeremy Sowden
2019-12-05  9:46     ` Kevin 'ldir' Darbyshire-Bryant
2019-12-06  8:54       ` Jeremy Sowden
2019-12-05 10:49     ` Florian Westphal
2019-12-05 22:00       ` Jeremy Sowden
2019-12-09 21:42   ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Jeremy Sowden
2019-12-09 21:42     ` [RFC PATCH nftables] Add "ct dscpmark" conntrack statement Jeremy Sowden
2019-12-09 22:47     ` [RFC PATCH nf-next] netfilter: conntrack: add support for storing DiffServ code-point as CT mark Florian Westphal
2019-12-09 23:23       ` Jeremy Sowden
2019-12-10  1:25         ` Florian Westphal
2019-12-10 11:01           ` Jeremy Sowden
2019-12-10 11:32             ` Florian Westphal
2019-12-10 19:52               ` Jeremy Sowden
2019-12-11 13:01   ` [PATCH nf-next v2] netfilter: connmark: introduce set-dscpmark Kevin Darbyshire-Bryant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).