* [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action
@ 2019-03-19 19:49 Kevin 'ldir' Darbyshire-Bryant
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-03-19 19:49 UTC
  To: netdev; +Cc: jiri, xiyou.wangcong, jhs, Kevin 'ldir' Darbyshire-Bryant

With nervousness and trepidation I'm submitting the attached RFC patch
for 'conndscp'.

Conndscp is a new tc filter action module.  It copies the DSCP of a
packet into the conntrack mark and, in the reverse direction, restores a
DSCP stored in the conntrack mark to the diffserv field of suitable skbs.

The feature has been found useful for restoring ingress classifications
based on egress classifications across links that bleach or otherwise
change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to implement
on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs run before
the NAT lookup, hence are unable to see internal IPv4 addresses as used
on the typical home masquerading gateway.

conndscp understands the following parameters:

mask - a 32 bit mask of at least 6 contiguous bits where conndscp will
place the DSCP in conntrack mark.  The DSCP is left-shifted by the
number of unset lower bits of the mask before storing into the mark
field.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by mask.  This represents a conditional operation flag - get
will only store the DSCP if the flag is unset.  set will only restore
the DSCP if the flag is set.  This is useful to implement a 'one shot'
iptables based classification where the 'complicated' iptables rules are
only run once to classify the connection on initial (egress) packet and
subsequent packets are all marked/restored with the same DSCP.  A mask
of zero disables the conditional behaviour.

mode - get/set/both - get stores the DSCP into the mark, set restores
the DSCP into the diffserv field from the mark, both performs a 'get'
followed by a 'set', in that order.
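
The interaction of mask, statemask and mode described above can be
sketched in C as follows.  This is only a minimal userspace
illustration, not the code from the patch: the function names, the
MODE_* constants and struct result are hypothetical.  It assumes that
'get' also sets the statemask flag once it has stored the DSCP, and
that 'set' leaves the packet's DSCP untouched while the flag is unset.

```c
#include <stdint.h>

/* Hypothetical mode flags; illustrative names, not the patch's UAPI. */
enum { MODE_GET = 1, MODE_SET = 2, MODE_BOTH = MODE_GET | MODE_SET };

/* Hypothetical pair of values an invocation of the action may change. */
struct result {
	uint32_t mark;	/* conntrack mark */
	uint8_t dscp;	/* 6-bit DSCP of the packet */
};

/* Number of unset low-order bits in mask: the left shift for the DSCP. */
static unsigned int mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	if (!mask)
		return 0;
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* 'get': store the DSCP in the mark unless the state flag is already set. */
static uint32_t dscp_get(uint32_t mark, uint8_t dscp,
			 uint32_t mask, uint32_t statemask)
{
	if (statemask && (mark & statemask))
		return mark;	/* connection already classified */

	mark &= ~mask;
	mark |= ((uint32_t)(dscp & 0x3f) << mask_shift(mask)) & mask;
	return mark | statemask;	/* assumption: 'get' sets the flag */
}

/* 'set': return the DSCP from the mark, but only if the state flag is set. */
static uint8_t dscp_set(uint32_t mark, uint8_t pkt_dscp,
			uint32_t mask, uint32_t statemask)
{
	if (statemask && !(mark & statemask))
		return pkt_dscp;	/* nothing stored yet, leave packet alone */

	return (uint8_t)((mark & mask) >> mask_shift(mask));
}

/* 'both' runs 'get' first and 'set' second; a statemask of 0 is unconditional. */
static struct result conndscp_apply(int mode, uint32_t mark, uint8_t dscp,
				    uint32_t mask, uint32_t statemask)
{
	struct result r = { mark, dscp };

	if (mode & MODE_GET)
		r.mark = dscp_get(r.mark, r.dscp, mask, statemask);
	if (mode & MODE_SET)
		r.dscp = dscp_set(r.mark, r.dscp, mask, statemask);
	return r;
}
```

With mask 0xfc000000 the shift is 26, so a DSCP of 8 (CS1) with
statemask 0x01000000 would be stored as mark 0x21000000 under these
assumptions.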

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)


A typical example of using conndscp to restore DSCP values for use with
a qdisc (e.g. CAKE) is shown below, using the top 6 bits of the mark to
store the DSCP and the bottom bit of the top byte as the state flag.

# egress qdisc
tc qdisc add dev eth0 cake bandwidth 20000kbit
# put an action on the egress interface to get the DSCP into the conntrack mark
# and to set the DSCP from the stored mark.
# this seems counter-intuitive, but it ensures that once the mark is set, all
# subsequent egress packets get the same stored DSCP.  This avoids needing
# iptables rules to mark every packet: conndscp does it for us and CAKE is
# happy using the DSCP
tc filter add dev eth0 protocol all prio 10 u32 match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode both


#ingress qdisc via an ifb

tc qdisc add dev eth0 handle ffff: ingress
tc qdisc add dev ifb4eth0 cake bandwidth 80000kbit
ip link set ifb4eth0 up
# redirect all packets arriving on eth0 to ifb4eth0 and restore the DSCP from connmark
tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action \
	conndscp mask 0xfc000000 statemask 0x01000000 mode set \
	mirred egress redirect dev ifb4eth0

#iptables rules using the statemask flag to only do it once

iptables -t mangle -N QOS_MARK_eth0

iptables -t mangle -A QOS_MARK_eth0 -m set --match-set Bulk4 dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
#add more rules similar to above as required


# send unmarked packets to the marking chain - conndscp will set the statemask bit
# if not already set.
iptables -t mangle -A POSTROUTING -o eth0 -m connmark --mark 0x00000000/0x01000000 -g QOS_MARK_eth0
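
To make the 'one shot' behaviour of the rule above concrete, here is a
small C sketch of the bit arithmetic for the masks used in this
example.  It is illustrative only: the helper names are hypothetical,
connmark_match() mirrors the usual "--mark value/mask" semantics of the
connmark match, i.e. (ctmark & mask) == value, and mark_after_get()
assumes conndscp sets the state flag when it stores the DSCP.

```c
#include <stdbool.h>
#include <stdint.h>

#define DSCP_MASK	0xfc000000u	/* top 6 bits hold the DSCP */
#define STATE_MASK	0x01000000u	/* bottom bit of the top byte is the flag */
#define DSCP_SHIFT	26		/* unset low-order bits below DSCP_MASK */
#define DSCP_CS1	8u		/* CS1 codepoint, 001000 in binary */

/* Hypothetical helper: connmark "--mark value/mask" style comparison. */
static bool connmark_match(uint32_t ctmark, uint32_t value, uint32_t mask)
{
	return (ctmark & mask) == value;
}

/* Hypothetical helper: mark after conndscp 'get' has stored a DSCP. */
static uint32_t mark_after_get(uint32_t ctmark, uint8_t dscp)
{
	ctmark &= ~DSCP_MASK;
	ctmark |= ((uint32_t)dscp << DSCP_SHIFT) & DSCP_MASK;
	return ctmark | STATE_MASK;	/* assumption: flag is set on store */
}
```

For a new connection the mark is 0, so connmark_match(0, 0, STATE_MASK)
holds and the packet is sent to QOS_MARK_eth0.  Once CS1 has been
stored, the flag bit is set, the match fails, and later packets skip
the marking chain.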

conndscp (almost) shamelessly copies code from connmark and therefore
contains the same limitations.

I am not a full time programmer, conndscp represents something of the
order of a 2 week struggle, my C is awful, kernel & network knowledge
worse, though I like to think improving.  There are no doubt issues with
this patch/feature but I hope constructive feedback, quite possibly in
very short words for my simple brain, will knock it into shape.

Thanks for your time.

Kevin Darbyshire-Bryant (1):
  net: sched: Introduce conndscp action

 include/net/tc_act/tc_conndscp.h          |  19 ++
 include/uapi/linux/tc_act/tc_conndscp.h   |  33 +++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_conndscp.c                  | 333 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 6 files changed, 400 insertions(+)
 create mode 100644 include/net/tc_act/tc_conndscp.h
 create mode 100644 include/uapi/linux/tc_act/tc_conndscp.h
 create mode 100644 net/sched/act_conndscp.c

-- 
2.17.2 (Apple Git-113)




Thread overview: 27+ messages
2019-03-19 19:49 [RFC PATCH 0/1 net-next] net: sched: Introduce conndscp action Kevin 'ldir' Darbyshire-Bryant
2019-03-19 19:49 ` [PATCH 1/1] " Kevin 'ldir' Darbyshire-Bryant
2019-03-21 17:58   ` kbuild test robot
2019-03-22 14:09 ` [RFC PATCH 1/1 v2] " Kevin 'ldir' Darbyshire-Bryant
2019-03-22 17:39   ` Cong Wang
2019-03-22 18:26     ` Kevin 'ldir' Darbyshire-Bryant
2019-03-22 20:05       ` Cong Wang
2019-03-22 20:50         ` Kevin 'ldir' Darbyshire-Bryant
2019-03-22 21:31           ` Cong Wang
2019-03-22 22:06             ` Kevin 'ldir' Darbyshire-Bryant
2019-03-22 23:09               ` Cong Wang
2019-03-23 17:45                 ` Kevin 'ldir' Darbyshire-Bryant
2019-03-25 19:17                   ` Cong Wang
2019-03-27 20:32                     ` Kevin 'ldir' Darbyshire-Bryant
2019-03-29 20:45                       ` [RFC net-next 0/1] net: sched: Introduce conntrack action Kevin 'ldir' Darbyshire-Bryant
2019-03-29 20:45                         ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant
2019-04-01 13:14                         ` [RFC net-next 0/1] " Marcelo Ricardo Leitner
2019-04-01 13:54                           ` Kevin 'ldir' Darbyshire-Bryant
2019-04-01 14:22                             ` Paul Blakey
2019-04-01 21:06                               ` Cong Wang
2019-04-02  9:24                                 ` Kevin 'ldir' Darbyshire-Bryant
2019-04-03  7:47                                   ` Paul Blakey
2019-04-03  8:23                                     ` Kevin 'ldir' Darbyshire-Bryant
2019-04-03 11:56                                       ` Paul Blakey
2019-04-03 12:35                                         ` Kevin 'ldir' Darbyshire-Bryant
2019-04-09 11:33                                           ` [RFC net-next 0/1] net: sched: Introduce act_ctinfo action Kevin 'ldir' Darbyshire-Bryant
2019-04-09 11:33                                             ` [RFC net-next 1/1] " Kevin 'ldir' Darbyshire-Bryant
