linux-kernel.vger.kernel.org archive mirror
* [RFC net-next 0/6] Allow excluding sw flow key from upcalls
@ 2022-11-22 14:03 Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 1/6] openvswitch: exclude kernel " Aaron Conole
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

Userspace applications can choose to completely ignore the kernel-provided
flow key and instead regenerate a fresh key for processing in userspace.
Currently, userspace ovs-vswitchd does this in some instances (for example,
for the MISS upcall command).  This means that the kernel spends time
copying and sending the flow key to userspace without any benefit to the
system.

Introduce a way for userspace to tell the kernel not to send the flow key.
This saves time in both userspace and the kernel and reduces memory
pressure.

This patch set is considerably larger than the kernel change alone because
it also adds the ability to decode a sw flow key into a compatible datapath
string.  A selftest uses that decoder to show that the feature works, by
decoding and dumping the flow (to make sure we capture the correct packet).

Aaron Conole (6):
  openvswitch: exclude kernel flow key from upcalls
  selftests: openvswitch: add interface support
  selftests: openvswitch: add flow dump support
  selftests: openvswitch: adjust datapath NL message
  selftests: openvswitch: add upcall support
  selftests: openvswitch: add exclude support for packet commands

 include/uapi/linux/openvswitch.h              |    6 +
 net/openvswitch/datapath.c                    |   26 +-
 net/openvswitch/datapath.h                    |    2 +
 .../selftests/net/openvswitch/openvswitch.sh  |  101 +-
 .../selftests/net/openvswitch/ovs-dpctl.py    | 1069 ++++++++++++++++-
 5 files changed, 1183 insertions(+), 21 deletions(-)

-- 
2.34.3



* [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
@ 2022-11-22 14:03 ` Aaron Conole
  2022-11-23 21:22   ` Ilya Maximets
  2022-11-22 14:03 ` [RFC net-next 2/6] selftests: openvswitch: add interface support Aaron Conole
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

When processing upcall commands, two groups of data are available to
userspace: the actual packet data and the kernel sw flow key data.
Including the flow key allows userspace to avoid running the packet
dissection again.

However, userspace can choose to ignore the flow key data, as is the
case in some ovs-vswitchd upcall processing.  For these messages, the
flow key merely adds data to the upcall pipeline without any actual
gain, since userspace throws it away anyway.

Introduce a new feature, OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY, which signals
that userspace doesn't want the flow key included with specific classes
of upcall message (for example, MISS messages).  The associated attribute,
OVS_DP_ATTR_EXCLUDE_CMDS, carries a bitmask of the specific commands for
which the key is omitted.

A test that exercises the feature is added later in this series.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
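For readers who want to reason about the flag and mask together, here is a
minimal userspace-side model of the check described above.  It is only a
sketch and not part of the patch: exclude_mask() and upcall_has_key() are
hypothetical helper names, the flag value is the one defined in this patch,
and the command numbers are taken from enum ovs_packet_cmd in the uapi
header.

```python
# Illustrative model of the check added to queue_userspace_packet().
# Not kernel code; helper names are hypothetical.
OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY = 1 << 4  # value from this patch

# Values follow enum ovs_packet_cmd in include/uapi/linux/openvswitch.h.
OVS_PACKET_CMD_MISS = 1
OVS_PACKET_CMD_ACTION = 2


def exclude_mask(*cmds):
    """Build the u32 value carried by OVS_DP_ATTR_EXCLUDE_CMDS."""
    mask = 0
    for cmd in cmds:
        mask |= 1 << cmd
    return mask


def upcall_has_key(user_features, upcall_exclude_cmds, cmd):
    """Return True when OVS_PACKET_ATTR_KEY would still be attached."""
    feature_on = bool(user_features & OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY)
    excluded = bool(upcall_exclude_cmds & (1 << cmd))
    # The key is dropped only when both the feature and the command bit are set.
    return not (feature_on and excluded)
```

With exclude_mask(OVS_PACKET_CMD_MISS) and the feature flag set, MISS
upcalls lose the key while ACTION upcalls keep it; without the feature
flag the mask has no effect.
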
 include/uapi/linux/openvswitch.h |  6 ++++++
 net/openvswitch/datapath.c       | 26 ++++++++++++++++++++++----
 net/openvswitch/datapath.h       |  2 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 94066f87e9ee..238e62ecba46 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -95,6 +95,9 @@ enum ovs_datapath_attr {
 				     * per-cpu dispatch mode
 				     */
 	OVS_DP_ATTR_IFINDEX,
+	OVS_DP_ATTR_EXCLUDE_CMDS,	/* u32 mask of OVS_PACKET_CMDs for
+					 * omitting FLOW_KEY attribute
+					 */
 	__OVS_DP_ATTR_MAX
 };
 
@@ -138,6 +141,9 @@ struct ovs_vport_stats {
 /* Allow per-cpu dispatch of upcalls */
 #define OVS_DP_F_DISPATCH_UPCALL_PER_CPU	(1 << 3)
 
+/* Drop Flow key data from upcall packet cmds */
+#define OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY	(1 << 4)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL      ((__u32)0)
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 861dfb8daf4a..6afde7de492c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -470,9 +470,13 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 	}
 	upcall->dp_ifindex = dp_ifindex;
 
-	err = ovs_nla_put_key(key, key, OVS_PACKET_ATTR_KEY, false, user_skb);
-	if (err)
-		goto out;
+	if (!(dp->user_features & OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY) ||
+	    !(dp->upcall_exclude_cmds & (1U << upcall_info->cmd))) {
+		err = ovs_nla_put_key(key, key, OVS_PACKET_ATTR_KEY, false,
+				      user_skb);
+		if (err)
+			goto out;
+	}
 
 	if (upcall_info->userdata)
 		__nla_put(user_skb, OVS_PACKET_ATTR_USERDATA,
@@ -1526,6 +1530,7 @@ static size_t ovs_dp_cmd_msg_size(void)
 	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */
 	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE */
 	msgsize += nla_total_size(sizeof(u32) * nr_cpu_ids); /* OVS_DP_ATTR_PER_CPU_PIDS */
+	msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_EXCLUDE_CMDS */
 
 	return msgsize;
 }
@@ -1574,6 +1579,10 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 			goto nla_put_failure;
 	}
 
+	if (nla_put_u32(skb, OVS_DP_ATTR_EXCLUDE_CMDS,
+			dp->upcall_exclude_cmds))
+		goto nla_put_failure;
+
 	genlmsg_end(skb, ovs_header);
 	return 0;
 
@@ -1684,7 +1693,8 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 		if (user_features & ~(OVS_DP_F_VPORT_PIDS |
 				      OVS_DP_F_UNALIGNED |
 				      OVS_DP_F_TC_RECIRC_SHARING |
-				      OVS_DP_F_DISPATCH_UPCALL_PER_CPU))
+				      OVS_DP_F_DISPATCH_UPCALL_PER_CPU |
+				      OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY))
 			return -EOPNOTSUPP;
 
 #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
@@ -1705,6 +1715,14 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 
 	dp->user_features = user_features;
 
+	if (dp->user_features & OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY) {
+		if (!a[OVS_DP_ATTR_EXCLUDE_CMDS])
+			return -EINVAL;
+
+		dp->upcall_exclude_cmds =
+			nla_get_u32(a[OVS_DP_ATTR_EXCLUDE_CMDS]);
+	}
+
 	if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU &&
 	    a[OVS_DP_ATTR_PER_CPU_PIDS]) {
 		/* Upcall Netlink Port IDs have been updated */
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 0cd29971a907..3c951e25509e 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -101,6 +101,8 @@ struct datapath {
 
 	u32 max_headroom;
 
+	u32 upcall_exclude_cmds;
+
 	/* Switch meters. */
 	struct dp_meter_table meter_tbl;
 
-- 
2.34.3



* [RFC net-next 2/6] selftests: openvswitch: add interface support
  2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 1/6] openvswitch: exclude kernel " Aaron Conole
@ 2022-11-22 14:03 ` Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 3/6] selftests: openvswitch: add flow dump support Aaron Conole
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

Add add-if and del-if commands to ovs-dpctl.py, along with an
associated test that creates network namespaces and connects veth
interfaces to the datapath, with optional tcpdump tracing.

This will be used in the future when flow support is added
for additional test cases.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
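As an aside on the vport type helpers, one possible alternative to paired
if/elif chains is a single lookup table, which keeps both directions
consistent by construction.  This is only a sketch under that assumption
and is not what the patch does; VPORT_TYPES and VPORT_NAMES are
hypothetical names, and the numeric values are the OVS_VPORT_TYPE_*
constants from this patch.

```python
# Hypothetical table-driven variant of type_to_str()/str_to_type().
VPORT_TYPES = {
    "netdev": 1,
    "internal": 2,
    "gre": 3,
    "vxlan": 4,
    "geneve": 5,
}
# Reverse table derived from the forward one, so the two cannot drift apart.
VPORT_NAMES = {num: name for name, num in VPORT_TYPES.items()}


def str_to_type(name):
    """Map a vport type name to its numeric OVS_VPORT_TYPE_* value."""
    try:
        return VPORT_TYPES[name]
    except KeyError:
        raise ValueError("Unknown vport type: '%s'" % name) from None


def type_to_str(num):
    """Map a numeric OVS_VPORT_TYPE_* value back to its name."""
    try:
        return VPORT_NAMES[num]
    except KeyError:
        raise ValueError("Unknown vport type:%d" % num) from None
```

A round trip such as type_to_str(str_to_type("gre")) returns the name it
started with for every entry in the table.
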
 .../selftests/net/openvswitch/openvswitch.sh  |  54 ++++++++
 .../selftests/net/openvswitch/ovs-dpctl.py    | 120 ++++++++++++++++--
 2 files changed, 165 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 7ce46700a3ae..ce14913150fe 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -70,6 +70,49 @@ ovs_add_dp () {
 	on_exit "ovs_sbx $sbxname python3 $ovs_base/ovs-dpctl.py del-dp $1;"
 }
 
+ovs_add_if () {
+	info "Adding IF to DP: br:$2 if:$3"
+	ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py add-if "$2" "$3" || return 1
+}
+
+ovs_del_if () {
+	info "Deleting IF from DP: br:$2 if:$3"
+	ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py del-if "$2" "$3" || return 1
+}
+
+ovs_netns_spawn_daemon() {
+	sbx=$1
+	shift
+	netns=$1
+	shift
+	info "spawning cmd: $*"
+	ip netns exec $netns $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
+	pid=$!
+	ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
+}
+
+ovs_add_netns_and_veths () {
+	info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
+
+	ovs_sbx "$1" ip netns add "$3" || return 1
+	on_exit "ovs_sbx $1 ip netns del $3"
+	ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
+	on_exit "ovs_sbx $1 ip link del $4 >/dev/null 2>&1"
+	ovs_sbx "$1" ip link set "$4" up || return 1
+	ovs_sbx "$1" ip link set "$5" netns "$3" || return 1
+	ovs_sbx "$1" ip netns exec "$3" ip link set "$5" up || return 1
+
+	if [ "$6" != "" ]; then
+		ovs_sbx "$1" ip netns exec "$3" ip addr add "$6" dev "$5" \
+		    || return 1
+	fi
+	ovs_add_if "$1" "$2" "$4" || return 1
+	[ $TRACING -eq 1 ] && ovs_netns_spawn_daemon "$1" "$3" \
+			tcpdump -i any -s 65535 >> ${ovs_dir}/tcpdump_"$3".log
+
+	return 0
+}
+
 usage() {
 	echo
 	echo "$0 [OPTIONS] [TEST]..."
@@ -101,6 +144,17 @@ test_netlink_checks () {
 		return 1
 	fi
 
+	ovs_add_netns_and_veths "test_netlink_checks" nv0 left left0 l0 || \
+	    return 1
+	ovs_add_netns_and_veths "test_netlink_checks" nv0 right right0 r0 || \
+	    return 1
+	[ $(python3 $ovs_base/ovs-dpctl.py show nv0 | grep port | \
+	    wc -l) == 3 ] || \
+	      return 1
+	ovs_del_if "test_netlink_checks" nv0 right0 || return 1
+	[ $(python3 $ovs_base/ovs-dpctl.py show nv0 | grep port | \
+	    wc -l) == 2 ] || \
+	      return 1
 	return 0
 }
 
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 3243c90d449e..338e9b2cd660 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -170,6 +170,13 @@ class OvsDatapath(GenericNetlinkSocket):
 
 
 class OvsVport(GenericNetlinkSocket):
+
+    OVS_VPORT_TYPE_NETDEV = 1
+    OVS_VPORT_TYPE_INTERNAL = 2
+    OVS_VPORT_TYPE_GRE = 3
+    OVS_VPORT_TYPE_VXLAN = 4
+    OVS_VPORT_TYPE_GENEVE = 5
+
     class ovs_vport_msg(ovs_dp_msg):
         nla_map = (
             ("OVS_VPORT_ATTR_UNSPEC", "none"),
@@ -197,17 +204,30 @@ class OvsVport(GenericNetlinkSocket):
             )
 
     def type_to_str(vport_type):
-        if vport_type == 1:
+        if vport_type == OvsVport.OVS_VPORT_TYPE_NETDEV:
             return "netdev"
-        elif vport_type == 2:
+        elif vport_type == OvsVport.OVS_VPORT_TYPE_INTERNAL:
             return "internal"
-        elif vport_type == 3:
+        elif vport_type == OvsVport.OVS_VPORT_TYPE_GRE:
             return "gre"
-        elif vport_type == 4:
+        elif vport_type == OvsVport.OVS_VPORT_TYPE_VXLAN:
             return "vxlan"
-        elif vport_type == 5:
+        elif vport_type == OvsVport.OVS_VPORT_TYPE_GENEVE:
             return "geneve"
-        return "unknown:%d" % vport_type
+        raise ValueError("Unknown vport type:%d" % vport_type)
+
+    def str_to_type(vport_type):
+        if vport_type == "netdev":
+            return OvsVport.OVS_VPORT_TYPE_NETDEV
+        elif vport_type == "internal":
+            return OvsVport.OVS_VPORT_TYPE_INTERNAL
+        elif vport_type == "gre":
+            return OvsVport.OVS_VPORT_TYPE_GRE
+        elif vport_type == "vxlan":
+            return OvsVport.OVS_VPORT_TYPE_VXLAN
+        elif vport_type == "geneve":
+            return OvsVport.OVS_VPORT_TYPE_GENEVE
+        raise ValueError("Unknown vport type: '%s'" % vport_type)
 
     def __init__(self):
         GenericNetlinkSocket.__init__(self)
@@ -238,8 +258,51 @@ class OvsVport(GenericNetlinkSocket):
                 raise ne
         return reply
 
+    def attach(self, dpindex, vport_ifname, ptype):
+        msg = OvsVport.ovs_vport_msg()
+
+        msg["cmd"] = OVS_VPORT_CMD_NEW
+        msg["version"] = OVS_DATAPATH_VERSION
+        msg["reserved"] = 0
+        msg["dpifindex"] = dpindex
+        port_type = OvsVport.str_to_type(ptype)
+
+        msg["attrs"].append(["OVS_VPORT_ATTR_TYPE", port_type])
+        msg["attrs"].append(["OVS_VPORT_ATTR_NAME", vport_ifname])
+        msg["attrs"].append(["OVS_VPORT_ATTR_UPCALL_PID", [self.pid]])
+
+        try:
+            reply = self.nlm_request(
+                msg, msg_type=self.prid, msg_flags=NLM_F_REQUEST | NLM_F_ACK
+            )
+            reply = reply[0]
+        except NetlinkError as ne:
+            raise ne
+        return reply
+
+    def detach(self, dpindex, vport_ifname):
+        msg = OvsVport.ovs_vport_msg()
+
+        msg["cmd"] = OVS_VPORT_CMD_DEL
+        msg["version"] = OVS_DATAPATH_VERSION
+        msg["reserved"] = 0
+        msg["dpifindex"] = dpindex
+        msg["attrs"].append(["OVS_VPORT_ATTR_NAME", vport_ifname])
 
-def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB()):
+        try:
+            reply = self.nlm_request(
+                msg, msg_type=self.prid, msg_flags=NLM_F_REQUEST | NLM_F_ACK
+            )
+            reply = reply[0]
+        except NetlinkError as ne:
+            if ne.code == errno.ENODEV:
+                reply = None
+            else:
+                raise ne
+        return reply
+
+
+def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()):
     dp_name = dp_lookup_rep.get_attr("OVS_DP_ATTR_NAME")
     base_stats = dp_lookup_rep.get_attr("OVS_DP_ATTR_STATS")
     megaflow_stats = dp_lookup_rep.get_attr("OVS_DP_ATTR_MEGAFLOW_STATS")
@@ -265,7 +328,6 @@ def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB()):
         print("  features: 0x%X" % user_features)
 
     # port print out
-    vpl = OvsVport()
     for iface in ndb.interfaces:
         rep = vpl.info(iface.ifname, ifindex)
         if rep is not None:
@@ -312,9 +374,25 @@ def main(argv):
     deldpcmd = subparsers.add_parser("del-dp")
     deldpcmd.add_argument("deldp", help="Datapath Name")
 
+    addifcmd = subparsers.add_parser("add-if")
+    addifcmd.add_argument("dpname", help="Datapath Name")
+    addifcmd.add_argument("addif", help="Interface name for adding")
+    addifcmd.add_argument(
+        "-t",
+        "--ptype",
+        type=str,
+        default="netdev",
+        choices=["netdev", "internal"],
+        help="Interface type (default netdev)",
+    )
+    delifcmd = subparsers.add_parser("del-if")
+    delifcmd.add_argument("dpname", help="Datapath Name")
+    delifcmd.add_argument("delif", help="Interface name for removing")
+
     args = parser.parse_args()
 
     ovsdp = OvsDatapath()
+    ovsvp = OvsVport()
     ndb = NDB()
 
     if hasattr(args, "showdp"):
@@ -328,7 +406,7 @@ def main(argv):
 
             if rep is not None:
                 found = True
-                print_ovsdp_full(rep, iface.index, ndb)
+                print_ovsdp_full(rep, iface.index, ndb, ovsvp)
 
         if not found:
             msg = "No DP found"
@@ -343,6 +421,30 @@ def main(argv):
             print("DP '%s' added" % args.adddp)
     elif hasattr(args, "deldp"):
         ovsdp.destroy(args.deldp)
+    elif hasattr(args, "addif"):
+        rep = ovsdp.info(args.dpname, 0)
+        if rep is None:
+            print("DP '%s' not found." % args.dpname)
+            return 1
+        rep = ovsvp.attach(rep["dpifindex"], args.addif, args.ptype)
+        msg = "vport '%s'" % args.addif
+        if rep and rep["error"] == 0:
+            msg += " added."
+        else:
+            msg += " failed to add."
+        print(msg)
+    elif hasattr(args, "delif"):
+        rep = ovsdp.info(args.dpname, 0)
+        if rep is None:
+            print("DP '%s' not found." % args.dpname)
+            return 1
+        rep = ovsvp.detach(rep["dpifindex"], args.delif)
+        msg = "vport '%s'" % args.delif
+        if rep and rep["error"] == 0:
+            msg += " removed."
+        else:
+            msg += " failed to remove."
+        print(msg)
 
     return 0
 
-- 
2.34.3



* [RFC net-next 3/6] selftests: openvswitch: add flow dump support
  2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 1/6] openvswitch: exclude kernel " Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 2/6] selftests: openvswitch: add interface support Aaron Conole
@ 2022-11-22 14:03 ` Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 4/6] selftests: openvswitch: adjust datapath NL message Aaron Conole
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

Add a basic set of fields to print in a 'dpflow' format.  This will be
used by future commits to check flow fields after parsing, as well as
to verify the flow fields pushed into the kernel from userspace.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
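To give a feel for the 'dpflow' output without needing pyroute2, here is a
standalone sketch of the key/mask rendering.  It works on plain dicts
rather than netlink attributes, so dpstr() here is a simplified stand-in
for the method in the patch, and IPV4_FIELDS, key and mask are made-up
sample data.  Unlike the loop in the patch, it joins only the fields that
are actually emitted, so a field skipped because of a zero mask leaves no
separator behind.

```python
import ipaddress


def macstr(mac):
    """Format a bytes-like MAC address as colon-separated uppercase hex."""
    return ":".join("%02X" % b for b in mac)


def _render(fmt, value):
    # fmt is either a printf-style string or a callable, as in fields_map.
    return fmt % value if isinstance(fmt, str) else fmt(value)


def dpstr(name, fields_map, key, mask=None, more=False):
    """Render key (and optional mask) as name(field=value[/mask],...)."""
    parts = []
    for label, field, fmt, to_int in fields_map:
        if mask is None:
            parts.append("%s=%s" % (label, _render(fmt, key[field])))
        elif more or to_int(mask[field]) != 0:
            # Fields with an all-zero mask are wildcarded and skipped
            # unless 'more' asks for everything.
            parts.append(
                "%s=%s/%s"
                % (label, _render(fmt, key[field]), _render(fmt, mask[field]))
            )
    return "%s(%s)" % (name, ",".join(parts))


# Sample field map in the same (label, field, formatter, to_int) shape.
IPV4_FIELDS = (
    ("src", "src", lambda x: str(ipaddress.IPv4Address(x)), int),
    ("dst", "dst", lambda x: str(ipaddress.IPv4Address(x)), int),
    ("proto", "proto", "%d", int),
)

key = {"src": 0x0A000001, "dst": 0x0A000002, "proto": 6}
mask = {"src": 0xFFFFFFFF, "dst": 0, "proto": 0xFF}

print(dpstr("ipv4", IPV4_FIELDS, key))
# ipv4(src=10.0.0.1,dst=10.0.0.2,proto=6)
print(dpstr("ipv4", IPV4_FIELDS, key, mask))
# ipv4(src=10.0.0.1/255.255.255.255,proto=6/255)
```

Passing more=True prints the wildcarded dst field as well, with its
all-zero mask.
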
 .../selftests/net/openvswitch/ovs-dpctl.py    | 781 +++++++++++++++++-
 1 file changed, 780 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 338e9b2cd660..d654fe1fe4e6 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -6,12 +6,16 @@
 
 import argparse
 import errno
+import ipaddress
+import logging
 import sys
+import time
 
 try:
     from pyroute2 import NDB
 
     from pyroute2.netlink import NLM_F_ACK
+    from pyroute2.netlink import NLM_F_DUMP
     from pyroute2.netlink import NLM_F_REQUEST
     from pyroute2.netlink import genlmsg
     from pyroute2.netlink import nla
@@ -40,6 +44,11 @@ OVS_VPORT_CMD_DEL = 2
 OVS_VPORT_CMD_GET = 3
 OVS_VPORT_CMD_SET = 4
 
+OVS_FLOW_CMD_NEW = 1
+OVS_FLOW_CMD_DEL = 2
+OVS_FLOW_CMD_GET = 3
+OVS_FLOW_CMD_SET = 4
+
 
 class ovs_dp_msg(genlmsg):
     # include the OVS version
@@ -302,6 +311,760 @@ class OvsVport(GenericNetlinkSocket):
         return reply
 
 
+def macstr(mac):
+    outstr = ":".join(["%02X" % i for i in mac])
+    return outstr
+
+
+class OvsFlow(GenericNetlinkSocket):
+    class ovs_flow_msg(ovs_dp_msg):
+        nla_map = (
+            ("OVS_FLOW_ATTR_UNSPEC", "none"),
+            ("OVS_FLOW_ATTR_KEY", "nested"),
+            ("OVS_FLOW_ATTR_ACTIONS", "nested"),
+            ("OVS_FLOW_ATTR_STATS", "flowstats"),
+            ("OVS_FLOW_ATTR_TCP_FLAGS", "uint8"),
+            ("OVS_FLOW_ATTR_USED", "uint64"),
+            ("OVS_FLOW_ATTR_CLEAR", "none"),
+            ("OVS_FLOW_ATTR_MASK", "nested"),
+            ("OVS_FLOW_ATTR_PROBE", "none"),
+            ("OVS_FLOW_ATTR_UFID", "array(uint32)"),
+            ("OVS_FLOW_ATTR_UFID_FLAGS", "uint32"),
+        )
+
+        class nestedacts(nla):
+            __slots__ = ()
+
+            nla_map = ()
+
+        class flowstats(nla):
+            fields = (
+                ("packets", "=Q"),
+                ("bytes", "=Q"),
+            )
+
+        class nestedflow(nla):
+            nla_map = (
+                ("OVS_KEY_ATTR_UNSPEC", "none"),
+                ("OVS_KEY_ATTR_ENCAP", "none"),
+                ("OVS_KEY_ATTR_PRIORITY", "uint32"),
+                ("OVS_KEY_ATTR_IN_PORT", "uint32"),
+                ("OVS_KEY_ATTR_ETHERNET", "ethaddr"),
+                ("OVS_KEY_ATTR_VLAN", "uint16"),
+                ("OVS_KEY_ATTR_ETHERTYPE", "be16"),
+                ("OVS_KEY_ATTR_IPV4", "ovs_key_ipv4"),
+                ("OVS_KEY_ATTR_IPV6", "ovs_key_ipv6"),
+                ("OVS_KEY_ATTR_TCP", "ovs_key_tcp"),
+                ("OVS_KEY_ATTR_UDP", "ovs_key_udp"),
+                ("OVS_KEY_ATTR_ICMP", "ovs_key_icmp"),
+                ("OVS_KEY_ATTR_ICMPV6", "ovs_key_icmpv6"),
+                ("OVS_KEY_ATTR_ARP", "ovs_key_arp"),
+                ("OVS_KEY_ATTR_ND", "ovs_key_nd"),
+                ("OVS_KEY_ATTR_SKB_MARK", "uint32"),
+                ("OVS_KEY_ATTR_TUNNEL", "none"),
+                ("OVS_KEY_ATTR_SCTP", "ovs_key_sctp"),
+                ("OVS_KEY_ATTR_TCP_FLAGS", "uint16"),
+                ("OVS_KEY_ATTR_DP_HASH", "uint32"),
+                ("OVS_KEY_ATTR_RECIRC_ID", "uint32"),
+                ("OVS_KEY_ATTR_MPLS", "array(ovs_key_mpls)"),
+                ("OVS_KEY_ATTR_CT_STATE", "uint32"),
+                ("OVS_KEY_ATTR_CT_ZONE", "uint16"),
+                ("OVS_KEY_ATTR_CT_MARK", "uint32"),
+                ("OVS_KEY_ATTR_CT_LABELS", "none"),
+                ("OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4", "ovs_key_ct_tuple_ipv4"),
+                ("OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6", "ovs_key_ct_tuple_ipv6"),
+                ("OVS_KEY_ATTR_NSH", "none"),
+                ("OVS_KEY_ATTR_PACKET_TYPE", "none"),
+                ("OVS_KEY_ATTR_ND_EXTENSIONS", "none"),
+                ("OVS_KEY_ATTR_TUNNEL_INFO", "none"),
+                ("OVS_KEY_ATTR_IPV6_EXTENSIONS", "none"),
+            )
+
+            class ovs_key_proto(nla):
+                fields = (
+                    ("src", "!H"),
+                    ("dst", "!H"),
+                )
+
+                fields_map = (
+                    ("src", "src", "%d", int),
+                    ("dst", "dst", "%d", int),
+                )
+
+                def __init__(
+                    self,
+                    protostr,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    self.proto_str = protostr
+                    nla.__init__(
+                        self,
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+                def dpstr(self, masked=None, more=False):
+                    outstr = self.proto_str + "("
+                    first = False
+                    for f in self.fields_map:
+                        if first:
+                            outstr += ","
+                        if masked is None:
+                            outstr += "%s=" % f[0]
+                            if isinstance(f[2], str):
+                                outstr += f[2] % self[f[1]]
+                            else:
+                                outstr += f[2](self[f[1]])
+                            first = True
+                        elif more or f[3](masked[f[1]]) != 0:
+                            outstr += "%s=" % f[0]
+                            if isinstance(f[2], str):
+                                outstr += f[2] % self[f[1]]
+                            else:
+                                outstr += f[2](self[f[1]])
+                            outstr += "/"
+                            if isinstance(f[2], str):
+                                outstr += f[2] % masked[f[1]]
+                            else:
+                                outstr += f[2](masked[f[1]])
+                            first = True
+                    outstr += ")"
+                    return outstr
+
+            class ethaddr(ovs_key_proto):
+                fields = (
+                    ("src", "!6s"),
+                    ("dst", "!6s"),
+                )
+
+                fields_map = (
+                    ("src", "src", macstr, lambda x: int.from_bytes(x, "big")),
+                    ("dst", "dst", macstr, lambda x: int.from_bytes(x, "big")),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "eth",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_ipv4(ovs_key_proto):
+                fields = (
+                    ("src", "!I"),
+                    ("dst", "!I"),
+                    ("proto", "B"),
+                    ("tos", "B"),
+                    ("ttl", "B"),
+                    ("frag", "B"),
+                )
+
+                fields_map = (
+                    (
+                        "src",
+                        "src",
+                        lambda x: str(ipaddress.IPv4Address(x)),
+                        int,
+                    ),
+                    (
+                        "dst",
+                        "dst",
+                        lambda x: str(ipaddress.IPv4Address(x)),
+                        int,
+                    ),
+                    ("proto", "proto", "%d", int),
+                    ("tos", "tos", "%d", int),
+                    ("ttl", "ttl", "%d", int),
+                    ("frag", "frag", "%d", int),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "ipv4",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_ipv6(ovs_key_proto):
+                fields = (
+                    ("src", "!16s"),
+                    ("dst", "!16s"),
+                    ("label", "!I"),
+                    ("proto", "B"),
+                    ("tclass", "B"),
+                    ("hlimit", "B"),
+                    ("frag", "B"),
+                )
+
+                fields_map = (
+                    (
+                        "src",
+                        "src",
+                        lambda x: str(ipaddress.IPv6Address(x)),
+                        lambda x: int.from_bytes(x, "big"),
+                    ),
+                    (
+                        "dst",
+                        "dst",
+                        lambda x: str(ipaddress.IPv6Address(x)),
+                        lambda x: int.from_bytes(x, "big"),
+                    ),
+                    ("label", "label", "%d", int),
+                    ("proto", "proto", "%d", int),
+                    ("tclass", "tclass", "%d", int),
+                    ("hlimit", "hlimit", "%d", int),
+                    ("frag", "frag", "%d", int),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "ipv6",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_tcp(ovs_key_proto):
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "tcp",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_udp(ovs_key_proto):
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "udp",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_sctp(ovs_key_proto):
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "sctp",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_icmp(ovs_key_proto):
+                fields = (
+                    ("type", "B"),
+                    ("code", "B"),
+                )
+
+                fields_map = (
+                    ("type", "type", "%d", int),
+                    ("code", "code", "%d", int),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "icmp",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_icmpv6(ovs_key_icmp):
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "icmpv6",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_arp(ovs_key_proto):
+                fields = (
+                    ("sip", "!I"),
+                    ("tip", "!I"),
+                    ("op", "!H"),
+                    ("sha", "!6s"),
+                    ("tha", "!6s"),
+                    ("pad", "xx"),
+                )
+
+                fields_map = (
+                    (
+                        "sip",
+                        "sip",
+                        lambda x: str(ipaddress.IPv4Address(x)),
+                        int,
+                    ),
+                    (
+                        "tip",
+                        "tip",
+                        lambda x: str(ipaddress.IPv4Address(x)),
+                        int,
+                    ),
+                    ("op", "op", "%d", int),
+                    ("sha", "sha", macstr, lambda x: int.from_bytes(x, "big")),
+                    ("tha", "tha", macstr, lambda x: int.from_bytes(x, "big")),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "arp",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_nd(ovs_key_proto):
+                fields = (
+                    ("target", "!16s"),
+                    ("sll", "!6s"),
+                    ("tll", "!6s"),
+                )
+
+                fields_map = (
+                    (
+                        "target",
+                        "target",
+                        lambda x: str(ipaddress.IPv6Address(x)),
+                        lambda x: int.from_bytes(x, "big"),
+                    ),
+                    ("sll", "sll", macstr, lambda x: int.from_bytes(x, "big")),
+                    ("tll", "tll", macstr, lambda x: int.from_bytes(x, "big")),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "nd",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_ct_tuple_ipv4(ovs_key_proto):
+                fields = (
+                    ("src", "!I"),
+                    ("dst", "!I"),
+                    ("tp_src", "!H"),
+                    ("tp_dst", "!H"),
+                    ("proto", "B"),
+                )
+
+                fields_map = (
+                    (
+                        "src",
+                        "src",
+                        lambda x: str(ipaddress.IPv4Address(x)),
+                        int,
+                    ),
+                    (
+                        "dst",
+                        "dst",
+                        lambda x: str(ipaddress.IPv4Address(x)),
+                        int,
+                    ),
+                    ("tp_src", "tp_src", "%d", int),
+                    ("tp_dst", "tp_dst", "%d", int),
+                    ("proto", "proto", "%d", int),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "ct_tuple4",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_ct_tuple_ipv6(ovs_key_proto):
+                fields = (
+                    ("src", "!16s"),
+                    ("dst", "!16s"),
+                    ("tp_src", "!H"),
+                    ("tp_dst", "!H"),
+                    ("proto", "B"),
+                )
+
+                fields_map = (
+                    (
+                        "src",
+                        "src",
+                        lambda x: str(ipaddress.IPv6Address(x)),
+                        lambda x: int.from_bytes(x, "big"),
+                    ),
+                    (
+                        "dst",
+                        "dst",
+                        lambda x: str(ipaddress.IPv6Address(x)),
+                        lambda x: int.from_bytes(x, "big"),
+                    ),
+                    ("tp_src", "tp_src", "%d", int),
+                    ("tp_dst", "tp_dst", "%d", int),
+                    ("proto", "proto", "%d", int),
+                )
+
+                def __init__(
+                    self,
+                    data=None,
+                    offset=None,
+                    parent=None,
+                    length=None,
+                    init=None,
+                ):
+                    OvsFlow.ovs_flow_msg.nestedflow.ovs_key_proto.__init__(
+                        self,
+                        "ct_tuple6",
+                        data=data,
+                        offset=offset,
+                        parent=parent,
+                        length=length,
+                        init=init,
+                    )
+
+            class ovs_key_mpls(nla):
+                fields = (("lse", ">I"),)
+
+            def dpstr(self, mask=None, more=False):
+                print_str = ""
+
+                for field in (
+                    (
+                        "OVS_KEY_ATTR_PRIORITY",
+                        "skb_priority",
+                        "%d",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_SKB_MARK",
+                        "skb_mark",
+                        "%d",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_RECIRC_ID",
+                        "recirc_id",
+                        "0x%08X",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_DP_HASH",
+                        "dp_hash",
+                        "0x%08X",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_CT_STATE",
+                        "ct_state",
+                        "0x%04x",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_CT_ZONE",
+                        "ct_zone",
+                        "0x%04x",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_CT_MARK",
+                        "ct_mark",
+                        "0x%08x",
+                        lambda x: False,
+                        True,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4",
+                        None,
+                        None,
+                        False,
+                        False,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6",
+                        None,
+                        None,
+                        False,
+                        False,
+                    ),
+                    (
+                        "OVS_KEY_ATTR_IN_PORT",
+                        "in_port",
+                        "%d",
+                        lambda x: True,
+                        True,
+                    ),
+                    ("OVS_KEY_ATTR_ETHERNET", None, None, False, False),
+                    (
+                        "OVS_KEY_ATTR_ETHERTYPE",
+                        "eth_type",
+                        "0x%04x",
+                        lambda x: int(x) == 0xFFFF,
+                        True,
+                    ),
+                    ("OVS_KEY_ATTR_IPV4", None, None, False, False),
+                    ("OVS_KEY_ATTR_IPV6", None, None, False, False),
+                    ("OVS_KEY_ATTR_ARP", None, None, False, False),
+                    ("OVS_KEY_ATTR_TCP", None, None, False, False),
+                    (
+                        "OVS_KEY_ATTR_TCP_FLAGS",
+                        "tcp_flags",
+                        "0x%04x",
+                        lambda x: False,
+                        True,
+                    ),
+                    ("OVS_KEY_ATTR_UDP", None, None, False, False),
+                    ("OVS_KEY_ATTR_SCTP", None, None, False, False),
+                    ("OVS_KEY_ATTR_ICMP", None, None, False, False),
+                    ("OVS_KEY_ATTR_ICMPV6", None, None, False, False),
+                    ("OVS_KEY_ATTR_ND", None, None, False, False),
+                ):
+                    v = self.get_attr(field[0])
+                    if v is not None:
+                        m = None if mask is None else mask.get_attr(field[0])
+                        if field[4] is False:
+                            print_str += v.dpstr(m, more)
+                            print_str += ","
+                        else:
+                            if m is None or field[3](m):
+                                print_str += field[1] + "("
+                                print_str += field[2] % v
+                                print_str += "),"
+                            elif more or m != 0:
+                                print_str += field[1] + "("
+                                print_str += (
+                                    (field[2] % v) + "/" + (field[2] % m)
+                                )
+                                print_str += "),"
+
+                return print_str
+
+        def dpstr(self, more=False):
+            ufid = self.get_attr("OVS_FLOW_ATTR_UFID")
+            ufid_str = ""
+            if ufid is not None:
+                ufid_str = (
+                    "ufid:{:08x}-{:04x}-{:04x}-{:04x}-{:04x}{:08x}".format(
+                        ufid[0],
+                        ufid[1] >> 16,
+                        ufid[1] & 0xFFFF,
+                        ufid[2] >> 16,
+                        ufid[2] & 0xFFFF,
+                        ufid[3],
+                    )
+                )
+
+            key_field = self.get_attr("OVS_FLOW_ATTR_KEY")
+            keymsg = None
+            if key_field is not None:
+                keymsg = OvsFlow.ovs_flow_msg.nestedflow(data=key_field)
+                keymsg.decode()
+
+            mask_field = self.get_attr("OVS_FLOW_ATTR_MASK")
+            maskmsg = None
+            if mask_field is not None:
+                maskmsg = OvsFlow.ovs_flow_msg.nestedflow(data=mask_field)
+                maskmsg.decode()
+
+            acts_field = self.get_attr("OVS_FLOW_ATTR_ACTIONS")
+            actsmsg = None
+            if acts_field is not None:
+                actsmsg = OvsFlow.ovs_flow_msg.nestedacts(data=acts_field)
+                actsmsg.decode()
+
+            print_str = ""
+
+            if more:
+                print_str += ufid_str + ","
+
+            if keymsg is not None:
+                print_str += keymsg.dpstr(maskmsg, more)
+
+            stats = self.get_attr("OVS_FLOW_ATTR_STATS")
+            if stats is None:
+                print_str += " packets:0, bytes:0,"
+            else:
+                print_str += " packets:%d, bytes:%d," % (
+                    stats["packets"],
+                    stats["bytes"],
+                )
+
+            used = self.get_attr("OVS_FLOW_ATTR_USED")
+            print_str += " used:"
+            if used is None:
+                print_str += "never,"
+            else:
+                used_time = int(used)
+                cur_time_sec = time.clock_gettime(time.CLOCK_MONOTONIC)
+                used_time = (cur_time_sec * 1000) - used_time
+                print_str += "{}s,".format(used_time / 1000)
+
+            print_str += " actions:"
+            if actsmsg is None or "attrs" not in actsmsg:
+                print_str += "drop"
+
+            return print_str
+
+    def __init__(self):
+        GenericNetlinkSocket.__init__(self)
+        self.bind(OVS_FLOW_FAMILY, OvsFlow.ovs_flow_msg)
+
+    def dump(self, dpifindex, flowspec=None):
+        """
+        Returns a list of messages containing flows.
+
+        dpifindex should be a valid datapath obtained by calling
+        into the OvsDatapath lookup
+
+        flowspec is a string which represents a flow in the dpctl
+        format.
+        """
+        msg = OvsFlow.ovs_flow_msg()
+
+        msg["cmd"] = OVS_FLOW_CMD_GET
+        msg["version"] = OVS_DATAPATH_VERSION
+        msg["reserved"] = 0
+        msg["dpifindex"] = dpifindex
+
+        msg_flags = NLM_F_REQUEST | NLM_F_ACK
+        if flowspec is None:
+            msg_flags |= NLM_F_DUMP
+        rep = None
+
+        try:
+            rep = self.nlm_request(
+                msg,
+                msg_type=self.prid,
+                msg_flags=msg_flags,
+            )
+        except NetlinkError as ne:
+            raise ne
+        return rep
+
+
 def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()):
     dp_name = dp_lookup_rep.get_attr("OVS_DP_ATTR_NAME")
     base_stats = dp_lookup_rep.get_attr("OVS_DP_ATTR_STATS")
@@ -348,6 +1111,7 @@ def main(argv):
         "--verbose",
         action="count",
         help="Increment 'verbose' output counter.",
+        default=0,
     )
     subparsers = parser.add_subparsers()
 
@@ -389,10 +1153,18 @@ def main(argv):
     delifcmd.add_argument("dpname", help="Datapath Name")
     delifcmd.add_argument("delif", help="Interface name for adding")
 
+    dumpflcmd = subparsers.add_parser("dump-flows")
+    dumpflcmd.add_argument("dumpdp", help="Datapath Name")
+
     args = parser.parse_args()
 
+    if args.verbose > 0:
+        if args.verbose > 1:
+            logging.basicConfig(level=logging.DEBUG)
+
     ovsdp = OvsDatapath()
     ovsvp = OvsVport()
+    ovsflow = OvsFlow()
     ndb = NDB()
 
     if hasattr(args, "showdp"):
@@ -445,7 +1217,14 @@ def main(argv):
         else:
             msg += " failed to remove."
         print(msg)
-
+    elif hasattr(args, "dumpdp"):
+        rep = ovsdp.info(args.dumpdp, 0)
+        if rep is None:
+            print("DP '%s' not found." % args.dumpdp)
+            return 1
+        rep = ovsflow.dump(rep["dpifindex"])
+        for flow in rep:
+            print(flow.dpstr(args.verbose > 0))
     return 0
 
 
-- 
2.34.3
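
The UFID formatting in dpstr() above can be sketched in isolation.  This
is a minimal illustration, assuming the UFID arrives as four 32-bit words
and that the fourth printed group is meant to carry the low 16 bits of the
third word; format_ufid is a hypothetical helper name, not part of the
patch.

```python
def format_ufid(words):
    """Format four 32-bit words as 8-4-4-4-12 hex groups (UUID style)."""
    if len(words) != 4:
        raise ValueError("expected four 32-bit words")
    # Clamp each word to 32 bits so the field widths stay fixed.
    w0, w1, w2, w3 = (w & 0xFFFFFFFF for w in words)
    return "ufid:{:08x}-{:04x}-{:04x}-{:04x}-{:04x}{:08x}".format(
        w0,
        w1 >> 16,       # high half of word 1
        w1 & 0xFFFF,    # low half of word 1
        w2 >> 16,       # high half of word 2
        w2 & 0xFFFF,    # low half of word 2
        w3,
    )


print(format_ufid([0x12345678, 0x9ABCDEF0, 0x0FEDCBA9, 0x87654321]))
# prints "ufid:12345678-9abc-def0-0fed-cba987654321"
```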



* [RFC net-next 4/6] selftests: openvswitch: adjust datapath NL message
  2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
                   ` (2 preceding siblings ...)
  2022-11-22 14:03 ` [RFC net-next 3/6] selftests: openvswitch: add flow dump support Aaron Conole
@ 2022-11-22 14:03 ` Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 5/6] selftests: openvswitch: add upcall support Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 6/6] selftests: openvswitch: add exclude support for packet commands Aaron Conole
  5 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC (permalink / raw)
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

The netlink message for creating a new datapath takes an array
of ports for the PID creation.  This shouldn't cause much issue,
but correct it for future messages where we need to decode
datapath information that could include the per-cpu PID map.

Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
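
For illustration only, the wire layout behind this change can be sketched
with struct.  The sketch assumes the usual netlink TLV layout (16-bit
length, 16-bit type, payload padded to 4 bytes) and that an array of
uint32 is encoded as back-to-back native-endian words; it is not a
description of how pyroute2 encodes the attribute internally, and
pack_u32_array_attr is a hypothetical helper.

```python
import struct

NLA_HDRLEN = 4
OVS_DP_ATTR_UPCALL_PID = 2  # position of the attribute in the nla_map


def pack_u32_array_attr(attr_type, values):
    """Pack a netlink attribute whose payload is a run of uint32 values."""
    payload = struct.pack("=%dI" % len(values), *values)
    nla_len = NLA_HDRLEN + len(payload)
    pad = (-nla_len) % 4  # attributes are aligned to 4 bytes
    header = struct.pack("=HH", nla_len, attr_type)
    return header + payload + b"\x00" * pad


single = pack_u32_array_attr(OVS_DP_ATTR_UPCALL_PID, [0])
percpu = pack_u32_array_attr(OVS_DP_ATTR_UPCALL_PID, [1000, 1000, 1000])
print(len(single), len(percpu))  # prints "8 16"
```

Under these assumptions a one-element array is byte-for-byte the same as
a plain uint32 attribute, so only the decode side notices the difference.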

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index d654fe1fe4e6..fe14da358901 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -71,7 +71,7 @@ class OvsDatapath(GenericNetlinkSocket):
         nla_map = (
             ("OVS_DP_ATTR_UNSPEC", "none"),
             ("OVS_DP_ATTR_NAME", "asciiz"),
-            ("OVS_DP_ATTR_UPCALL_PID", "uint32"),
+            ("OVS_DP_ATTR_UPCALL_PID", "array(uint32)"),
             ("OVS_DP_ATTR_STATS", "dpstats"),
             ("OVS_DP_ATTR_MEGAFLOW_STATS", "megaflowstats"),
             ("OVS_DP_ATTR_USER_FEATURES", "uint32"),
@@ -141,7 +141,7 @@ class OvsDatapath(GenericNetlinkSocket):
 
         msg["attrs"].append(["OVS_DP_ATTR_USER_FEATURES", dpfeatures])
         if not shouldUpcall:
-            msg["attrs"].append(["OVS_DP_ATTR_UPCALL_PID", 0])
+            msg["attrs"].append(["OVS_DP_ATTR_UPCALL_PID", [0]])
 
         try:
             reply = self.nlm_request(
-- 
2.34.3



* [RFC net-next 5/6] selftests: openvswitch: add upcall support
  2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
                   ` (3 preceding siblings ...)
  2022-11-22 14:03 ` [RFC net-next 4/6] selftests: openvswitch: adjust datapath NL message Aaron Conole
@ 2022-11-22 14:03 ` Aaron Conole
  2022-11-22 14:03 ` [RFC net-next 6/6] selftests: openvswitch: add exclude support for packet commands Aaron Conole
  5 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC (permalink / raw)
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

Future tests can make use of CMD_MISS events to do things like
cross-validate packet contents against the flow key that was
generated by flow key extraction.  This will also be used in
an upcoming commit to allow removing the flow key from upcall
messages.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 .../selftests/net/openvswitch/ovs-dpctl.py    | 140 +++++++++++++++++-
 1 file changed, 133 insertions(+), 7 deletions(-)
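
The dispatch done by upcall_handler() can be sketched without a netlink
socket.  This is a rough illustration only: the messages are plain dicts
standing in for decoded netlink messages, and RecordingHandler and
dispatch are hypothetical names that do not appear in the patch.

```python
OVS_PACKET_CMD_MISS = 1
OVS_PACKET_CMD_ACTION = 2
OVS_PACKET_CMD_EXECUTE = 3


class RecordingHandler:
    """Collects the upcalls it sees instead of printing them."""

    def __init__(self):
        self.events = []

    def miss(self, msg):
        self.events.append(("miss", msg["seq"]))

    def action(self, msg):
        self.events.append(("action", msg["seq"]))

    def execute(self, msg):
        self.events.append(("execute", msg["seq"]))


def dispatch(msgs, handler):
    """Route each message to the handler method matching its command."""
    table = {
        OVS_PACKET_CMD_MISS: handler.miss,
        OVS_PACKET_CMD_ACTION: handler.action,
        OVS_PACKET_CMD_EXECUTE: handler.execute,
    }
    unknown = []
    for msg in msgs:
        callback = table.get(msg["cmd"])
        if callback is None:
            unknown.append(msg["cmd"])  # report instead of raising
            continue
        callback(msg)
    return unknown


handler = RecordingHandler()
unknown = dispatch(
    [{"cmd": 1, "seq": 0}, {"cmd": 2, "seq": 1}, {"cmd": 9, "seq": 2}],
    handler,
)
print(handler.events, unknown)
```

A test could pass such a recording object in place of OvsFlow and then
compare the recorded events with the packets it injected.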

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index fe14da358901..94204af48d28 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -8,6 +8,8 @@ import argparse
 import errno
 import ipaddress
 import logging
+import multiprocessing
+import struct
 import sys
 import time
 
@@ -58,6 +60,53 @@ class ovs_dp_msg(genlmsg):
     fields = genlmsg.fields + (("dpifindex", "I"),)
 
 
+class OvsPacket(GenericNetlinkSocket):
+    OVS_PACKET_CMD_MISS = 1                # Flow table miss
+    OVS_PACKET_CMD_ACTION = 2              # USERSPACE action
+    OVS_PACKET_CMD_EXECUTE = 3             # Apply actions to packet
+
+    class ovs_packet_msg(ovs_dp_msg):
+        nla_map = (
+            ('OVS_PACKET_ATTR_UNSPEC', 'none'),
+            ('OVS_PACKET_ATTR_PACKET', 'array(uint8)'),
+            ('OVS_PACKET_ATTR_KEY', 'nested'),
+            ('OVS_PACKET_ATTR_ACTIONS', 'nested'),
+            ('OVS_PACKET_ATTR_USERDATA', 'nested'),
+            ('OVS_PACKET_ATTR_EGRESS_TUN_KEY', 'nested'),
+            ('OVS_PACKET_ATTR_UNUSED1', 'none'),
+            ('OVS_PACKET_ATTR_UNUSED2', 'none'),
+            ('OVS_PACKET_ATTR_PROBE', 'none'),
+            ('OVS_PACKET_ATTR_MRU', 'uint16'),
+            ('OVS_PACKET_ATTR_LEN', 'uint32'),
+            ('OVS_PACKET_ATTR_HASH', 'uint64'),
+        )
+
+    def __init__(self):
+        GenericNetlinkSocket.__init__(self)
+        print("Binding to packet family")
+        self.bind(OVS_PACKET_FAMILY, OvsPacket.ovs_packet_msg)
+        print("Port", self.epid)
+
+    def upcall_handler(self, up=None):
+        print("listening on upcall packet handler:", self.epid)
+        while True:
+            try:
+                msgs = self.get()
+                for msg in msgs:
+                    if not up:
+                        continue
+                    if msg["cmd"] == OvsPacket.OVS_PACKET_CMD_MISS:
+                        up.miss(msg)
+                    elif msg["cmd"] == OvsPacket.OVS_PACKET_CMD_ACTION:
+                        up.action(msg)
+                    elif msg["cmd"] == OvsPacket.OVS_PACKET_CMD_EXECUTE:
+                        up.execute(msg)
+                    else:
+                        print("Unknown cmd: %d" % msg["cmd"])
+            except NetlinkError as ne:
+                raise ne
+
+
 class OvsDatapath(GenericNetlinkSocket):
 
     OVS_DP_F_VPORT_PIDS = 1 << 1
@@ -122,7 +171,7 @@ class OvsDatapath(GenericNetlinkSocket):
 
         return reply
 
-    def create(self, dpname, shouldUpcall=False, versionStr=None):
+    def create(self, dpname, shouldUpcall=False, versionStr=None, p=OvsPacket()):
         msg = OvsDatapath.dp_cmd_msg()
         msg["cmd"] = OVS_DP_CMD_NEW
         if versionStr is None:
@@ -139,9 +188,19 @@ class OvsDatapath(GenericNetlinkSocket):
         else:
             dpfeatures = OvsDatapath.OVS_DP_F_VPORT_PIDS
 
-        msg["attrs"].append(["OVS_DP_ATTR_USER_FEATURES", dpfeatures])
         if not shouldUpcall:
             msg["attrs"].append(["OVS_DP_ATTR_UPCALL_PID", [0]])
+        else:
+            if versionStr is None or versionStr.find(":") == -1:
+                dpfeatures |= OvsDatapath.OVS_DP_F_DISPATCH_UPCALL_PER_CPU
+                dpfeatures &= ~OvsDatapath.OVS_DP_F_VPORT_PIDS
+
+            nproc = multiprocessing.cpu_count()
+            procarray = []
+            for i in range(1, nproc):
+                procarray += [int(p.epid)]
+            msg["attrs"].append(["OVS_DP_ATTR_UPCALL_PID", procarray])
+        msg["attrs"].append(["OVS_DP_ATTR_USER_FEATURES", dpfeatures])
 
         try:
             reply = self.nlm_request(
@@ -238,9 +297,10 @@ class OvsVport(GenericNetlinkSocket):
             return OvsVport.OVS_VPORT_TYPE_GENEVE
         raise ValueError("Unknown vport type: '%s'" % vport_type)
 
-    def __init__(self):
+    def __init__(self, packet=OvsPacket()):
         GenericNetlinkSocket.__init__(self)
         self.bind(OVS_VPORT_FAMILY, OvsVport.ovs_vport_msg)
+        self.upcall_packet = packet
 
     def info(self, vport_name, dpifindex=0, portno=None):
         msg = OvsVport.ovs_vport_msg()
@@ -278,7 +338,36 @@ class OvsVport(GenericNetlinkSocket):
 
         msg["attrs"].append(["OVS_VPORT_ATTR_TYPE", port_type])
         msg["attrs"].append(["OVS_VPORT_ATTR_NAME", vport_ifname])
-        msg["attrs"].append(["OVS_VPORT_ATTR_UPCALL_PID", [self.pid]])
+        msg["attrs"].append(["OVS_VPORT_ATTR_UPCALL_PID",
+                             [self.upcall_packet.epid]])
+
+        try:
+            reply = self.nlm_request(
+                msg, msg_type=self.prid, msg_flags=NLM_F_REQUEST | NLM_F_ACK
+            )
+            reply = reply[0]
+        except NetlinkError as ne:
+            if ne.code == errno.EEXIST:
+                reply = None
+            else:
+                raise ne
+        return reply
+
+    def reset_upcall(self, dpindex, vport_ifname, p=None):
+        msg = OvsVport.ovs_vport_msg()
+
+        msg["cmd"] = OVS_VPORT_CMD_SET
+        msg["version"] = OVS_DATAPATH_VERSION
+        msg["reserved"] = 0
+        msg["dpifindex"] = dpindex
+        msg["attrs"].append(["OVS_VPORT_ATTR_NAME", vport_ifname])
+
+        if p is None:
+            p = self.upcall_packet
+        else:
+            self.upcall_packet = p
+
+        msg["attrs"].append(["OVS_VPORT_ATTR_UPCALL_PID", [p.epid]])
 
         try:
             reply = self.nlm_request(
@@ -310,6 +399,9 @@ class OvsVport(GenericNetlinkSocket):
                 raise ne
         return reply
 
+    def upcall_handler(self, handler=None):
+        self.upcall_packet.upcall_handler(handler)
+
 
 def macstr(mac):
     outstr = ":".join(["%02X" % i for i in mac])
@@ -1064,6 +1156,26 @@ class OvsFlow(GenericNetlinkSocket):
             raise ne
         return rep
 
+    def miss(self, packetmsg):
+        seq = packetmsg["header"]["sequence_number"]
+        keystr = "(none)"
+        key_field = packetmsg.get_attr("OVS_PACKET_ATTR_KEY")
+        if key_field is not None:
+            keymsg = OvsFlow.ovs_flow_msg.nestedflow(data=key_field)
+            keymsg.decode()
+            keystr = keymsg.dpstr(None, True)
+
+        pktdata = packetmsg.get_attr("OVS_PACKET_ATTR_PACKET")
+        pktpres = "yes" if pktdata is not None else "no"
+
+        print("MISS upcall[%d/%s]: %s" % (seq, pktpres, keystr), flush=True)
+
+    def execute(self, packetmsg):
+        print("userspace execute command")
+
+    def action(self, packetmsg):
+        print("userspace action command")
+
 
 def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()):
     dp_name = dp_lookup_rep.get_attr("OVS_DP_ATTR_NAME")
@@ -1141,6 +1253,12 @@ def main(argv):
     addifcmd = subparsers.add_parser("add-if")
     addifcmd.add_argument("dpname", help="Datapath Name")
     addifcmd.add_argument("addif", help="Interface name for adding")
+    addifcmd.add_argument(
+        "-u",
+        "--upcall",
+        action="store_true",
+        help="Leave open a reader for upcalls",
+    )
     addifcmd.add_argument(
         "-t",
         "--ptype",
@@ -1162,8 +1280,9 @@ def main(argv):
         if args.verbose > 1:
             logging.basicConfig(level=logging.DEBUG)
 
+    ovspk = OvsPacket()
     ovsdp = OvsDatapath()
-    ovsvp = OvsVport()
+    ovsvp = OvsVport(ovspk)
     ovsflow = OvsFlow()
     ndb = NDB()
 
@@ -1186,11 +1305,13 @@ def main(argv):
                 msg += ":'%s'" % args.showdp
             print(msg)
     elif hasattr(args, "adddp"):
-        rep = ovsdp.create(args.adddp, args.upcall, args.versioning)
+        rep = ovsdp.create(args.adddp, args.upcall, args.versioning, ovspk)
         if rep is None:
             print("DP '%s' already exists" % args.adddp)
         else:
             print("DP '%s' added" % args.adddp)
+        if args.upcall:
+            ovspk.upcall_handler(ovsflow)
     elif hasattr(args, "deldp"):
         ovsdp.destroy(args.deldp)
     elif hasattr(args, "addif"):
@@ -1198,13 +1319,18 @@ def main(argv):
         if rep is None:
             print("DP '%s' not found." % args.dpname)
             return 1
-        rep = ovsvp.attach(rep["dpifindex"], args.addif, args.ptype)
+        dpindex = rep["dpifindex"]
+        rep = ovsvp.attach(dpindex, args.addif, args.ptype)
         msg = "vport '%s'" % args.addif
         if rep and rep["error"] == 0:
             msg += " added."
         else:
             msg += " failed to add."
         print(msg)
+        if args.upcall:
+            if rep is None:
+                rep = ovsvp.reset_upcall(dpindex, args.addif, ovspk)
+            ovsvp.upcall_handler(ovsflow)
     elif hasattr(args, "delif"):
         rep = ovsdp.info(args.dpname, 0)
         if rep is None:
-- 
2.34.3



* [RFC net-next 6/6] selftests: openvswitch: add exclude support for packet commands
  2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
                   ` (4 preceding siblings ...)
  2022-11-22 14:03 ` [RFC net-next 5/6] selftests: openvswitch: add upcall support Aaron Conole
@ 2022-11-22 14:03 ` Aaron Conole
  5 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-22 14:03 UTC (permalink / raw)
  To: netdev
  Cc: Pravin B Shelar, Jakub Kicinski, David S. Miller, Paolo Abeni,
	Eric Dumazet, Thomas Graf, dev, Eelco Chaudron, Ilya Maximets,
	Shuah Khan, linux-kernel, linux-kselftest

Introduce a test case to show that the flow key can be excluded
from upcalls for specific packet commands.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 .../selftests/net/openvswitch/openvswitch.sh  | 53 +++++++++++++++++--
 .../selftests/net/openvswitch/ovs-dpctl.py    | 34 +++++++++++-
 2 files changed, 81 insertions(+), 6 deletions(-)
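
The two grep checks in test_upcall_interfaces can be mirrored in Python
for anyone who wants to assert on individual fields.  This is a sketch
based on the "MISS upcall[seq/pkt]: key" line that miss() prints;
parse_miss_line is a hypothetical helper and the sample key string is
illustrative, not captured output.

```python
import re

MISS_RE = re.compile(r"^MISS upcall\[(\d+)/(yes|no)\]: (.*)$")


def parse_miss_line(line):
    """Split a MISS upcall log line into sequence, packet flag and key."""
    match = MISS_RE.match(line.strip())
    if match is None:
        return None
    seq, pkt, key = match.groups()
    return {
        "seq": int(seq),
        "packet": pkt == "yes",
        # "(none)" is what miss() prints when no flow key was sent.
        "key": None if key == "(none)" else key,
    }


with_key = parse_miss_line(
    "MISS upcall[0/yes]: arp(sip=172.31.110.1,tip=172.31.110.20,op=1)"
)
without_key = parse_miss_line("MISS upcall[0/yes]: (none)")
print(with_key["key"] is not None, without_key["key"] is None)
```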

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index ce14913150fe..f04f2f748332 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -11,7 +11,8 @@ VERBOSE=0
 TRACING=0
 
 tests="
-	netlink_checks				ovsnl: validate netlink attrs and settings"
+	netlink_checks				ovsnl: validate netlink attrs and settings
+	upcall_interfaces			ovs: test the upcall interfaces"
 
 info() {
     [ $VERBOSE = 0 ] || echo $*
@@ -72,7 +73,15 @@ ovs_add_dp () {
 
 ovs_add_if () {
 	info "Adding IF to DP: br:$2 if:$3"
-	ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py add-if "$2" "$3" || return 1
+	if [ "$4" != "-u" ]; then
+		ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py add-if "$2" "$3" \
+		    || return 1
+	else
+		python3 $ovs_base/ovs-dpctl.py add-if \
+		    -u "$2" "$3" >$ovs_dir/$3.out 2>$ovs_dir/$3.err &
+		pid=$!
+		on_exit "ovs_sbx $1 kill -TERM $pid 2>/dev/null"
+	fi
 }
 
 ovs_del_if () {
@@ -103,10 +112,16 @@ ovs_add_netns_and_veths () {
 	ovs_sbx "$1" ip netns exec "$3" ip link set "$5" up || return 1
 
 	if [ "$6" != "" ]; then
-		ovs_sbx "$1" ip netns exec "$4" ip addr add "$6" dev "$5" \
+		ovs_sbx "$1" ip netns exec "$3" ip addr add "$6" dev "$5" \
 		    || return 1
 	fi
-	ovs_add_if "$1" "$2" "$4" || return 1
+
+	if [ "$7" != "-u" ]; then
+		ovs_add_if "$1" "$2" "$4" || return 1
+	else
+		ovs_add_if "$1" "$2" "$4" -u || return 1
+	fi
+
 	[ $TRACING -eq 1 ] && ovs_netns_spawn_daemon "$1" "$3" \
 			tcpdump -i any -s 65535 >> ${ovs_dir}/tcpdump_"$3".log
 
@@ -158,6 +173,36 @@ test_netlink_checks () {
 	return 0
 }
 
+test_upcall_interfaces() {
+	sbx_add "test_upcall_interfaces" || return 1
+
+	info "setting up new DP"
+	ovs_add_dp "test_upcall_interfaces" ui0 || return 1
+
+	ovs_add_netns_and_veths "test_upcall_interfaces" ui0 upc left0 l0 \
+	    172.31.110.1/24 -u || return 1
+
+	sleep 1
+	info "sending arping"
+	ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
+	    >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
+
+	grep -E "MISS upcall\[0/yes\]: .*arp\(sip=172.31.110.1,tip=172.31.110.20,op=1,sha=" $ovs_dir/left0.out >/dev/null 2>&1 || return 1
+	# now tear down the DP and set it up with the new options
+	ovs_sbx "test_upcall_interfaces" python3 $ovs_base/ovs-dpctl.py \
+	    del-dp ui0 || return 1
+	ovs_sbx "test_upcall_interfaces" python3 $ovs_base/ovs-dpctl.py \
+	    add-dp -e miss -- ui0 || return 1
+	ovs_add_if "test_upcall_interfaces" ui0 left0 -u || return 1
+
+	sleep 1
+	info "sending second arping"
+	ip netns exec upc arping -I l0 172.31.110.20 -c 1 \
+	    >$ovs_dir/arping.stdout 2>$ovs_dir/arping.stderr
+	grep -E "MISS upcall\[0/yes\]: \(none\)" $ovs_dir/left0.out >/dev/null 2>&1 || return 1
+	return 0
+}
+
 run_test() {
 	(
 	tname="$1"
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 94204af48d28..ba115fb51773 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -111,6 +111,7 @@ class OvsDatapath(GenericNetlinkSocket):
 
     OVS_DP_F_VPORT_PIDS = 1 << 1
     OVS_DP_F_DISPATCH_UPCALL_PER_CPU = 1 << 3
+    OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY = 1 << 4
 
     class dp_cmd_msg(ovs_dp_msg):
         """
@@ -127,6 +128,8 @@ class OvsDatapath(GenericNetlinkSocket):
             ("OVS_DP_ATTR_PAD", "none"),
             ("OVS_DP_ATTR_MASKS_CACHE_SIZE", "uint32"),
             ("OVS_DP_ATTR_PER_CPU_PIDS", "array(uint32)"),
+            ("OVS_DP_ATTR_IFINDEX", "uint32"),
+            ("OVS_DP_ATTR_EXCLUDE_CMDS", "uint32"),
         )
 
         class dpstats(nla):
@@ -171,7 +174,8 @@ class OvsDatapath(GenericNetlinkSocket):
 
         return reply
 
-    def create(self, dpname, shouldUpcall=False, versionStr=None, p=OvsPacket()):
+    def create(self, dpname, shouldUpcall=False, versionStr=None, p=OvsPacket(),
+               exclude=[]):
         msg = OvsDatapath.dp_cmd_msg()
         msg["cmd"] = OVS_DP_CMD_NEW
         if versionStr is None:
@@ -200,6 +204,23 @@ class OvsDatapath(GenericNetlinkSocket):
             for i in range(1, nproc):
                 procarray += [int(p.epid)]
             msg["attrs"].append(["OVS_DP_ATTR_UPCALL_PID", procarray])
+
+        excluded = 0
+        print("exclude", exclude)
+        if len(exclude) > 0:
+            for ex in exclude:
+                if ex == "miss":
+                    excluded |= 1 << OvsPacket.OVS_PACKET_CMD_MISS
+                elif ex == "action":
+                    excluded |= 1 << OvsPacket.OVS_PACKET_CMD_ACTION
+                elif ex == "execute":
+                    excluded |= 1 << OvsPacket.OVS_PACKET_CMD_EXECUTE
+                else:
+                    print("DP CREATE: Unknown type: '%s'" % ex)
+            msg["attrs"].append(["OVS_DP_ATTR_EXCLUDE_CMDS", excluded])
+            if versionStr is None or versionStr.find(":") == -1:
+                dpfeatures |= OvsDatapath.OVS_DP_F_EXCLUDE_UPCALL_FLOW_KEY
+
         msg["attrs"].append(["OVS_DP_ATTR_USER_FEATURES", dpfeatures])
 
         try:
@@ -1240,6 +1261,14 @@ def main(argv):
         action="store_true",
         help="Leave open a reader for upcalls",
     )
+    adddpcmd.add_argument(
+        "-e",
+        "--exclude",
+        type=str,
+        default=[],
+        nargs="+",
+        help="Exclude flow key from upcall packet commands"
+    )
     adddpcmd.add_argument(
         "-V",
         "--versioning",
@@ -1305,7 +1334,8 @@ def main(argv):
                 msg += ":'%s'" % args.showdp
             print(msg)
     elif hasattr(args, "adddp"):
-        rep = ovsdp.create(args.adddp, args.upcall, args.versioning, ovspk)
+        rep = ovsdp.create(args.adddp, args.upcall, args.versioning, ovspk,
+                           args.exclude)
         if rep is None:
             print("DP '%s' already exists" % args.adddp)
         else:
-- 
2.34.3
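
The value the patch places in OVS_DP_ATTR_EXCLUDE_CMDS is a bitmask with one
bit per packet command, shifted by the command number.  A minimal standalone
sketch of that mapping follows.  It assumes the command numbers from
enum ovs_packet_cmd in include/uapi/linux/openvswitch.h (MISS = 1,
ACTION = 2, EXECUTE = 3).  The helper name build_exclude_mask is hypothetical
and is not part of ovs-dpctl.py.

```python
# Command numbers as in enum ovs_packet_cmd (uapi/linux/openvswitch.h).
OVS_PACKET_CMD_MISS = 1
OVS_PACKET_CMD_ACTION = 2
OVS_PACKET_CMD_EXECUTE = 3

# Names accepted on the command line, mapped to their command numbers.
_CMD_BY_NAME = {
    "miss": OVS_PACKET_CMD_MISS,
    "action": OVS_PACKET_CMD_ACTION,
    "execute": OVS_PACKET_CMD_EXECUTE,
}


def build_exclude_mask(names):
    """Hypothetical helper: OR together one bit per named packet command."""
    mask = 0
    for name in names:
        if name not in _CMD_BY_NAME:
            # Unlike the patch, which prints a warning, this sketch rejects
            # unknown names outright.
            raise ValueError("unknown packet command: %r" % name)
        mask |= 1 << _CMD_BY_NAME[name]
    return mask


if __name__ == "__main__":
    print(hex(build_exclude_mask(["miss"])))
    print(hex(build_exclude_mask(["miss", "execute"])))
```

With this layout, "add-dp -e miss" corresponds to a mask with only bit 1 set.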



* Re: [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-22 14:03 ` [RFC net-next 1/6] openvswitch: exclude kernel " Aaron Conole
@ 2022-11-23 21:22   ` Ilya Maximets
  2022-11-25 15:29     ` [ovs-dev] " Adrian Moreno
  2022-11-29 14:30     ` Aaron Conole
  0 siblings, 2 replies; 13+ messages in thread
From: Ilya Maximets @ 2022-11-23 21:22 UTC (permalink / raw)
  To: Aaron Conole, netdev
  Cc: i.maximets, Pravin B Shelar, Jakub Kicinski, David S. Miller,
	Paolo Abeni, Eric Dumazet, Thomas Graf, dev, Eelco Chaudron,
	Shuah Khan, linux-kernel, linux-kselftest

On 11/22/22 15:03, Aaron Conole wrote:
> When processing upcall commands, two groups of data are available to
> userspace for processing: the actual packet data and the kernel
> sw flow key data.  The inclusion of the flow key allows the userspace
> to avoid running through the dissection again.
> 
> However, the userspace can choose to ignore the flow key data, as is
> the case in some ovs-vswitchd upcall processing.  For these messages,
> having the flow key data merely adds additional data to the upcall
> pipeline without any actual gain.  Userspace simply throws the data
> away anyway.

Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
packet from scratch and using the newly parsed key for the OpenFlow
translation, the kernel-provided key is still used in a few important
places, mainly for compatibility checking.  The use is described
here in more detail:
  https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility

We need to compare the key generated in userspace with the key
generated by the kernel to know if it's safe to install the new flow
to the kernel, i.e. if the kernel and OVS userspace are parsing the
packet in the same way.

On the other hand, OVS today doesn't check the data, it only checks
which fields are present.  So, if we can generate and pass the bitmap
of fields present in the key or something similar without sending the
full key, that might still save some CPU cycles and memory in the
socket buffer while preserving the ability to check for forward and
backward compatibility.  What do you think?
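
A minimal sketch of the bitmap idea, under stated assumptions: each key
attribute maps to one bit, using attribute numbers from enum ovs_key_attr in
include/uapi/linux/openvswitch.h.  The function names fields_bitmap and
compare_presence are hypothetical, and nothing here is an existing kernel or
OVS interface.  It only illustrates comparing which fields each side parsed,
without looking at the field data.

```python
# Attribute numbers follow enum ovs_key_attr (uapi/linux/openvswitch.h).
# Only a subset is listed to keep the sketch short.
OVS_KEY_ATTR = {
    "ETHERNET": 4,
    "VLAN": 5,
    "ETHERTYPE": 6,
    "IPV4": 7,
    "IPV6": 8,
    "TCP": 9,
    "UDP": 10,
    "ARP": 13,
}


def fields_bitmap(field_names):
    """Hypothetical helper: set one bit per field present in a key."""
    bitmap = 0
    for name in field_names:
        bitmap |= 1 << OVS_KEY_ATTR[name]
    return bitmap


def compare_presence(kernel_bitmap, user_bitmap):
    """Return (only_kernel, only_user) as sorted lists of field names."""

    def names(bits):
        return sorted(n for n, v in OVS_KEY_ATTR.items() if bits & (1 << v))

    only_kernel = kernel_bitmap & ~user_bitmap
    only_user = user_bitmap & ~kernel_bitmap
    return names(only_kernel), names(only_user)


if __name__ == "__main__":
    kernel = fields_bitmap(["ETHERNET", "ETHERTYPE", "IPV6"])
    user = fields_bitmap(["ETHERNET", "ETHERTYPE", "IPV6", "TCP"])
    # A field parsed only by userspace suggests the kernel would not be
    # able to match on it for this packet.
    print(compare_presence(kernel, user))
```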


The rest of the patch set seems useful even without patch #1 though.

Nit: This patch #1 should probably be merged with the patch #6 and be
at the end of a patch set, so the selftest and the main code are updated
at the same time.

Best regards, Ilya Maximets.


* Re: [ovs-dev] [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-23 21:22   ` Ilya Maximets
@ 2022-11-25 15:29     ` Adrian Moreno
  2022-11-25 15:51       ` Ilya Maximets
  2022-11-29 14:30     ` Aaron Conole
  1 sibling, 1 reply; 13+ messages in thread
From: Adrian Moreno @ 2022-11-25 15:29 UTC (permalink / raw)
  To: Ilya Maximets, Aaron Conole, netdev
  Cc: dev, linux-kernel, Eric Dumazet, linux-kselftest, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, David S. Miller



On 11/23/22 22:22, Ilya Maximets wrote:
> On 11/22/22 15:03, Aaron Conole wrote:
>> When processing upcall commands, two groups of data are available to
>> userspace for processing: the actual packet data and the kernel
>> sw flow key data.  The inclusion of the flow key allows the userspace
>> to avoid running through the dissection again.
>>
>> However, the userspace can choose to ignore the flow key data, as is
>> the case in some ovs-vswitchd upcall processing.  For these messages,
>> having the flow key data merely adds additional data to the upcall
>> pipeline without any actual gain.  Userspace simply throws the data
>> away anyway.
> 
> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
> packet from scratch and using the newly parsed key for the OpenFlow
> translation, the kernel-provided key is still used in a few important
> places, mainly for compatibility checking.  The use is described
> here in more detail:
>    https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
> 
> We need to compare the key generated in userspace with the key
> generated by the kernel to know if it's safe to install the new flow
> to the kernel, i.e. if the kernel and OVS userspace are parsing the
> packet in the same way.
> 

Hi Ilya,

Do we need to do that for every packet?
Could we send a bitmask of supported fields to userspace at feature negotiation 
and let OVS slowpath flows that it knows the kernel won't be able to handle 
properly?


> On the other hand, OVS today doesn't check the data, it only checks
> which fields are present.  So, if we can generate and pass the bitmap
> of fields present in the key or something similar without sending the
> full key, that might still save some CPU cycles and memory in the
> socket buffer while preserving the ability to check for forward and
> backward compatibility.  What do you think?
> 
> 
> The rest of the patch set seems useful even without patch #1 though.
> 
> Nit: This patch #1 should probably be merged with the patch #6 and be
> at the end of a patch set, so the selftest and the main code are updated
> at the same time.
> 
> Best regards, Ilya Maximets.
> 

Thanks
-- 
Adrián Moreno



* Re: [ovs-dev] [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-25 15:29     ` [ovs-dev] " Adrian Moreno
@ 2022-11-25 15:51       ` Ilya Maximets
  2022-11-28  9:12         ` Adrian Moreno
  0 siblings, 1 reply; 13+ messages in thread
From: Ilya Maximets @ 2022-11-25 15:51 UTC (permalink / raw)
  To: Adrian Moreno, Aaron Conole, netdev
  Cc: i.maximets, dev, linux-kernel, Eric Dumazet, linux-kselftest,
	Jakub Kicinski, Paolo Abeni, Shuah Khan, David S. Miller

On 11/25/22 16:29, Adrian Moreno wrote:
> 
> 
> On 11/23/22 22:22, Ilya Maximets wrote:
>> On 11/22/22 15:03, Aaron Conole wrote:
>>> When processing upcall commands, two groups of data are available to
>>> userspace for processing: the actual packet data and the kernel
>>> sw flow key data.  The inclusion of the flow key allows the userspace
>>> to avoid running through the dissection again.
>>>
>>> However, the userspace can choose to ignore the flow key data, as is
>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>> having the flow key data merely adds additional data to the upcall
>>> pipeline without any actual gain.  Userspace simply throws the data
>>> away anyway.
>>
>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>> packet from scratch and using the newly parsed key for the OpenFlow
>> translation, the kernel-provided key is still used in a few important
>> places, mainly for compatibility checking.  The use is described
>> here in more detail:
>>    https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>
>> We need to compare the key generated in userspace with the key
>> generated by the kernel to know if it's safe to install the new flow
>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>> packet in the same way.
>>
> 
> Hi Ilya,
> 
> Do we need to do that for every packet?
> Could we send a bitmask of supported fields to userspace at feature
> negotiation and let OVS slowpath flows that it knows the kernel won't
> be able to handle properly?

It's not that simple, because supported fields in a packet depend
on previous fields in that same packet.  For example, parsing TCP
header is generally supported, but it won't be parsed for IPv6
fragments (even the first one), number of vlan headers will affect
the parsing as we do not parse deeper than 2 vlan headers, etc.
So, I'm afraid we have to have per-packet information, unless we
can somehow probe all the possible valid combinations of packet
headers.
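
The dependency described above can be illustrated with a toy model.  This is
only a sketch under stated assumptions: the packet is a plain dict with
hypothetical keys ("vlans", "l3", "fragment", "l4"), and the two rules are
simplified from the examples in this message (no L4 parsing for IPv6
fragments, no parsing deeper than 2 VLAN headers).  It is not the kernel's
actual flow extraction code.

```python
# Per the message above: no parsing deeper than 2 VLAN headers.
MAX_VLAN_DEPTH = 2


def parsed_fields(pkt):
    """Toy model: list the key fields a parser would produce for pkt.

    pkt is a dict with hypothetical keys:
      "vlans"    - number of VLAN tags (default 0)
      "l3"       - "ipv4", "ipv6", "arp" or None
      "fragment" - True if the packet is an IP fragment
      "l4"       - "tcp", "udp" or None
    """
    fields = ["ETHERNET"]

    vlans = pkt.get("vlans", 0)
    if vlans:
        fields.append("VLAN")
    if vlans > MAX_VLAN_DEPTH:
        # Toy rule: headers behind a third tag are never reached.
        return fields

    l3 = pkt.get("l3")
    if l3 is None:
        return fields
    fields.append(l3.upper())

    if l3 == "ipv6" and pkt.get("fragment", False):
        # Toy rule: L4 is not parsed for IPv6 fragments, even the first.
        return fields

    l4 = pkt.get("l4")
    if l4 is not None and l3 in ("ipv4", "ipv6"):
        fields.append(l4.upper())
    return fields


if __name__ == "__main__":
    print(parsed_fields({"l3": "ipv6", "l4": "tcp"}))
    print(parsed_fields({"l3": "ipv6", "l4": "tcp", "fragment": True}))
    print(parsed_fields({"vlans": 3, "l3": "ipv4", "l4": "tcp"}))
```

The same "TCP is supported" capability yields different keys for the three
packets, which is why a single static per-field bitmask does not describe
what the parser will produce for a given packet.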

> 
> 
>> On the other hand, OVS today doesn't check the data, it only checks
>> which fields are present.  So, if we can generate and pass the bitmap
>> of fields present in the key or something similar without sending the
>> full key, that might still save some CPU cycles and memory in the
>> socket buffer while preserving the ability to check for forward and
>> backward compatibility.  What do you think?
>>
>>
>> The rest of the patch set seems useful even without patch #1 though.
>>
>> Nit: This patch #1 should probably be merged with the patch #6 and be
>> at the end of a patch set, so the selftest and the main code are updated
>> at the same time.
>>
>> Best regards, Ilya Maximets.
>>
> 
> Thanks



* Re: [ovs-dev] [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-25 15:51       ` Ilya Maximets
@ 2022-11-28  9:12         ` Adrian Moreno
  2022-11-29 14:26           ` Aaron Conole
  0 siblings, 1 reply; 13+ messages in thread
From: Adrian Moreno @ 2022-11-28  9:12 UTC (permalink / raw)
  To: Ilya Maximets, Aaron Conole, netdev
  Cc: dev, linux-kernel, Eric Dumazet, linux-kselftest, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, David S. Miller



On 11/25/22 16:51, Ilya Maximets wrote:
> On 11/25/22 16:29, Adrian Moreno wrote:
>>
>>
>> On 11/23/22 22:22, Ilya Maximets wrote:
>>> On 11/22/22 15:03, Aaron Conole wrote:
>>>> When processing upcall commands, two groups of data are available to
>>>> userspace for processing: the actual packet data and the kernel
>>>> sw flow key data.  The inclusion of the flow key allows the userspace
>>>> to avoid running through the dissection again.
>>>>
>>>> However, the userspace can choose to ignore the flow key data, as is
>>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>>> having the flow key data merely adds additional data to the upcall
>>>> pipeline without any actual gain.  Userspace simply throws the data
>>>> away anyway.
>>>
>>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>>> packet from scratch and using the newly parsed key for the OpenFlow
>>> translation, the kernel-provided key is still used in a few important
>>> places, mainly for compatibility checking.  The use is described
>>> here in more detail:
>>>     https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>>
>>> We need to compare the key generated in userspace with the key
>>> generated by the kernel to know if it's safe to install the new flow
>>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>>> packet in the same way.
>>>
>>
>> Hi Ilya,
>>
>> Do we need to do that for every packet?
>> Could we send a bitmask of supported fields to userspace at feature
>> negotiation and let OVS slowpath flows that it knows the kernel won't
>> be able to handle properly?
> 
> It's not that simple, because supported fields in a packet depend
> on previous fields in that same packet.  For example, parsing TCP
> header is generally supported, but it won't be parsed for IPv6
> fragments (even the first one), number of vlan headers will affect
> the parsing as we do not parse deeper than 2 vlan headers, etc.
> So, I'm afraid we have to have per-packet information, unless we
> can somehow probe all the possible valid combinations of packet
> headers.
> 

Surely. I understand that we'd need more than just a bit per field. Things like 
L4 on IPv6 frags would need another bit and the number of VLAN headers would 
need some more. But, are these a handful of exceptions or do we really need all 
the possible combinations of headers? If it's a matter of naming a handful of 
corner cases I think we could consider expressing them at initialization time 
and save some buffer space plus computation time both in kernel and userspace.

-- 
Adrián Moreno

>>
>>
>>> On the other hand, OVS today doesn't check the data, it only checks
>>> which fields are present.  So, if we can generate and pass the bitmap
>>> of fields present in the key or something similar without sending the
>>> full key, that might still save some CPU cycles and memory in the
>>> socket buffer while preserving the ability to check for forward and
>>> backward compatibility.  What do you think?
>>>
>>>
>>> The rest of the patch set seems useful even without patch #1 though.
>>>
>>> Nit: This patch #1 should probably be merged with the patch #6 and be
>>> at the end of a patch set, so the selftest and the main code are updated
>>> at the same time.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
>> Thanks
> 



* Re: [ovs-dev] [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-28  9:12         ` Adrian Moreno
@ 2022-11-29 14:26           ` Aaron Conole
  0 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-29 14:26 UTC (permalink / raw)
  To: Adrian Moreno
  Cc: Ilya Maximets, netdev, dev, linux-kernel, Eric Dumazet,
	linux-kselftest, Jakub Kicinski, Paolo Abeni, Shuah Khan,
	David S. Miller

Adrian Moreno <amorenoz@redhat.com> writes:

> On 11/25/22 16:51, Ilya Maximets wrote:
>> On 11/25/22 16:29, Adrian Moreno wrote:
>>>
>>>
>>> On 11/23/22 22:22, Ilya Maximets wrote:
>>>> On 11/22/22 15:03, Aaron Conole wrote:
>>>>> When processing upcall commands, two groups of data are available to
>>>>> userspace for processing: the actual packet data and the kernel
>>>>> sw flow key data.  The inclusion of the flow key allows the userspace
>>>>> to avoid running through the dissection again.
>>>>>
>>>>> However, the userspace can choose to ignore the flow key data, as is
>>>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>>>> having the flow key data merely adds additional data to the upcall
>>>>> pipeline without any actual gain.  Userspace simply throws the data
>>>>> away anyway.
>>>>
>>>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>>>> packet from scratch and using the newly parsed key for the OpenFlow
>>>> translation, the kernel-provided key is still used in a few important
>>>> places, mainly for compatibility checking.  The use is described
>>>> here in more detail:
>>>>     https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>>>
>>>> We need to compare the key generated in userspace with the key
>>>> generated by the kernel to know if it's safe to install the new flow
>>>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>>>> packet in the same way.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Do we need to do that for every packet?
>>> Could we send a bitmask of supported fields to userspace at feature
>>> negotiation and let OVS slowpath flows that it knows the kernel won't
>>> be able to handle properly?
>> It's not that simple, because supported fields in a packet depend
>> on previous fields in that same packet.  For example, parsing TCP
>> header is generally supported, but it won't be parsed for IPv6
>> fragments (even the first one), number of vlan headers will affect
>> the parsing as we do not parse deeper than 2 vlan headers, etc.
>> So, I'm afraid we have to have per-packet information, unless we
>> can somehow probe all the possible valid combinations of packet
>> headers.
>> 
>
> Surely. I understand that we'd need more than just a bit per
> field. Things like L4 on IPv6 frags would need another bit and the
> number of VLAN headers would need some more. But, are these a handful
> of exceptions or do we really need all the possible combinations of
> headers? If it's a matter of naming a handful of corner cases I think
> we could consider expressing them at initialization time and save some
> buffer space plus computation time both in kernel and userspace.

I will take a bit more of a look here - there must surely be a way to
express this when pulling information via DP_GET command so that we
don't need to wait for a packet to come in to figure out whether we can
parse it.



* Re: [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls
  2022-11-23 21:22   ` Ilya Maximets
  2022-11-25 15:29     ` [ovs-dev] " Adrian Moreno
@ 2022-11-29 14:30     ` Aaron Conole
  1 sibling, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2022-11-29 14:30 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: netdev, Pravin B Shelar, Jakub Kicinski, David S. Miller,
	Paolo Abeni, Eric Dumazet, Thomas Graf, dev, Eelco Chaudron,
	Shuah Khan, linux-kernel, linux-kselftest

Ilya Maximets <i.maximets@ovn.org> writes:

> On 11/22/22 15:03, Aaron Conole wrote:
>> When processing upcall commands, two groups of data are available to
>> userspace for processing: the actual packet data and the kernel
>> sw flow key data.  The inclusion of the flow key allows the userspace
>> to avoid running through the dissection again.
>> 
>> However, the userspace can choose to ignore the flow key data, as is
>> the case in some ovs-vswitchd upcall processing.  For these messages,
>> having the flow key data merely adds additional data to the upcall
>> pipeline without any actual gain.  Userspace simply throws the data
>> away anyway.
>
> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
> packet from scratch and using the newly parsed key for the OpenFlow
> translation, the kernel-provided key is still used in a few important
> places, mainly for compatibility checking.  The use is described
> here in more detail:
>   https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>
> We need to compare the key generated in userspace with the key
> generated by the kernel to know if it's safe to install the new flow
> to the kernel, i.e. if the kernel and OVS userspace are parsing the
> packet in the same way.
>
> On the other hand, OVS today doesn't check the data, it only checks
> which fields are present.  So, if we can generate and pass the bitmap
> of fields present in the key or something similar without sending the
> full key, that might still save some CPU cycles and memory in the
> socket buffer while preserving the ability to check for forward and
> backward compatibility.  What do you think?

Maybe that can work.  I will try testing.  If so, then I would change
this semantic to send just the bitmap rather than omitting everything.

> The rest of the patch set seems useful even without patch #1 though.

I agree - but I didn't know if it made sense to submit the series
without adding something impactful (like a test).  I will work a bit
more on the flow area - maybe I can add enough actions and matches to
implement basic flow tests to submit while we think more about the feature.

> Nit: This patch #1 should probably be merged with the patch #6 and be
> at the end of a patch set, so the selftest and the main code are updated
> at the same time.

Okay - I can restructure them this way.

> Best regards, Ilya Maximets.



end of thread, other threads:[~2022-11-29 14:31 UTC | newest]

Thread overview: 13+ messages
2022-11-22 14:03 [RFC net-next 0/6] Allow excluding sw flow key from upcalls Aaron Conole
2022-11-22 14:03 ` [RFC net-next 1/6] openvswitch: exclude kernel " Aaron Conole
2022-11-23 21:22   ` Ilya Maximets
2022-11-25 15:29     ` [ovs-dev] " Adrian Moreno
2022-11-25 15:51       ` Ilya Maximets
2022-11-28  9:12         ` Adrian Moreno
2022-11-29 14:26           ` Aaron Conole
2022-11-29 14:30     ` Aaron Conole
2022-11-22 14:03 ` [RFC net-next 2/6] selftests: openvswitch: add interface support Aaron Conole
2022-11-22 14:03 ` [RFC net-next 3/6] selftests: openvswitch: add flow dump support Aaron Conole
2022-11-22 14:03 ` [RFC net-next 4/6] selftests: openvswitch: adjust datapath NL message Aaron Conole
2022-11-22 14:03 ` [RFC net-next 5/6] selftests: openvswitch: add upcall support Aaron Conole
2022-11-22 14:03 ` [RFC net-next 6/6] selftests: openvswitch: add exclude support for packet commands Aaron Conole
