[PATCH net-next,v3 0/4] flow_offload: update mangle action representation

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
@ 2019-09-06  0:03 Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 1/4] net: flow_offload: flip mangle action mask Pablo Neira Ayuso
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06  0:03 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, jakub.kicinski, jiri, saeedm, vishal, vladbu

This patch updates the mangle action representation:

Patch 1) Undo bitwise NOT operation on the mangle mask (coming from tc
         pedit userspace).

Patch 2) mangle value &= mask from the front-end side.

Patch 3) adjust offset, length and coalesce consecutive pedit keys into
         one single action.

Patch 4) add support for payload mangling for netfilter.

After this patchset:

* Offset to payload does not need to be on the 32-bits boundaries anymore.
  This patchset adds front-end code to adjust the offset and length coming
  from the tc pedit representation, so drivers get an exact header field
  offset and length.

* This new front-end code coalesces consecutive pedit actions into one
  single action, so drivers can mangle IPv6 and ethernet address fields
  in one go, instead once for each 32-bit word.

On the driver side, diffstat -t shows that drivers code to deal with
payload mangling gets simplified:

        INSERTED,DELETED,MODIFIED,FILENAME
        46,116,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c (-70 LOC)
        12,28,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h (-16 LOC)
	26,54,0,drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (-27 LOC)
        89,111,0,drivers/net/ethernet/netronome/nfp/flower/action.c (-17 LOC)

While, on the front-end side the balance is the following:

        123,22,0,net/sched/cls_api.c (+101 LOC)

Changes since v2:

* Fix is_action_keys_supported() breakage in mlx5 reported by Vlad Buslov.

Pablo Neira Ayuso (4):
  net: flow_offload: flip mangle action mask
  net: flow_offload: bitwise AND on mangle action value field
  net: flow_offload: mangle action at byte level
  netfilter: nft_payload: packet mangling offload support

 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 163 +++++-----------
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h   |  40 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  80 +++-----
 drivers/net/ethernet/netronome/nfp/flower/action.c | 204 ++++++++++-----------
 include/net/flow_offload.h                         |   7 +-
 net/netfilter/nft_payload.c                        |  73 ++++++++
 net/sched/cls_api.c                                | 144 ++++++++++++---
 7 files changed, 378 insertions(+), 333 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next,v3 1/4] net: flow_offload: flip mangle action mask
  2019-09-06  0:03 [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Pablo Neira Ayuso
@ 2019-09-06  0:04 ` Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 2/4] net: flow_offload: bitwise AND on mangle action value field Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06  0:04 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, jakub.kicinski, jiri, saeedm, vishal, vladbu

Userspace tc pedit action performs a bitwise NOT operation on the mask.
All of the existing drivers in the tree undo this operation. Prepare the
mangle mask in the way the drivers expect from the
tc_setup_flow_action() function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 12 ++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c      |  6 +++---
 drivers/net/ethernet/netronome/nfp/flower/action.c   |  8 ++++----
 net/sched/cls_api.c                                  |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index e447976bdd3e..2d26dbca701d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -275,7 +275,7 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			  u8 field)
 {
-	u32 set_val = val & ~mask;
+	u32 set_val = val & mask;
 	u32 offset = 0;
 	u8 size = 1;
 	int i;
@@ -301,7 +301,7 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 			offload_pedit(fs, val, mask, ETH_DMAC_31_0);
 			break;
 		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
-			if (~mask & PEDIT_ETH_DMAC_MASK)
+			if (mask & PEDIT_ETH_DMAC_MASK)
 				offload_pedit(fs, val, mask, ETH_DMAC_47_32);
 			else
 				offload_pedit(fs, val >> 16, mask >> 16,
@@ -353,7 +353,7 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
 		case PEDIT_TCP_SPORT_DPORT:
-			if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
+			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
 				offload_pedit(fs, cpu_to_be32(val) >> 16,
 					      cpu_to_be32(mask) >> 16,
 					      TCP_SPORT);
@@ -366,7 +366,7 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
 		case PEDIT_UDP_SPORT_DPORT:
-			if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
+			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
 				offload_pedit(fs, cpu_to_be32(val) >> 16,
 					      cpu_to_be32(mask) >> 16,
 					      UDP_SPORT);
@@ -510,7 +510,7 @@ static bool valid_pedit_action(struct net_device *dev,
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
 		case PEDIT_TCP_SPORT_DPORT:
-			if (!valid_l4_mask(~mask)) {
+			if (!valid_l4_mask(mask)) {
 				netdev_err(dev, "%s: Unsupported mask for TCP L4 ports\n",
 					   __func__);
 				return false;
@@ -525,7 +525,7 @@ static bool valid_pedit_action(struct net_device *dev,
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
 		case PEDIT_UDP_SPORT_DPORT:
-			if (!valid_l4_mask(~mask)) {
+			if (!valid_l4_mask(mask)) {
 				netdev_err(dev, "%s: Unsupported mask for UDP L4 ports\n",
 					   __func__);
 				return false;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 30d26eba75a3..3db63cf41ee5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2509,7 +2509,7 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 	val = act->mangle.val;
 	offset = act->mangle.offset;
 
-	err = set_pedit_val(htype, ~mask, val, offset, &hdrs[cmd]);
+	err = set_pedit_val(htype, mask, val, offset, &hdrs[cmd]);
 	if (err)
 		goto out_err;
 
@@ -2609,7 +2609,7 @@ static bool is_action_keys_supported(const struct flow_action_entry *act)
 
 	htype = act->mangle.htype;
 	offset = act->mangle.offset;
-	mask = ~act->mangle.mask;
+	mask = act->mangle.mask;
 	/* For IPv4 & IPv6 header check 4 byte word,
 	 * to determine that modified fields
 	 * are NOT ttl & hop_limit only.
@@ -2733,7 +2733,7 @@ static int add_vlan_rewrite_action(struct mlx5e_priv *priv, int namespace,
 		.id = FLOW_ACTION_MANGLE,
 		.mangle.htype = FLOW_ACT_MANGLE_HDR_TYPE_ETH,
 		.mangle.offset = offsetof(struct vlan_ethhdr, h_vlan_TCI),
-		.mangle.mask = ~(u32)be16_to_cpu(*(__be16 *)&mask16),
+		.mangle.mask = (u32)be16_to_cpu(*(__be16 *)&mask16),
 		.mangle.val = (u32)be16_to_cpu(*(__be16 *)&val16),
 	};
 	u8 match_prio_mask, match_prio_val;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 1b019fdfcd97..ee0066a7ba87 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -495,7 +495,7 @@ nfp_fl_set_eth(const struct flow_action_entry *act, u32 off,
 		return -EOPNOTSUPP;
 	}
 
-	mask = ~act->mangle.mask;
+	mask = act->mangle.mask;
 	exact = act->mangle.val;
 
 	if (exact & ~mask) {
@@ -532,7 +532,7 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	__be32 exact, mask;
 
 	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)~act->mangle.mask;
+	mask = (__force __be32)act->mangle.mask;
 	exact = (__force __be32)act->mangle.val;
 
 	if (exact & ~mask) {
@@ -673,7 +673,7 @@ nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
 	u8 word;
 
 	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)~act->mangle.mask;
+	mask = (__force __be32)act->mangle.mask;
 	exact = (__force __be32)act->mangle.val;
 
 	if (exact & ~mask) {
@@ -713,7 +713,7 @@ nfp_fl_set_tport(const struct flow_action_entry *act, u32 off,
 		return -EOPNOTSUPP;
 	}
 
-	mask = ~act->mangle.mask;
+	mask = act->mangle.mask;
 	exact = act->mangle.val;
 
 	if (exact & ~mask) {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 671ca905dbb5..fbab004d0075 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3379,7 +3379,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 					goto err_out;
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
-				entry->mangle.mask = tcf_pedit_mask(act, k);
+				entry->mangle.mask = ~tcf_pedit_mask(act, k);
 				entry->mangle.val = tcf_pedit_val(act, k);
 				entry->mangle.offset = tcf_pedit_offset(act, k);
 				entry = &flow_action->entries[++j];
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next,v3 2/4] net: flow_offload: bitwise AND on mangle action value field
  2019-09-06  0:03 [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 1/4] net: flow_offload: flip mangle action mask Pablo Neira Ayuso
@ 2019-09-06  0:04 ` Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 3/4] net: flow_offload: mangle action at byte level Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06  0:04 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, jakub.kicinski, jiri, saeedm, vishal, vladbu

Drivers perform a bitwise AND on the value and the mask. Update
tc_setup_flow_action() to perform this operation so drivers do not need
to do this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c      | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c   | 9 ++++-----
 net/sched/cls_api.c                                  | 3 ++-
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 2d26dbca701d..5afc15a60199 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -275,7 +275,6 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			  u8 field)
 {
-	u32 set_val = val & mask;
 	u32 offset = 0;
 	u8 size = 1;
 	int i;
@@ -287,7 +286,7 @@ static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			break;
 		}
 	}
-	memcpy((u8 *)fs + offset, &set_val, size);
+	memcpy((u8 *)fs + offset, &val, size);
 }
 
 static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 3db63cf41ee5..ec47e994b7e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2214,7 +2214,7 @@ static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset,
 		goto out_err;
 
 	*curr_pmask |= mask;
-	*curr_pval  |= (val & mask);
+	*curr_pval  |= val;
 
 	return 0;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index ee0066a7ba87..592c36ba9e3f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -477,7 +477,6 @@ static void nfp_fl_set_helper32(u32 value, u32 mask, u8 *p_exact, u8 *p_mask)
 	u32 oldvalue = get_unaligned((u32 *)p_exact);
 	u32 oldmask = get_unaligned((u32 *)p_mask);
 
-	value &= mask;
 	value |= oldvalue & ~mask;
 
 	put_unaligned(oldmask | mask, (u32 *)p_mask);
@@ -544,7 +543,7 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	case offsetof(struct iphdr, daddr):
 		set_ip_addr->ipv4_dst_mask |= mask;
 		set_ip_addr->ipv4_dst &= ~mask;
-		set_ip_addr->ipv4_dst |= exact & mask;
+		set_ip_addr->ipv4_dst |= exact;
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
@@ -552,7 +551,7 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	case offsetof(struct iphdr, saddr):
 		set_ip_addr->ipv4_src_mask |= mask;
 		set_ip_addr->ipv4_src &= ~mask;
-		set_ip_addr->ipv4_src |= exact & mask;
+		set_ip_addr->ipv4_src |= exact;
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
@@ -606,7 +605,7 @@ nfp_fl_set_ip6_helper(int opcode_tag, u8 word, __be32 exact, __be32 mask,
 {
 	ip6->ipv6[word].mask |= mask;
 	ip6->ipv6[word].exact &= ~mask;
-	ip6->ipv6[word].exact |= exact & mask;
+	ip6->ipv6[word].exact |= exact;
 
 	ip6->reserved = cpu_to_be16(0);
 	ip6->head.jump_id = opcode_tag;
@@ -651,7 +650,7 @@ nfp_fl_set_ip6_hop_limit_flow_label(u32 off, __be32 exact, __be32 mask,
 
 		ip_hl_fl->ipv6_label_mask |= mask;
 		ip_hl_fl->ipv6_label &= ~mask;
-		ip_hl_fl->ipv6_label |= exact & mask;
+		ip_hl_fl->ipv6_label |= exact;
 		break;
 	}
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index fbab004d0075..e30a151d8527 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3380,7 +3380,8 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
 				entry->mangle.mask = ~tcf_pedit_mask(act, k);
-				entry->mangle.val = tcf_pedit_val(act, k);
+				entry->mangle.val = tcf_pedit_val(act, k) &
+							entry->mangle.mask;
 				entry->mangle.offset = tcf_pedit_offset(act, k);
 				entry = &flow_action->entries[++j];
 			}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next,v3 3/4] net: flow_offload: mangle action at byte level
  2019-09-06  0:03 [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 1/4] net: flow_offload: flip mangle action mask Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 2/4] net: flow_offload: bitwise AND on mangle action value field Pablo Neira Ayuso
@ 2019-09-06  0:04 ` Pablo Neira Ayuso
  2019-09-06  0:04 ` [PATCH net-next,v3 4/4] netfilter: nft_payload: packet mangling offload support Pablo Neira Ayuso
  2019-09-06 10:02 ` [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Edward Cree
  4 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06  0:04 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, jakub.kicinski, jiri, saeedm, vishal, vladbu

The flow mangle action is originally modeled after the tc pedit action,
this has a number of shortcomings:

1) The tc pedit offset must be set on the 32-bits boundaries. Many
   protocol header field offsets are not aligned to 32-bits, eg. port
   destination, port source and ethernet destination. This patch adjusts
   the offset accordingly and trim off length in these case, so drivers get
   an exact offset and length to the header fields.

2) The maximum mangle length is one word of 32-bits, hence you need to
   up to four actions to mangle an IPv6 address. This patch coalesces
   consecutive tc pedit actions into one single action so drivers can
   configure the IPv6 mangling in one go. Ethernet address fields now
   require one single action instead of two too.

The following drivers have been updated accordingly to use this new
mangle action layout:

1) The cxgb4 driver does not need to split protocol field matching
   larger than one 32-bit words into multiple definitions. Instead one
   single definition per protocol field is enough. Checking for
   transport protocol ports is also simplified.

2) The mlx5 driver logic to disallow IPv4 ttl and IPv6 hoplimit fields
   becomes more simple too.

3) The nfp driver uses the nfp_fl_set_helper() function to configure the
   payload mangling. The memchr_inv() function is used to check for
   proper initialization of the value and mask. The driver has been
   updated to refer to the exact protocol header offsets too.

As a result, this patch reduces code complexity on the driver side at
the cost of adding ~100 LOC at the core to perform offset and length
adjustment; and to coalesce consecutive actions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 162 +++++-----------
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h   |  40 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  80 +++-----
 drivers/net/ethernet/netronome/nfp/flower/action.c | 203 ++++++++++-----------
 include/net/flow_offload.h                         |   7 +-
 net/sched/cls_api.c                                | 145 ++++++++++++---
 6 files changed, 305 insertions(+), 332 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 5afc15a60199..ba1ced08e41c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -44,20 +44,12 @@
 #define STATS_CHECK_PERIOD (HZ / 2)
 
 static struct ch_tc_pedit_fields pedits[] = {
-	PEDIT_FIELDS(ETH_, DMAC_31_0, 4, dmac, 0),
-	PEDIT_FIELDS(ETH_, DMAC_47_32, 2, dmac, 4),
-	PEDIT_FIELDS(ETH_, SMAC_15_0, 2, smac, 0),
-	PEDIT_FIELDS(ETH_, SMAC_47_16, 4, smac, 2),
+	PEDIT_FIELDS(ETH_, DMAC, 6, dmac, 0),
+	PEDIT_FIELDS(ETH_, SMAC, 6, smac, 0),
 	PEDIT_FIELDS(IP4_, SRC, 4, nat_fip, 0),
 	PEDIT_FIELDS(IP4_, DST, 4, nat_lip, 0),
-	PEDIT_FIELDS(IP6_, SRC_31_0, 4, nat_fip, 0),
-	PEDIT_FIELDS(IP6_, SRC_63_32, 4, nat_fip, 4),
-	PEDIT_FIELDS(IP6_, SRC_95_64, 4, nat_fip, 8),
-	PEDIT_FIELDS(IP6_, SRC_127_96, 4, nat_fip, 12),
-	PEDIT_FIELDS(IP6_, DST_31_0, 4, nat_lip, 0),
-	PEDIT_FIELDS(IP6_, DST_63_32, 4, nat_lip, 4),
-	PEDIT_FIELDS(IP6_, DST_95_64, 4, nat_lip, 8),
-	PEDIT_FIELDS(IP6_, DST_127_96, 4, nat_lip, 12),
+	PEDIT_FIELDS(IP6_, SRC, 16, nat_fip, 0),
+	PEDIT_FIELDS(IP6_, DST, 16, nat_lip, 0),
 	PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0),
 	PEDIT_FIELDS(TCP_, DPORT, 2, nat_lport, 0),
 	PEDIT_FIELDS(UDP_, SPORT, 2, nat_fport, 0),
@@ -272,8 +264,8 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 	return 0;
 }
 
-static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
-			  u8 field)
+static void offload_pedit(struct ch_filter_specification *fs,
+			  struct flow_action_entry *act, u8 field)
 {
 	u32 offset = 0;
 	u8 size = 1;
@@ -286,92 +278,68 @@ static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			break;
 		}
 	}
-	memcpy((u8 *)fs + offset, &val, size);
+	memcpy((u8 *)fs + offset, act->mangle.val, size);
 }
 
-static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
-				u32 mask, u32 offset, u8 htype)
+static void process_pedit_field(struct ch_filter_specification *fs,
+				struct flow_action_entry *act)
 {
+	u32 offset = act->mangle.offset;
+	u8 htype = act->mangle.htype;
+
 	switch (htype) {
 	case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 		switch (offset) {
-		case PEDIT_ETH_DMAC_31_0:
+		case PEDIT_ETH_DMAC:
 			fs->newdmac = 1;
-			offload_pedit(fs, val, mask, ETH_DMAC_31_0);
-			break;
-		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
-			if (mask & PEDIT_ETH_DMAC_MASK)
-				offload_pedit(fs, val, mask, ETH_DMAC_47_32);
-			else
-				offload_pedit(fs, val >> 16, mask >> 16,
-					      ETH_SMAC_15_0);
+			offload_pedit(fs, act, ETH_DMAC);
 			break;
-		case PEDIT_ETH_SMAC_47_16:
+		case PEDIT_ETH_SMAC:
 			fs->newsmac = 1;
-			offload_pedit(fs, val, mask, ETH_SMAC_47_16);
+			offload_pedit(fs, act, ETH_SMAC);
+			break;
 		}
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
 		switch (offset) {
 		case PEDIT_IP4_SRC:
-			offload_pedit(fs, val, mask, IP4_SRC);
+			offload_pedit(fs, act, IP4_SRC);
 			break;
 		case PEDIT_IP4_DST:
-			offload_pedit(fs, val, mask, IP4_DST);
+			offload_pedit(fs, act, IP4_DST);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
 		switch (offset) {
-		case PEDIT_IP6_SRC_31_0:
-			offload_pedit(fs, val, mask, IP6_SRC_31_0);
-			break;
-		case PEDIT_IP6_SRC_63_32:
-			offload_pedit(fs, val, mask, IP6_SRC_63_32);
-			break;
-		case PEDIT_IP6_SRC_95_64:
-			offload_pedit(fs, val, mask, IP6_SRC_95_64);
-			break;
-		case PEDIT_IP6_SRC_127_96:
-			offload_pedit(fs, val, mask, IP6_SRC_127_96);
-			break;
-		case PEDIT_IP6_DST_31_0:
-			offload_pedit(fs, val, mask, IP6_DST_31_0);
+		case PEDIT_IP6_SRC:
+			offload_pedit(fs, act, IP6_SRC);
 			break;
-		case PEDIT_IP6_DST_63_32:
-			offload_pedit(fs, val, mask, IP6_DST_63_32);
+		case PEDIT_IP6_DST:
+			offload_pedit(fs, act, IP6_DST);
 			break;
-		case PEDIT_IP6_DST_95_64:
-			offload_pedit(fs, val, mask, IP6_DST_95_64);
-			break;
-		case PEDIT_IP6_DST_127_96:
-			offload_pedit(fs, val, mask, IP6_DST_127_96);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
-		case PEDIT_TCP_SPORT_DPORT:
-			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
-				offload_pedit(fs, cpu_to_be32(val) >> 16,
-					      cpu_to_be32(mask) >> 16,
-					      TCP_SPORT);
-			else
-				offload_pedit(fs, cpu_to_be32(val),
-					      cpu_to_be32(mask), TCP_DPORT);
+		case PEDIT_TCP_SPORT:
+			offload_pedit(fs, act, TCP_SPORT);
+			break;
+		case PEDIT_TCP_DPORT:
+			offload_pedit(fs, act, TCP_DPORT);
+			break;
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
-		case PEDIT_UDP_SPORT_DPORT:
-			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
-				offload_pedit(fs, cpu_to_be32(val) >> 16,
-					      cpu_to_be32(mask) >> 16,
-					      UDP_SPORT);
-			else
-				offload_pedit(fs, cpu_to_be32(val),
-					      cpu_to_be32(mask), UDP_DPORT);
+		case PEDIT_UDP_SPORT:
+			offload_pedit(fs, act, UDP_SPORT);
+			break;
+		case PEDIT_UDP_DPORT:
+			offload_pedit(fs, act, UDP_DPORT);
+			break;
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 	}
@@ -424,17 +392,8 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 			}
 			}
 			break;
-		case FLOW_ACTION_MANGLE: {
-			u32 mask, val, offset;
-			u8 htype;
-
-			htype = act->mangle.htype;
-			mask = act->mangle.mask;
-			val = act->mangle.val;
-			offset = act->mangle.offset;
-
-			process_pedit_field(fs, val, mask, offset, htype);
-			}
+		case FLOW_ACTION_MANGLE:
+			process_pedit_field(fs, act);
 			break;
 		default:
 			break;
@@ -442,35 +401,20 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 	}
 }
 
-static bool valid_l4_mask(u32 mask)
-{
-	u16 hi, lo;
-
-	/* Either the upper 16-bits (SPORT) OR the lower
-	 * 16-bits (DPORT) can be set, but NOT BOTH.
-	 */
-	hi = (mask >> 16) & 0xFFFF;
-	lo = mask & 0xFFFF;
-
-	return hi && lo ? false : true;
-}
-
 static bool valid_pedit_action(struct net_device *dev,
 			       const struct flow_action_entry *act)
 {
-	u32 mask, offset;
+	u32 offset;
 	u8 htype;
 
 	htype = act->mangle.htype;
-	mask = act->mangle.mask;
 	offset = act->mangle.offset;
 
 	switch (htype) {
 	case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 		switch (offset) {
-		case PEDIT_ETH_DMAC_31_0:
-		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
-		case PEDIT_ETH_SMAC_47_16:
+		case PEDIT_ETH_DMAC:
+		case PEDIT_ETH_SMAC:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
@@ -491,14 +435,8 @@ static bool valid_pedit_action(struct net_device *dev,
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
 		switch (offset) {
-		case PEDIT_IP6_SRC_31_0:
-		case PEDIT_IP6_SRC_63_32:
-		case PEDIT_IP6_SRC_95_64:
-		case PEDIT_IP6_SRC_127_96:
-		case PEDIT_IP6_DST_31_0:
-		case PEDIT_IP6_DST_63_32:
-		case PEDIT_IP6_DST_95_64:
-		case PEDIT_IP6_DST_127_96:
+		case PEDIT_IP6_SRC:
+		case PEDIT_IP6_DST:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
@@ -508,12 +446,8 @@ static bool valid_pedit_action(struct net_device *dev,
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
-		case PEDIT_TCP_SPORT_DPORT:
-			if (!valid_l4_mask(mask)) {
-				netdev_err(dev, "%s: Unsupported mask for TCP L4 ports\n",
-					   __func__);
-				return false;
-			}
+		case PEDIT_TCP_SPORT:
+		case PEDIT_TCP_DPORT:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
@@ -523,12 +457,8 @@ static bool valid_pedit_action(struct net_device *dev,
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
-		case PEDIT_UDP_SPORT_DPORT:
-			if (!valid_l4_mask(mask)) {
-				netdev_err(dev, "%s: Unsupported mask for UDP L4 ports\n",
-					   __func__);
-				return false;
-			}
+		case PEDIT_UDP_SPORT:
+		case PEDIT_UDP_DPORT:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
index eb4c95248baf..03892755a18f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
@@ -55,23 +55,14 @@ struct ch_tc_flower_entry {
 };
 
 enum {
-	ETH_DMAC_31_0,	/* dmac bits 0.. 31 */
-	ETH_DMAC_47_32,	/* dmac bits 32..47 */
-	ETH_SMAC_15_0,	/* smac bits 0.. 15 */
-	ETH_SMAC_47_16,	/* smac bits 16..47 */
+	ETH_DMAC,	/* 48-bits dmac bits */
+	ETH_SMAC,	/* 48-bits smac bits */
 
 	IP4_SRC,	/* 32-bit IPv4 src  */
 	IP4_DST,	/* 32-bit IPv4 dst  */
 
-	IP6_SRC_31_0,	/* src bits 0..  31 */
-	IP6_SRC_63_32,	/* src bits 63.. 32 */
-	IP6_SRC_95_64,	/* src bits 95.. 64 */
-	IP6_SRC_127_96,	/* src bits 127..96 */
-
-	IP6_DST_31_0,	/* dst bits 0..  31 */
-	IP6_DST_63_32,	/* dst bits 63.. 32 */
-	IP6_DST_95_64,	/* dst bits 95.. 64 */
-	IP6_DST_127_96,	/* dst bits 127..96 */
+	IP6_SRC,	/* 128-bit IPv6 src */
+	IP6_DST,	/* 128-bit IPv6 dst */
 
 	TCP_SPORT,	/* 16-bit TCP sport */
 	TCP_DPORT,	/* 16-bit TCP dport */
@@ -90,23 +81,16 @@ struct ch_tc_pedit_fields {
 	{ type## field, size, \
 		offsetof(struct ch_filter_specification, fs_field) + (offset) }
 
-#define PEDIT_ETH_DMAC_MASK		0xffff
-#define PEDIT_TCP_UDP_SPORT_MASK	0xffff
-#define PEDIT_ETH_DMAC_31_0		0x0
-#define PEDIT_ETH_DMAC_47_32_SMAC_15_0	0x4
-#define PEDIT_ETH_SMAC_47_16		0x8
+#define PEDIT_ETH_DMAC			0x0
+#define PEDIT_ETH_SMAC			0x6
+#define PEDIT_IP6_SRC			0x8
+#define PEDIT_IP6_DST			0x18
 #define PEDIT_IP4_SRC			0xC
 #define PEDIT_IP4_DST			0x10
-#define PEDIT_IP6_SRC_31_0		0x8
-#define PEDIT_IP6_SRC_63_32		0xC
-#define PEDIT_IP6_SRC_95_64		0x10
-#define PEDIT_IP6_SRC_127_96		0x14
-#define PEDIT_IP6_DST_31_0		0x18
-#define PEDIT_IP6_DST_63_32		0x1C
-#define PEDIT_IP6_DST_95_64		0x20
-#define PEDIT_IP6_DST_127_96		0x24
-#define PEDIT_TCP_SPORT_DPORT		0x0
-#define PEDIT_UDP_SPORT_DPORT		0x0
+#define PEDIT_TCP_SPORT			0x0
+#define PEDIT_TCP_DPORT			0x2
+#define PEDIT_UDP_SPORT			0x0
+#define PEDIT_UDP_DPORT			0x2
 
 int cxgb4_tc_flower_replace(struct net_device *dev,
 			    struct flow_cls_offload *cls);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index ec47e994b7e0..0b8e34fb8a0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2202,19 +2202,24 @@ static int pedit_header_offsets[] = {
 
 #define pedit_header(_ph, _htype) ((void *)(_ph) + pedit_header_offsets[_htype])
 
-static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset,
+static int set_pedit_val(u8 hdr_type, const struct flow_action_entry *act,
 			 struct pedit_headers_action *hdrs)
 {
-	u32 *curr_pmask, *curr_pval;
+	u32 offset = act->mangle.offset;
+	u8 *curr_pmask, *curr_pval;
+	int i;
 
-	curr_pmask = (u32 *)(pedit_header(&hdrs->masks, hdr_type) + offset);
-	curr_pval  = (u32 *)(pedit_header(&hdrs->vals, hdr_type) + offset);
+	curr_pmask = (u8 *)(pedit_header(&hdrs->masks, hdr_type) + offset);
+	curr_pval  = (u8 *)(pedit_header(&hdrs->vals, hdr_type) + offset);
 
-	if (*curr_pmask & mask)  /* disallow acting twice on the same location */
-		goto out_err;
+	for (i = 0; i < act->mangle.len; i++) {
+		/* disallow acting twice on the same location */
+		if (curr_pmask[i] & act->mangle.mask[i])
+			goto out_err;
 
-	*curr_pmask |= mask;
-	*curr_pval  |= val;
+		curr_pmask[i] |= act->mangle.mask[i];
+		curr_pval[i] |= act->mangle.val[i];
+	}
 
 	return 0;
 
@@ -2488,7 +2493,6 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 {
 	u8 cmd = (act->id == FLOW_ACTION_MANGLE) ? 0 : 1;
 	int err = -EOPNOTSUPP;
-	u32 mask, val, offset;
 	u8 htype;
 
 	htype = act->mangle.htype;
@@ -2505,11 +2509,7 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 		goto out_err;
 	}
 
-	mask = act->mangle.mask;
-	val = act->mangle.val;
-	offset = act->mangle.offset;
-
-	err = set_pedit_val(htype, mask, val, offset, &hdrs[cmd]);
+	err = set_pedit_val(htype, act, &hdrs[cmd]);
 	if (err)
 		goto out_err;
 
@@ -2590,49 +2590,19 @@ static bool csum_offload_supported(struct mlx5e_priv *priv,
 	return true;
 }
 
-struct ip_ttl_word {
-	__u8	ttl;
-	__u8	protocol;
-	__sum16	check;
-};
-
-struct ipv6_hoplimit_word {
-	__be16	payload_len;
-	__u8	nexthdr;
-	__u8	hop_limit;
-};
-
 static bool is_action_keys_supported(const struct flow_action_entry *act)
 {
-	u32 mask, offset;
-	u8 htype;
+	u32 offset = act->mangle.offset;
+	u8 htype = act->mangle.htype;
 
-	htype = act->mangle.htype;
-	offset = act->mangle.offset;
-	mask = act->mangle.mask;
-	/* For IPv4 & IPv6 header check 4 byte word,
-	 * to determine that modified fields
-	 * are NOT ttl & hop_limit only.
-	 */
 	if (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP4) {
-		struct ip_ttl_word *ttl_word =
-			(struct ip_ttl_word *)&mask;
-
-		if (offset != offsetof(struct iphdr, ttl) ||
-		    ttl_word->protocol ||
-		    ttl_word->check) {
+		if (offset != offsetof(struct iphdr, ttl))
 			return true;
-		}
 	} else if (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP6) {
-		struct ipv6_hoplimit_word *hoplimit_word =
-			(struct ipv6_hoplimit_word *)&mask;
-
-		if (offset != offsetof(struct ipv6hdr, payload_len) ||
-		    hoplimit_word->payload_len ||
-		    hoplimit_word->nexthdr) {
+		if (offset != offsetof(struct ipv6hdr, hop_limit))
 			return true;
-		}
 	}
+
 	return false;
 }
 
@@ -2727,19 +2697,21 @@ static int add_vlan_rewrite_action(struct mlx5e_priv *priv, int namespace,
 				   struct pedit_headers_action *hdrs,
 				   u32 *action, struct netlink_ext_ack *extack)
 {
-	u16 mask16 = VLAN_VID_MASK;
-	u16 val16 = act->vlan.vid & VLAN_VID_MASK;
-	const struct flow_action_entry pedit_act = {
+	__be16 mask16 = htons(VLAN_VID_MASK);
+	__be16 val16 = htons(act->vlan.vid & VLAN_VID_MASK);
+	struct flow_action_entry pedit_act = {
 		.id = FLOW_ACTION_MANGLE,
 		.mangle.htype = FLOW_ACT_MANGLE_HDR_TYPE_ETH,
 		.mangle.offset = offsetof(struct vlan_ethhdr, h_vlan_TCI),
-		.mangle.mask = (u32)be16_to_cpu(*(__be16 *)&mask16),
-		.mangle.val = (u32)be16_to_cpu(*(__be16 *)&val16),
+		.mangle.len = 2,
 	};
 	u8 match_prio_mask, match_prio_val;
 	void *headers_c, *headers_v;
 	int err;
 
+	memcpy(pedit_act.mangle.mask, &mask16, sizeof(__be16));
+	memcpy(pedit_act.mangle.val, &val16, sizeof(__be16));
+
 	headers_c = get_match_headers_criteria(*action, &parse_attr->spec);
 	headers_v = get_match_headers_value(*action, &parse_attr->spec);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 592c36ba9e3f..a525ec244b79 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -472,38 +472,44 @@ nfp_fl_set_ipv4_tun(struct nfp_app *app, struct nfp_fl_set_ipv4_tun *set_tun,
 	return 0;
 }
 
-static void nfp_fl_set_helper32(u32 value, u32 mask, u8 *p_exact, u8 *p_mask)
+static void nfp_fl_set_helper(const u8 *value, const u8 *mask,
+			      u8 *p_exact, u8 *p_mask, int len)
 {
-	u32 oldvalue = get_unaligned((u32 *)p_exact);
-	u32 oldmask = get_unaligned((u32 *)p_mask);
+	int i;
 
-	value |= oldvalue & ~mask;
-
-	put_unaligned(oldmask | mask, (u32 *)p_mask);
-	put_unaligned(value, (u32 *)p_exact);
+	for (i = 0; i < len; i++) {
+		p_exact[i] = (p_exact[i] & ~mask[i]) | value[i];
+		p_mask[i] |= mask[i];
+	}
 }
 
 static int
 nfp_fl_set_eth(const struct flow_action_entry *act, u32 off,
 	       struct nfp_fl_set_eth *set_eth, struct netlink_ext_ack *extack)
 {
-	u32 exact, mask;
+	int i;
 
-	if (off + 4 > ETH_ALEN * 2) {
+	switch (off) {
+	case offsetof(struct ethhdr, h_dest):
+		i = 0;
+		break;
+	case offsetof(struct ethhdr, h_source):
+		i = ETH_ALEN;
+		break;
+	default:
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit ethernet action");
 		return -EOPNOTSUPP;
 	}
 
-	mask = act->mangle.mask;
-	exact = act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(&act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(&act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit ethernet action");
 		return -EOPNOTSUPP;
 	}
 
-	nfp_fl_set_helper32(exact, mask, &set_eth->eth_addr_val[off],
-			    &set_eth->eth_addr_mask[off]);
+	nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+			  &set_eth->eth_addr_val[i],
+			  &set_eth->eth_addr_mask[i], act->mangle.len);
 
 	set_eth->reserved = cpu_to_be16(0);
 	set_eth->head.jump_id = NFP_FL_ACTION_OPCODE_SET_ETHERNET;
@@ -524,68 +530,46 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	       struct nfp_fl_set_ip4_ttl_tos *set_ip_ttl_tos,
 	       struct netlink_ext_ack *extack)
 {
-	struct ipv4_ttl_word *ttl_word_mask;
-	struct ipv4_ttl_word *ttl_word;
-	struct iphdr *tos_word_mask;
-	struct iphdr *tos_word;
-	__be32 exact, mask;
-
-	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)act->mangle.mask;
-	exact = (__force __be32)act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 action");
 		return -EOPNOTSUPP;
 	}
 
 	switch (off) {
 	case offsetof(struct iphdr, daddr):
-		set_ip_addr->ipv4_dst_mask |= mask;
-		set_ip_addr->ipv4_dst &= ~mask;
-		set_ip_addr->ipv4_dst |= exact;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_addr->ipv4_dst,
+				  (u8 *)&set_ip_addr->ipv4_dst_mask,
+				  act->mangle.len);
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
 		break;
 	case offsetof(struct iphdr, saddr):
-		set_ip_addr->ipv4_src_mask |= mask;
-		set_ip_addr->ipv4_src &= ~mask;
-		set_ip_addr->ipv4_src |= exact;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_addr->ipv4_src,
+				  (u8 *)&set_ip_addr->ipv4_src_mask,
+				  act->mangle.len);
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
 		break;
 	case offsetof(struct iphdr, ttl):
-		ttl_word_mask = (struct ipv4_ttl_word *)&mask;
-		ttl_word = (struct ipv4_ttl_word *)&exact;
-
-		if (ttl_word_mask->protocol || ttl_word_mask->check) {
-			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 ttl action");
-			return -EOPNOTSUPP;
-		}
-
-		set_ip_ttl_tos->ipv4_ttl_mask |= ttl_word_mask->ttl;
-		set_ip_ttl_tos->ipv4_ttl &= ~ttl_word_mask->ttl;
-		set_ip_ttl_tos->ipv4_ttl |= ttl_word->ttl & ttl_word_mask->ttl;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_ttl_tos->ipv4_ttl,
+				  (u8 *)&set_ip_ttl_tos->ipv4_ttl_mask,
+				  act->mangle.len);
 		set_ip_ttl_tos->head.jump_id =
 			NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS;
 		set_ip_ttl_tos->head.len_lw = sizeof(*set_ip_ttl_tos) >>
 					      NFP_FL_LW_SIZ;
 		break;
-	case round_down(offsetof(struct iphdr, tos), 4):
-		tos_word_mask = (struct iphdr *)&mask;
-		tos_word = (struct iphdr *)&exact;
-
-		if (tos_word_mask->version || tos_word_mask->ihl ||
-		    tos_word_mask->tot_len) {
-			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 tos action");
-			return -EOPNOTSUPP;
-		}
-
-		set_ip_ttl_tos->ipv4_tos_mask |= tos_word_mask->tos;
-		set_ip_ttl_tos->ipv4_tos &= ~tos_word_mask->tos;
-		set_ip_ttl_tos->ipv4_tos |= tos_word->tos & tos_word_mask->tos;
+	case offsetof(struct iphdr, tos):
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_ttl_tos->ipv4_tos,
+				  (u8 *)&set_ip_ttl_tos->ipv4_tos_mask,
+				  act->mangle.len);
 		set_ip_ttl_tos->head.jump_id =
 			NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS;
 		set_ip_ttl_tos->head.len_lw = sizeof(*set_ip_ttl_tos) >>
@@ -600,12 +584,17 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 }
 
 static void
-nfp_fl_set_ip6_helper(int opcode_tag, u8 word, __be32 exact, __be32 mask,
-		      struct nfp_fl_set_ipv6_addr *ip6)
+nfp_fl_set_ip6_helper(int opcode_tag, const struct flow_action_entry *act,
+		      struct nfp_fl_set_ipv6_addr *ip6,
+		      struct netlink_ext_ack *extack)
 {
-	ip6->ipv6[word].mask |= mask;
-	ip6->ipv6[word].exact &= ~mask;
-	ip6->ipv6[word].exact |= exact;
+	int i, j;
+
+	for (i = 0, j = 0; i < sizeof(struct in6_addr); i++, j += sizeof(u32)) {
+		nfp_fl_set_helper(&act->mangle.val[j], &act->mangle.mask[j],
+				  (u8 *)&ip6->ipv6[i].exact,
+				  (u8 *)&ip6->ipv6[i].mask, sizeof(u32));
+	}
 
 	ip6->reserved = cpu_to_be16(0);
 	ip6->head.jump_id = opcode_tag;
@@ -619,39 +608,34 @@ struct ipv6_hop_limit_word {
 };
 
 static int
-nfp_fl_set_ip6_hop_limit_flow_label(u32 off, __be32 exact, __be32 mask,
+nfp_fl_set_ip6_hop_limit_flow_label(u32 off,
+				    const struct flow_action_entry *act,
 				    struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl,
 				    struct netlink_ext_ack *extack)
 {
-	struct ipv6_hop_limit_word *fl_hl_mask;
-	struct ipv6_hop_limit_word *fl_hl;
-
 	switch (off) {
-	case offsetof(struct ipv6hdr, payload_len):
-		fl_hl_mask = (struct ipv6_hop_limit_word *)&mask;
-		fl_hl = (struct ipv6_hop_limit_word *)&exact;
-
-		if (fl_hl_mask->nexthdr || fl_hl_mask->payload_len) {
-			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 hop limit action");
-			return -EOPNOTSUPP;
-		}
-
-		ip_hl_fl->ipv6_hop_limit_mask |= fl_hl_mask->hop_limit;
-		ip_hl_fl->ipv6_hop_limit &= ~fl_hl_mask->hop_limit;
-		ip_hl_fl->ipv6_hop_limit |= fl_hl->hop_limit &
-					    fl_hl_mask->hop_limit;
+	case offsetof(struct ipv6hdr, hop_limit):
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  &ip_hl_fl->ipv6_hop_limit,
+				  &ip_hl_fl->ipv6_hop_limit_mask,
+				  act->mangle.len);
 		break;
-	case round_down(offsetof(struct ipv6hdr, flow_lbl), 4):
-		if (mask & ~IPV6_FLOW_LABEL_MASK ||
-		    exact & ~IPV6_FLOW_LABEL_MASK) {
+	case 1: /* offsetof(struct ipv6hdr, flow_lbl) */
+		/* The initial 4-bits are part of the traffic class field. */
+		if (act->mangle.val[0] & 0xf0 ||
+		    act->mangle.mask[0] & 0xf0) {
 			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 flow label action");
 			return -EOPNOTSUPP;
 		}
 
-		ip_hl_fl->ipv6_label_mask |= mask;
-		ip_hl_fl->ipv6_label &= ~mask;
-		ip_hl_fl->ipv6_label |= exact;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&ip_hl_fl->ipv6_label,
+				  (u8 *)&ip_hl_fl->ipv6_label_mask,
+				  act->mangle.len);
 		break;
+	default:
+		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 action");
+		return -EOPNOTSUPP;
 	}
 
 	ip_hl_fl->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV6_TC_HL_FL;
@@ -667,31 +651,23 @@ nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
 	       struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl,
 	       struct netlink_ext_ack *extack)
 {
-	__be32 exact, mask;
 	int err = 0;
-	u8 word;
 
-	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)act->mangle.mask;
-	exact = (__force __be32)act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 action");
 		return -EOPNOTSUPP;
 	}
 
 	if (off < offsetof(struct ipv6hdr, saddr)) {
-		err = nfp_fl_set_ip6_hop_limit_flow_label(off, exact, mask,
-							  ip_hl_fl, extack);
-	} else if (off < offsetof(struct ipv6hdr, daddr)) {
-		word = (off - offsetof(struct ipv6hdr, saddr)) / sizeof(exact);
-		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC, word,
-				      exact, mask, ip_src);
-	} else if (off < offsetof(struct ipv6hdr, daddr) +
-		       sizeof(struct in6_addr)) {
-		word = (off - offsetof(struct ipv6hdr, daddr)) / sizeof(exact);
-		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_DST, word,
-				      exact, mask, ip_dst);
+		err = nfp_fl_set_ip6_hop_limit_flow_label(off, act, ip_hl_fl,
+							  extack);
+	} else if (off == offsetof(struct ipv6hdr, saddr)) {
+		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC,
+				      act, ip_src, extack);
+	} else if (off == offsetof(struct ipv6hdr, daddr)) {
+		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_DST,
+				      act, ip_dst, extack);
 	} else {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: pedit on unsupported section of IPv6 header");
 		return -EOPNOTSUPP;
@@ -705,23 +681,30 @@ nfp_fl_set_tport(const struct flow_action_entry *act, u32 off,
 		 struct nfp_fl_set_tport *set_tport, int opcode,
 		 struct netlink_ext_ack *extack)
 {
-	u32 exact, mask;
+	int i;
 
-	if (off) {
+	switch (off) {
+	case offsetof(struct tcphdr, source):
+		i = 0;
+		break;
+	case offsetof(struct tcphdr, dest):
+		i = sizeof(u16);
+		break;
+	default:
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: pedit on unsupported section of L4 header");
 		return -EOPNOTSUPP;
 	}
 
-	mask = act->mangle.mask;
-	exact = act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit L4 action");
 		return -EOPNOTSUPP;
 	}
 
-	nfp_fl_set_helper32(exact, mask, set_tport->tp_port_val,
-			    set_tport->tp_port_mask);
+	nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+			  &set_tport->tp_port_val[i],
+			  &set_tport->tp_port_mask[i],
+			  act->mangle.len);
 
 	set_tport->reserved = cpu_to_be16(0);
 	set_tport->head.jump_id = opcode;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index fc881875f856..a2853b37dded 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -154,6 +154,8 @@ enum flow_action_mangle_base {
 	FLOW_ACT_MANGLE_HDR_TYPE_UDP,
 };
 
+#define FLOW_ACTION_MANGLE_MAXLEN	16
+
 struct flow_action_entry {
 	enum flow_action_id		id;
 	union {
@@ -167,8 +169,9 @@ struct flow_action_entry {
 		struct {				/* FLOW_ACTION_PACKET_EDIT */
 			enum flow_action_mangle_base htype;
 			u32		offset;
-			u32		mask;
-			u32		val;
+			u8		mask[FLOW_ACTION_MANGLE_MAXLEN];
+			u8		val[FLOW_ACTION_MANGLE_MAXLEN];
+			u8		len;
 		} mangle;
 		const struct ip_tunnel_info *tunnel;	/* FLOW_ACTION_TUNNEL_ENCAP */
 		u32			csum_flags;	/* FLOW_ACTION_CSUM */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e30a151d8527..468db33292a9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3289,11 +3289,128 @@ void tc_cleanup_flow_action(struct flow_action *flow_action)
 }
 EXPORT_SYMBOL(tc_cleanup_flow_action);
 
+static unsigned int flow_action_mangle_type(enum pedit_cmd cmd)
+{
+	switch (cmd) {
+	case TCA_PEDIT_KEY_EX_CMD_SET:
+		return FLOW_ACTION_MANGLE;
+	case TCA_PEDIT_KEY_EX_CMD_ADD:
+		return FLOW_ACTION_ADD;
+	default:
+		WARN_ON_ONCE(1);
+	}
+	return 0;
+}
+
+struct flow_action_mangle_ctx {
+	u8	cmd;
+	u8	offset;
+	u8	htype;
+	u8	idx;
+	u8	val[FLOW_ACTION_MANGLE_MAXLEN];
+	u8	mask[FLOW_ACTION_MANGLE_MAXLEN];
+	u32	first_word;
+	u32	last_word;
+};
+
+static void flow_action_mangle_entry(struct flow_action_entry *entry,
+				     const struct flow_action_mangle_ctx *ctx)
+{
+	u32 delta;
+
+	entry->id = ctx->cmd;
+	entry->mangle.htype = ctx->htype;
+	entry->mangle.offset = ctx->offset;
+	entry->mangle.len = ctx->idx;
+
+	/* Adjust offset. */
+	delta = sizeof(u32) - (fls(ctx->first_word) / BITS_PER_BYTE);
+	entry->mangle.offset += delta;
+
+	/* Adjust length. */
+	entry->mangle.len -= ((ffs(ctx->last_word) / BITS_PER_BYTE) + delta);
+
+	/* Copy value and mask from offset using the adjusted length. */
+	memcpy(entry->mangle.val, &ctx->val[delta], entry->mangle.len);
+	memcpy(entry->mangle.mask, &ctx->mask[delta], entry->mangle.len);
+}
+
+static void flow_action_mangle_ctx_update(struct flow_action_mangle_ctx *ctx,
+					  const struct tc_action *act, int k)
+{
+	u32 val, mask;
+
+	val = tcf_pedit_val(act, k);
+	mask = ~tcf_pedit_mask(act, k);
+
+	memcpy(&ctx->val[ctx->idx], &val, sizeof(u32));
+	memcpy(&ctx->mask[ctx->idx], &mask, sizeof(u32));
+	ctx->idx += sizeof(u32);
+}
+
+static void flow_action_mangle_ctx_init(struct flow_action_mangle_ctx *ctx,
+					const struct tc_action *act, int k)
+{
+	ctx->cmd = flow_action_mangle_type(tcf_pedit_cmd(act, k));
+	ctx->offset = tcf_pedit_offset(act, k);
+	ctx->htype = tcf_pedit_htype(act, k);
+	ctx->idx = 0;
+
+	ctx->first_word = ntohl(~tcf_pedit_mask(act, k));
+	ctx->last_word = ctx->first_word;
+
+	flow_action_mangle_ctx_update(ctx, act, k);
+}
+
+static int flow_action_mangle(struct flow_action *flow_action,
+			      struct flow_action_entry *entry,
+			      const struct tc_action *act, int j)
+{
+	struct flow_action_mangle_ctx ctx;
+	int k;
+
+	if (tcf_pedit_cmd(act, 0) > TCA_PEDIT_KEY_EX_CMD_ADD)
+		return -1;
+
+	flow_action_mangle_ctx_init(&ctx, act, 0);
+
+	/* Special case: one single 32-bits word. */
+	if (tcf_pedit_nkeys(act) == 1) {
+		flow_action_mangle_entry(entry, &ctx);
+		return j;
+	}
+
+	for (k = 1; k < tcf_pedit_nkeys(act); k++) {
+		if (tcf_pedit_cmd(act, k) > TCA_PEDIT_KEY_EX_CMD_ADD)
+			return -1;
+
+		/* Offset is contiguous and type is the same, coalesce. */
+		if (ctx.idx < FLOW_ACTION_MANGLE_MAXLEN &&
+		    ctx.offset + ctx.idx == tcf_pedit_offset(act, k) &&
+		    ctx.cmd == flow_action_mangle_type(tcf_pedit_cmd(act, k))) {
+			flow_action_mangle_ctx_update(&ctx, act, k);
+			continue;
+		}
+		ctx.last_word = ntohl(~tcf_pedit_mask(act, k - 1));
+
+		/* Cannot coalesce, set up this entry. */
+		flow_action_mangle_entry(entry, &ctx);
+
+		flow_action_mangle_ctx_init(&ctx, act, k);
+		entry = &flow_action->entries[++j];
+	}
+
+	ctx.last_word = ntohl(~tcf_pedit_mask(act, k - 1));
+	flow_action_mangle_entry(entry, &ctx);
+
+	return j;
+}
+
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts, bool rtnl_held)
 {
 	const struct tc_action *act;
-	int i, j, k, err = 0;
+	int i, j, err = 0;
 
 	if (!exts)
 		return 0;
@@ -3366,25 +3483,9 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 		} else if (is_tcf_tunnel_release(act)) {
 			entry->id = FLOW_ACTION_TUNNEL_DECAP;
 		} else if (is_tcf_pedit(act)) {
-			for (k = 0; k < tcf_pedit_nkeys(act); k++) {
-				switch (tcf_pedit_cmd(act, k)) {
-				case TCA_PEDIT_KEY_EX_CMD_SET:
-					entry->id = FLOW_ACTION_MANGLE;
-					break;
-				case TCA_PEDIT_KEY_EX_CMD_ADD:
-					entry->id = FLOW_ACTION_ADD;
-					break;
-				default:
-					err = -EOPNOTSUPP;
-					goto err_out;
-				}
-				entry->mangle.htype = tcf_pedit_htype(act, k);
-				entry->mangle.mask = ~tcf_pedit_mask(act, k);
-				entry->mangle.val = tcf_pedit_val(act, k) &
-							entry->mangle.mask;
-				entry->mangle.offset = tcf_pedit_offset(act, k);
-				entry = &flow_action->entries[++j];
-			}
+			j = flow_action_mangle(flow_action, entry, act, j);
+			if (j < 0)
+				goto err_out;
 		} else if (is_tcf_csum(act)) {
 			entry->id = FLOW_ACTION_CSUM;
 			entry->csum_flags = tcf_csum_update_flags(act);
@@ -3439,9 +3540,9 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			goto err_out;
 		}
 
-		if (!is_tcf_pedit(act))
-			j++;
+		j++;
 	}
+	flow_action->num_entries = j;
 
 err_out:
 	if (!rtnl_held)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next,v3 4/4] netfilter: nft_payload: packet mangling offload support
  2019-09-06  0:03 [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2019-09-06  0:04 ` [PATCH net-next,v3 3/4] net: flow_offload: mangle action at byte level Pablo Neira Ayuso
@ 2019-09-06  0:04 ` Pablo Neira Ayuso
  2019-09-06 10:02 ` [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Edward Cree
  4 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06  0:04 UTC (permalink / raw)
  To: netfilter-devel
  Cc: davem, netdev, jakub.kicinski, jiri, saeedm, vishal, vladbu

This patch allows for mangling packet fields using hardware offload
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_payload.c | 73 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 22a80eb60222..0efa8bfd2b51 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -562,12 +562,85 @@ static int nft_payload_set_dump(struct sk_buff *skb, const struct nft_expr *expr
 	return -1;
 }
 
+static int nft_payload_offload_set_nh(struct nft_offload_ctx *ctx,
+				      struct nft_flow_rule *flow,
+				      const struct nft_payload_set *priv)
+{
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+
+	switch (ctx->dep.l3num) {
+	case htons(ETH_P_IP):
+		type = FLOW_ACT_MANGLE_HDR_TYPE_IP4;
+		break;
+	case htons(ETH_P_IPV6):
+		type = FLOW_ACT_MANGLE_HDR_TYPE_IP6;
+		break;
+	}
+
+	return type;
+}
+
+static int nft_payload_offload_set_th(struct nft_offload_ctx *ctx,
+				      struct nft_flow_rule *flow,
+				      const struct nft_payload_set *priv)
+{
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+
+	switch (ctx->dep.protonum) {
+	case IPPROTO_TCP:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_TCP;
+		break;
+	case IPPROTO_UDP:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_UDP;
+		break;
+	}
+
+	return type;
+}
+
+static int nft_payload_set_offload(struct nft_offload_ctx *ctx,
+				   struct nft_flow_rule *flow,
+				   const struct nft_expr *expr)
+{
+	const struct nft_payload_set *priv = nft_expr_priv(expr);
+	struct nft_offload_reg *sreg = &ctx->regs[priv->sreg];
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+	struct flow_action_entry *entry;
+
+	switch (priv->base) {
+	case NFT_PAYLOAD_LL_HEADER:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_ETH;
+		break;
+	case NFT_PAYLOAD_NETWORK_HEADER:
+		type = nft_payload_offload_set_nh(ctx, flow, priv);
+		break;
+	case NFT_PAYLOAD_TRANSPORT_HEADER:
+		type = nft_payload_offload_set_th(ctx, flow, priv);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	entry = &flow->rule->action.entries[ctx->num_actions++];
+	entry->id		= FLOW_ACTION_MANGLE;
+	entry->mangle.htype	= type;
+	entry->mangle.offset	= priv->offset;
+	entry->mangle.len	= priv->len;
+
+	memcpy(entry->mangle.val, sreg->data.data, priv->len);
+	memset(entry->mangle.mask, 0xff, priv->len);
+
+	return type != FLOW_ACT_MANGLE_UNSPEC ? 0 : -EOPNOTSUPP;
+}
+
 static const struct nft_expr_ops nft_payload_set_ops = {
 	.type		= &nft_payload_type,
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_payload_set)),
 	.eval		= nft_payload_set_eval,
 	.init		= nft_payload_set_init,
 	.dump		= nft_payload_set_dump,
+	.offload	= nft_payload_set_offload,
 };
 
 static const struct nft_expr_ops *
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06  0:03 [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2019-09-06  0:04 ` [PATCH net-next,v3 4/4] netfilter: nft_payload: packet mangling offload support Pablo Neira Ayuso
@ 2019-09-06 10:02 ` Edward Cree
  2019-09-06 10:56   ` Pablo Neira Ayuso
  4 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-09-06 10:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel
  Cc: davem, netdev, jakub.kicinski, jiri, saeedm, vishal, vladbu

On 06/09/2019 01:03, Pablo Neira Ayuso wrote:
> This patch updates the mangle action representation:
>
> Patch 1) Undo bitwise NOT operation on the mangle mask (coming from tc
>          pedit userspace).
>
> Patch 2) mangle value &= mask from the front-end side.
>
> Patch 3) adjust offset, length and coalesce consecutive pedit keys into
>          one single action.
>
> Patch 4) add support for payload mangling for netfilter.
>
> After this patchset:
>
> * Offset to payload does not need to be on the 32-bits boundaries anymore.
>   This patchset adds front-end code to adjust the offset and length coming
>   from the tc pedit representation, so drivers get an exact header field
>   offset and length.
>
> * This new front-end code coalesces consecutive pedit actions into one
>   single action, so drivers can mangle IPv6 and ethernet address fields
>   in one go, instead once for each 32-bit word.
>
> On the driver side, diffstat -t shows that drivers code to deal with
> payload mangling gets simplified:
>
>         INSERTED,DELETED,MODIFIED,FILENAME
>         46,116,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c (-70 LOC)
>         12,28,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h (-16 LOC)
> 	26,54,0,drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (-27 LOC)
>         89,111,0,drivers/net/ethernet/netronome/nfp/flower/action.c (-17 LOC)
>
> While, on the front-end side the balance is the following:
>
>         123,22,0,net/sched/cls_api.c (+101 LOC)
>
> Changes since v2:
>
> * Fix is_action_keys_supported() breakage in mlx5 reported by Vlad Buslov.
Still NAK for the same reasons as three versions ago (when it was called
 "netfilter: payload mangling offload support"), you've never managed to
 explain why this extra API complexity is useful.  (Reducing LOC does not
 mean you've reduced complexity.)

As Jakub said, 'We suffered through enough haphazard "updates"'.  Please
 can you fix the problems your previous API changes caused (I still haven't
 had an answer about the flow block changes since sending you my driver code
 two weeks ago) before trying to ram new ones through.

-Ed

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 10:02 ` [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Edward Cree
@ 2019-09-06 10:56   ` Pablo Neira Ayuso
  2019-09-06 12:55     ` Edward Cree
  0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06 10:56 UTC (permalink / raw)
  To: Edward Cree
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On Fri, Sep 06, 2019 at 11:02:18AM +0100, Edward Cree wrote:
> On 06/09/2019 01:03, Pablo Neira Ayuso wrote:
> > This patch updates the mangle action representation:
> >
> > Patch 1) Undo bitwise NOT operation on the mangle mask (coming from tc
> >          pedit userspace).
> >
> > Patch 2) mangle value &= mask from the front-end side.
> >
> > Patch 3) adjust offset, length and coalesce consecutive pedit keys into
> >          one single action.
> >
> > Patch 4) add support for payload mangling for netfilter.
> >
> > After this patchset:
> >
> > * Offset to payload does not need to be on the 32-bits boundaries anymore.
> >   This patchset adds front-end code to adjust the offset and length coming
> >   from the tc pedit representation, so drivers get an exact header field
> >   offset and length.
> >
> > * This new front-end code coalesces consecutive pedit actions into one
> >   single action, so drivers can mangle IPv6 and ethernet address fields
> >   in one go, instead once for each 32-bit word.
> >
> > On the driver side, diffstat -t shows that drivers code to deal with
> > payload mangling gets simplified:
> >
> >         INSERTED,DELETED,MODIFIED,FILENAME
> >         46,116,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c (-70 LOC)
> >         12,28,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h (-16 LOC)
> > 	26,54,0,drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (-27 LOC)
> >         89,111,0,drivers/net/ethernet/netronome/nfp/flower/action.c (-17 LOC)
> >
> > While, on the front-end side the balance is the following:
> >
> >         123,22,0,net/sched/cls_api.c (+101 LOC)
> >
> > Changes since v2:
> >
> > * Fix is_action_keys_supported() breakage in mlx5 reported by Vlad Buslov.
>
> Still NAK for the same reasons as three versions ago (when it was called
>  "netfilter: payload mangling offload support"), you've never managed to
>  explain why this extra API complexity is useful.  (Reducing LOC does not
>  mean you've reduced complexity.)

Oh well...

Patch 1) Mask is inverted for no reason, just because tc pedit needs
this in this way. All drivers reverse this mask.

Patch 2) All drivers mask out meaningless fields in the value field.

Patch 3) Without this patchset, offsets are on the 32-bit boundary.
Drivers need to play with the 32-bit mask to infer what field they are
supposed to mangle... eg. with 32-bit offset alignment, checking if
the use want to alter the ttl/hop_limit need for helper structures to
check the 32-bit mask. Mangling a IPv6 address comes with one single
action...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 10:56   ` Pablo Neira Ayuso
@ 2019-09-06 12:55     ` Edward Cree
  2019-09-06 13:14       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-09-06 12:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On 06/09/2019 11:56, Pablo Neira Ayuso wrote:
> On Fri, Sep 06, 2019 at 11:02:18AM +0100, Edward Cree wrote:
>> Still NAK for the same reasons as three versions ago (when it was called
>>  "netfilter: payload mangling offload support"), you've never managed to
>>  explain why this extra API complexity is useful.  (Reducing LOC does not
>>  mean you've reduced complexity.)
> Oh well...
>
> Patch 1) Mask is inverted for no reason, just because tc pedit needs
> this in this way. All drivers reverse this mask.
>
> Patch 2) All drivers mask out meaningless fields in the value field.
To be clear: I have no issue with these two patches; they look fine to me.
(Though I'd like to see some comments on struct flow_action_entry specifying
 the semantics of these fields, especially if they're going to differ from
 the corresponding fields in struct tc_pedit_key.)

> Patch 3) Without this patchset, offsets are on the 32-bit boundary.
> Drivers need to play with the 32-bit mask to infer what field they are
> supposed to mangle... eg. with 32-bit offset alignment, checking if
> the use want to alter the ttl/hop_limit need for helper structures to
> check the 32-bit mask. Mangling a IPv6 address comes with one single
> action...
Drivers are still going to need to handle multiple pedit actions, in
 case the original rule wanted to mangle two non-consecutive fields.
And you can't just coalesce all consecutive mangles, because if you
 mangle two consecutive fields (e.g. UDP sport and dport) the driver
 still needs to disentangle that if it works on a 'fields' (rather
 than 'u32s') level.
So either have the core convert things into named protocol fields
 (i.e. "set src IPv6 to 1234::5 and add 1 to UDP sport"), or leave
 the current sequence-of-u32-mangles as it is.  This in-between "we'll
 coalesce things together despite not knowing what fields they are" is
 neither fish nor fowl.

-Ed

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 12:55     ` Edward Cree
@ 2019-09-06 13:14       ` Pablo Neira Ayuso
  2019-09-06 13:37         ` Edward Cree
  0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06 13:14 UTC (permalink / raw)
  To: Edward Cree
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On Fri, Sep 06, 2019 at 01:55:29PM +0100, Edward Cree wrote:
> On 06/09/2019 11:56, Pablo Neira Ayuso wrote:
> > On Fri, Sep 06, 2019 at 11:02:18AM +0100, Edward Cree wrote:
> >> Still NAK for the same reasons as three versions ago (when it was called
> >>  "netfilter: payload mangling offload support"), you've never managed to
> >>  explain why this extra API complexity is useful.  (Reducing LOC does not
> >>  mean you've reduced complexity.)
> > Oh well...
> >
> > Patch 1) Mask is inverted for no reason, just because tc pedit needs
> > this in this way. All drivers reverse this mask.
> >
> > Patch 2) All drivers mask out meaningless fields in the value field.
>
> To be clear: I have no issue with these two patches; they look fine to me.
> (Though I'd like to see some comments on struct flow_action_entry specifying
>  the semantics of these fields, especially if they're going to differ from
>  the corresponding fields in struct tc_pedit_key.)

OK, I can document this semantics, I need just _time_ to write that
documentation. I was expecting this patch description is enough by now
until I can get to finish that documentation.

> > Patch 3) Without this patchset, offsets are on the 32-bit boundary.
> > Drivers need to play with the 32-bit mask to infer what field they are
> > supposed to mangle... eg. with 32-bit offset alignment, checking if
> > the use want to alter the ttl/hop_limit need for helper structures to
> > check the 32-bit mask. Mangling a IPv6 address comes with one single
> > action...
>
> Drivers are still going to need to handle multiple pedit actions, in
>  case the original rule wanted to mangle two non-consecutive fields.
> And you can't just coalesce all consecutive mangles, because if you
>  mangle two consecutive fields (e.g. UDP sport and dport) the driver
>  still needs to disentangle that if it works on a 'fields' (rather
>  than 'u32s') level.

This infrastructure is _not_ coalescing two consecutive field, e.g.
UDP sport and dport is _not_ coalesced. The coalesce routine does
_not_ handle multiple tc pedit ex actions.

I'll clarify this in the cover letter in the next patchset round.

>  So either have the core convert things into named protocol fields
>  (i.e. "set src IPv6 to 1234::5 and add 1 to UDP sport"), or leave
>  the current sequence-of-u32-mangles as it is. This in-between "we'll
>  coalesce things together despite not knowing what fields they are" is
>  neither fish nor fowl.

The model you propose would still need this code for tc pedit to
adjust offset/length and coalesce u32 fields.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 13:14       ` Pablo Neira Ayuso
@ 2019-09-06 13:37         ` Edward Cree
  2019-09-06 14:50           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-09-06 13:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On 06/09/2019 14:14, Pablo Neira Ayuso wrote:
> OK, I can document this semantics, I need just _time_ to write that
> documentation. I was expecting this patch description is enough by now
> until I can get to finish that documentation.
I think for two structs with apparently the same contents but different
 semantics (one has the mask bitwise complemented) it's best to hold up
 the code change until the comment is ready to come with it, because
 until then it's a dangerously confusing situation.

>> And you can't just coalesce all consecutive mangles, because if you
>>  mangle two consecutive fields (e.g. UDP sport and dport) the driver
>>  still needs to disentangle that if it works on a 'fields' (rather
>>  than 'u32s') level.
> This infrastructure is _not_ coalescing two consecutive field, e.g.
> UDP sport and dport is _not_ coalesced. The coalesce routine does
> _not_ handle multiple tc pedit ex actions.
So an IPv6 address mangle only comes as a single action if it's from
 netfilter, not if it's coming from TC pedit.  Therefore drivers still
 need to handle an IPv6 or MAC address mangle coming in multiple
 actions, therefore your driver simplifications are invalid.  No?

> The model you propose would still need this code for tc pedit to
> adjust offset/length and coalesce u32 fields.
Yes, but we don't add code/features to the kernel based on what we
 *could* use it for later; every submission has to be self-contained
 in providing something of demonstrable value.  So either implement
 "the model I propose" (which to be clear I'm *not* proposing, I want
 the u32 pedit left as it is; it's just that it's a better model than
 what you're implementing here), or leave well alone.

-Ed

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 13:37         ` Edward Cree
@ 2019-09-06 14:50           ` Pablo Neira Ayuso
  2019-09-06 15:13             ` Edward Cree
  0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06 14:50 UTC (permalink / raw)
  To: Edward Cree
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On Fri, Sep 06, 2019 at 02:37:16PM +0100, Edward Cree wrote:
> On 06/09/2019 14:14, Pablo Neira Ayuso wrote:
> > OK, I can document this semantics, I need just _time_ to write that
> > documentation. I was expecting this patch description is enough by now
> > until I can get to finish that documentation.
>
> I think for two structs with apparently the same contents but different
>  semantics (one has the mask bitwise complemented) it's best to hold up
>  the code change until the comment is ready to come with it, because
>  until then it's a dangerously confusing situation.

The idea is that flow rule API != tc front-end anymore. So the flow
rule API can evolve regardless the front-end requirements. Before this
update there was no other choice than using the tc pedit layout since
it was exposed to the drivers, this is not the case anymore.

> >> And you can't just coalesce all consecutive mangles, because if you
> >>  mangle two consecutive fields (e.g. UDP sport and dport) the driver
> >>  still needs to disentangle that if it works on a 'fields' (rather
> >>  than 'u32s') level.
> >
> > This infrastructure is _not_ coalescing two consecutive field, e.g.
> > UDP sport and dport is _not_ coalesced. The coalesce routine does
> > _not_ handle multiple tc pedit ex actions.
>
> So an IPv6 address mangle only comes as a single action if it's from
>  netfilter, not if it's coming from TC pedit.

Driver gets one single action from tc/netfilter (and potentially
ethtool if it would support for this), it comes as a single action
from both subsystems:

        front-end -----> flow_rule API -----> drivers

Front-end need to translate their representation to this
FLOW_ACTION_MANGLE layout.

By front-end, I refer to ethtool/netfilter/tc.

>  Therefore drivers still  need to handle an IPv6 or MAC address
>  mangle coming in multiple  actions, therefore your driver
>  simplifications are invalid.  No?

No. IPv6 and MAC come as a single action. All subsystems use the same
flow_rule API, they use the same layout.

> > The model you propose would still need this code for tc pedit to
> > adjust offset/length and coalesce u32 fields.
>
>  Yes, but we don't add code/features to the kernel based on what we
>  *could* use it for later

This is already useful: Look at the cxgb driver in particular, it's
way easier to read.

Other existing drivers do not need to do things like:

        case round_down(offsetof(struct iphdr, tos), 4):

and the play with masks to infer if the user wants to mangle the TOS
field, they can just do:

        case offsetof(struct iphdr, tos):

which is way more natural representation.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 14:50           ` Pablo Neira Ayuso
@ 2019-09-06 15:13             ` Edward Cree
  2019-09-06 15:58               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-09-06 15:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On 06/09/2019 15:50, Pablo Neira Ayuso wrote:
> On Fri, Sep 06, 2019 at 02:37:16PM +0100, Edward Cree wrote:
>> On 06/09/2019 14:14, Pablo Neira Ayuso wrote:
>>> OK, I can document this semantics, I need just _time_ to write that
>>> documentation. I was expecting this patch description is enough by now
>>> until I can get to finish that documentation.
>> I think for two structs with apparently the same contents but different
>>  semantics (one has the mask bitwise complemented) it's best to hold up
>>  the code change until the comment is ready to come with it, because
>>  until then it's a dangerously confusing situation.
> The idea is that flow rule API != tc front-end anymore. So the flow
> rule API can evolve regardless the front-end requirements. Before this
> update there was no other choice than using the tc pedit layout since
> it was exposed to the drivers, this is not the case anymore.
I'm not saying that they have to be the same.
I'm saying that they're _almost_ the same, and having things that are
 _almost_ the same but subtly different is a recipe for misunderstandings
 and bugs, and so must must must have very visible comments in the code.

>> So an IPv6 address mangle only comes as a single action if it's from
>>  netfilter, not if it's coming from TC pedit.
> Driver gets one single action from tc/netfilter (and potentially
> ethtool if it would support for this), it comes as a single action
> from both subsystems:
>
>         front-end -----> flow_rule API -----> drivers
>
> Front-end need to translate their representation to this
> FLOW_ACTION_MANGLE layout.
>
> By front-end, I refer to ethtool/netfilter/tc.
In your patch, flow_action_mangle() looks only at the offset and type
 (add vs. set) of each pedit, coalesces them if compatible (so, unless
 I'm misreading the code, it _will_ coalesce adjacent pedits to
 contiguous fields like UDP sport & dport, unlike what you said),
 because it doesn't know whether two contiguous pedits are part of the
 same field or not (it doesn't have any knowledge of 'fields' at all).
And for you to change that, while still coalescing IPv6 pedits, you
 would need flow_action_mangle() to know what fields each pedit is in.
It is not possible for code that doesn't know about fields to both:
 (a) not coalesce edits of contiguous fields, and
 (b) coalesce edits of larger-than-u32 fields

>>  Yes, but we don't add code/features to the kernel based on what we
>>  *could* use it for later
> This is already useful: Look at the cxgb driver in particular, it's
> way easier to read.
Have you tested it?  Again, I might be misreading, but it looks like
 your flow_action_mangle() *will* coalesce sport & dport, which it
 appears will break that cxgb code.

> Other existing drivers do not need to do things like:
>
>         case round_down(offsetof(struct iphdr, tos), 4):
>
> and the play with masks to infer if the user wants to mangle the TOS
> field, they can just do:
>
>         case offsetof(struct iphdr, tos):
>
> which is way more natural representation.
Proper thing to do is have helper functions available to drivers to test
 the pedit, and not just switch on the offset.  Why do I say that?
Well, consider a pedit on UDP dport, with mask 0x00ff (network endian).
Now as a u32 pedit that's 0x000000ff offset 0, so field-blind offset
 calculation (ffs in flow_action_mangle_entry()) will turn that into
 offset 3 mask 0xff.  Now driver does
    switch(offset) { /* 3 */
    case offsetof(struct udphdr, dest): /* 2 */
        /* Whoops, we never get here! */
    }
Do you see the problem?

-Ed

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 15:13             ` Edward Cree
@ 2019-09-06 15:58               ` Pablo Neira Ayuso
  2019-09-06 16:49                 ` Edward Cree
  0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06 15:58 UTC (permalink / raw)
  To: Edward Cree
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

Hi Edward,

On Fri, Sep 06, 2019 at 04:13:17PM +0100, Edward Cree wrote:
> On 06/09/2019 15:50, Pablo Neira Ayuso wrote:
> > On Fri, Sep 06, 2019 at 02:37:16PM +0100, Edward Cree wrote:
[...]
> >> So an IPv6 address mangle only comes as a single action if it's from
> >>  netfilter, not if it's coming from TC pedit.
> > Driver gets one single action from tc/netfilter (and potentially
> > ethtool if it would support for this), it comes as a single action
> > from both subsystems:
> >
> >         front-end -----> flow_rule API -----> drivers
> >
> > Front-end need to translate their representation to this
> > FLOW_ACTION_MANGLE layout.
> >
> > By front-end, I refer to ethtool/netfilter/tc.
>
>  In your patch, flow_action_mangle() looks only at the offset and type
>  (add vs. set) of each pedit, coalesces them if compatible (so, unless
>  I'm misreading the code, it _will_ coalesce adjacent pedits to
>  contiguous fields like UDP sport & dport, unlike what you said),
>  because it doesn't know whether two contiguous pedits are part of the
>  same field or not (it doesn't have any knowledge of 'fields' at all).

In tc pedit ex, those are _indeed_ two separated actions:

* One to mangle UDP sport.
* One to mangle UDP dport.

They are _not_ one as you describe above.

The coalesce routine I made does _not_ coalesce fields in two
different actions.

>  And for you to change that, while still coalescing IPv6 pedits, you
>  would need flow_action_mangle() to know what fields each pedit is in.

In the particular case of IPv6 address, 'tc pedit ex' generates one
single action with 4 nkeys. So this is:

* One action to mangle four 32-bits words (the number of words in tc
  pedit is stored in the nkeys field).

The coalesce routine I made in this case merges the four 32-bits words
into one single action.

[...]
> >>  Yes, but we don't add code/features to the kernel based on what we
> >>  *could* use it for later
> > This is already useful: Look at the cxgb driver in particular, it's
> > way easier to read.
>
>  Have you tested it?  Again, I might be misreading, but it looks like
>  your flow_action_mangle() *will* coalesce sport & dport, which it
>  appears will break that cxgb code.

Because sport and dport are not place in the same action by tc pedit
ex, _that cannot happen_.

> > Other existing drivers do not need to do things like:
> >
> >         case round_down(offsetof(struct iphdr, tos), 4):
> >
> > and the play with masks to infer if the user wants to mangle the TOS
> > field, they can just do:
> >
> >         case offsetof(struct iphdr, tos):
> >
> > which is way more natural representation.
>
> Proper thing to do is have helper functions available to drivers to test
> the pedit, and not just switch on the offset.  Why do I say that?
>
> Well, consider a pedit on UDP dport, with mask 0x00ff (network endian).
> Now as a u32 pedit that's 0x000000ff offset 0, so field-blind offset
> calculation (ffs in flow_action_mangle_entry()) will turn that into
> offset 3 mask 0xff.  Now driver does
>     switch(offset) { /* 3 */
>     case offsetof(struct udphdr, dest): /* 2 */
>         /* Whoops, we never get here! */
>     }
>
> Do you see the problem?

This scenario you describe cannot _work_ right now, with the existing
code. Without my patchset, this scenario you describe does _not_ work,

The drivers in the tree need a mask of 0xffff to infer that this is
UDP dport.

The 'tc pedit ex' infrastructure does not allow for the scenario that
you describe above.

No driver in the tree allow for what you describe already.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 15:58               ` Pablo Neira Ayuso
@ 2019-09-06 16:49                 ` Edward Cree
  2019-09-06 18:15                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Edward Cree @ 2019-09-06 16:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On 06/09/2019 16:58, Pablo Neira Ayuso wrote:
> In tc pedit ex, those are _indeed_ two separated actions: 
I read the code again and I get it now, there's double iteration
 already over tcf_exts_for_each_action and tcf_pedit_nkeys, and
 it's only within the latter that you coalesce.

However, have you considered that iproute2 (i.e. tc tool) isn't
 guaranteed to be the only userland consumer of the TC uAPI?  For all
 we know there could be another user out there producing things like
 a single pedit action with two keys, same offset but different
 masks, to mangle sport & dport separately, which your code now
 _would_ coalesce into a single mangle.  I don't know if that would
 lead to any problems, but I want to be sure you've thought about it ;)

>> Proper thing to do is have helper functions available to drivers to test
>> the pedit, and not just switch on the offset.  Why do I say that?
>>
>> Well, consider a pedit on UDP dport, with mask 0x00ff (network endian).
>> Now as a u32 pedit that's 0x000000ff offset 0, so field-blind offset
>> calculation (ffs in flow_action_mangle_entry()) will turn that into
>>  offset 3 mask 0xff.  Now driver does
>>     switch(offset) { /* 3 */
>>     case offsetof(struct udphdr, dest): /* 2 */
>>         /* Whoops, we never get here! */
>>     }
>>
>> Do you see the problem?
> This scenario you describe cannot _work_ right now, with the existing
> code. Without my patchset, this scenario you describe does _not_ work,
>
> The drivers in the tree need a mask of 0xffff to infer that this is
> UDP dport.
>
> The 'tc pedit ex' infrastructure does not allow for the scenario that
> you describe above.
>
> No driver in the tree allow for what you describe already.
Looks to me like existing nfp_fl_set_tport() handles just fine any
 arbitrary mask across the u32 that contains UDP sport & dport.
And the uAPI we have to maintain is the uAPI we expose, not the subset
 that iproute2 uses.  I could write a patched tc tool *today* that does
 a pedit of 'UDP header, offset 0, mask 0xff0000ff' and the nfp driver
 would accept that fine (no idea what the fw / chip would do with it,
 but presumably it works or Netronome folks would have put checks in),
 whereas with your patch it'll complain "invalid pedit L4 action"
 because the mask isn't all-1s.
And if I made it produce my example from above, mask 0x000000ff, you'd
 calculate an offset of 3 and hit the other error, "unsupported section
 of L4 header", which again would have worked before.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next,v3 0/4] flow_offload: update mangle action representation
  2019-09-06 16:49                 ` Edward Cree
@ 2019-09-06 18:15                   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-06 18:15 UTC (permalink / raw)
  To: Edward Cree
  Cc: netfilter-devel, davem, netdev, jakub.kicinski, jiri, saeedm,
	vishal, vladbu

On Fri, Sep 06, 2019 at 05:49:07PM +0100, Edward Cree wrote:
> On 06/09/2019 16:58, Pablo Neira Ayuso wrote:
> > In tc pedit ex, those are _indeed_ two separated actions: 
>
>  I read the code again and I get it now, there's double iteration
>  already over tcf_exts_for_each_action and tcf_pedit_nkeys, and
>  it's only within the latter that you coalesce.

Exactly.

>  However, have you considered that iproute2 (i.e. tc tool) isn't
>  guaranteed to be the only userland consumer of the TC uAPI?
>  For all we know there could be another user out there producing things like
>  a single pedit action with two keys, same offset but different
>  masks, to mangle sport & dport separately, which your code now
>  _would_ coalesce into a single mangle. I don't know if that would
>  lead to any problems, but I want to be sure you've thought about it ;)

tc pedit only supports for the "extended mode". So the hardware
offloads only support for a subset of the tc pedit uAPI already.

Userland may decide not to use the "extended mode", however, it will
not work for hardware offloads.

> >> Proper thing to do is have helper functions available to drivers to test
> >> the pedit, and not just switch on the offset.  Why do I say that?
> >>
> >> Well, consider a pedit on UDP dport, with mask 0x00ff (network endian).
> >> Now as a u32 pedit that's 0x000000ff offset 0, so field-blind offset
> >> calculation (ffs in flow_action_mangle_entry()) will turn that into
> >> offset 3 mask 0xff.  Now driver does
> >>     switch(offset) { /* 3 */
> >>     case offsetof(struct udphdr, dest): /* 2 */
> >>         /* Whoops, we never get here! */
> >>     }
> >>
> >> Do you see the problem?
> >
> > This scenario you describe cannot _work_ right now, with the existing
> > code. Without my patchset, this scenario you describe does _not_ work,
> >
> > The drivers in the tree need a mask of 0xffff to infer that this is
> > UDP dport.
> >
> > The 'tc pedit ex' infrastructure does not allow for the scenario that
> > you describe above.
> >
> > No driver in the tree allow for what you describe already.
>
>  Looks to me like existing nfp_fl_set_tport() handles just fine any
>  arbitrary mask across the u32 that contains UDP sport & dport.
>  And the uAPI we have to maintain is the uAPI we expose, not the subset
>  that iproute2 uses. I could write a patched tc tool *today* that does
>  a pedit of 'UDP header, offset 0, mask 0xff0000ff' and the nfp driver
>  would accept that fine (no idea what the fw / chip would do with it,
>  but presumably it works or Netronome folks would have put checks in),
>  whereas with your patch it'll complain "invalid pedit L4 action"
>  because the mask isn't all-1s.

'UDP header, offset 0, mask 0xff0000ff': Mangle one byte of the UDP
sport, and only one mangle of the dport via uAPI.

I get your point: If you think that supporting for this is reasonable
usecase, I'll fix this patchset and send a v4 so the nfp still works
for this.

>  And if I made it produce my example from above, mask 0x000000ff, you'd
>  calculate an offset of 3 and hit the other error, "unsupported section
>  of L4 header", which again would have worked before.

'mask 0x000000ff' mangle only one byte of a UDP port.

I'm sorry for this, I assumed in this case that the reasonable (sane)
uAPI subset in this case to be supported for the hardware offloads is
what tc tool via pedit ex generates.

I'll restore the nfp driver so it works for these scenarios via uAPI
that you describe.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-09-06 18:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-06  0:03 [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Pablo Neira Ayuso
2019-09-06  0:04 ` [PATCH net-next,v3 1/4] net: flow_offload: flip mangle action mask Pablo Neira Ayuso
2019-09-06  0:04 ` [PATCH net-next,v3 2/4] net: flow_offload: bitwise AND on mangle action value field Pablo Neira Ayuso
2019-09-06  0:04 ` [PATCH net-next,v3 3/4] net: flow_offload: mangle action at byte level Pablo Neira Ayuso
2019-09-06  0:04 ` [PATCH net-next,v3 4/4] netfilter: nft_payload: packet mangling offload support Pablo Neira Ayuso
2019-09-06 10:02 ` [PATCH net-next,v3 0/4] flow_offload: update mangle action representation Edward Cree
2019-09-06 10:56   ` Pablo Neira Ayuso
2019-09-06 12:55     ` Edward Cree
2019-09-06 13:14       ` Pablo Neira Ayuso
2019-09-06 13:37         ` Edward Cree
2019-09-06 14:50           ` Pablo Neira Ayuso
2019-09-06 15:13             ` Edward Cree
2019-09-06 15:58               ` Pablo Neira Ayuso
2019-09-06 16:49                 ` Edward Cree
2019-09-06 18:15                   ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).