* [PATCH net-next 0/3] Conntrack GRE offload
@ 2022-02-03 11:59 Toshiaki Makita
2022-02-03 11:59 ` [PATCH net-next 1/3] netfilter: flowtable: Support GRE Toshiaki Makita
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-03 11:59 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal
Cc: Toshiaki Makita, netdev, netfilter-devel, coreteam, Paul Blakey
Conntrack offload currently only supports TCP and UDP.
Thus TC/nftables/OVS cannot offload GRE packets.
However, GRE is widely used so some users create gre devices in VMs,
and in that case host OVS forwards GRE packets from/to VMs.
In order to offload GRE packets in OVS with stateful firewall support,
we need act_ct GRE offload support.
This patch set adds GRE offload support for act_ct and mlx5 conntrack.
Currently only GREv0 and no NAT support.
- Patch 1: flow_offload/flowtable GRE support.
- Patch 2: act_ct GRE offload support.
- Patch 3: mlx5 conntrack GRE offload support.
Tested with ConnectX-6 Dx 100G NIC and netperf TCP_STREAM.
+------------------------------------+
| +-----------+
| |(namespace)|
+---------+ | | netserver |
| | wire +----+ tc +--------+ +-------+ |
| netperf |-------->|mlx5|------>|mlx5 rep|--|mlx5 vf| |
| | +----+ +--------+ +-------+---+
+---------+ +------------------------------------+
- No offload (TC skip_hw): 8.5 Gbps
- Offload (act_ct) : 22 Gbps
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Toshiaki Makita (3):
netfilter: flowtable: Support GRE
act_ct: Support GRE offload
net/mlx5: Support GRE conntrack offload
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 21 +++--
net/netfilter/nf_flow_table_core.c | 10 +-
net/netfilter/nf_flow_table_ip.c | 54 +++++++++--
net/netfilter/nf_flow_table_offload.c | 19 ++--
net/netfilter/nft_flow_offload.c | 13 +++
net/sched/act_ct.c | 101 ++++++++++++++++-----
6 files changed, 171 insertions(+), 47 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-03 11:59 [PATCH net-next 0/3] Conntrack GRE offload Toshiaki Makita
@ 2022-02-03 11:59 ` Toshiaki Makita
2022-02-07 17:56 ` Pablo Neira Ayuso
2022-02-03 11:59 ` [PATCH net-next 2/3] act_ct: Support GRE offload Toshiaki Makita
2022-02-03 11:59 ` [PATCH net-next 3/3] net/mlx5: Support GRE conntrack offload Toshiaki Makita
2 siblings, 1 reply; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-03 11:59 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal
Cc: Toshiaki Makita, netdev, netfilter-devel, coreteam, Paul Blakey
Support GREv0 without NAT.
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
net/netfilter/nf_flow_table_core.c | 10 +++++--
net/netfilter/nf_flow_table_ip.c | 54 ++++++++++++++++++++++++++++-------
net/netfilter/nf_flow_table_offload.c | 19 +++++++-----
net/netfilter/nft_flow_offload.c | 13 +++++++++
4 files changed, 77 insertions(+), 19 deletions(-)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index b90eca7..e66a375 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -39,8 +39,14 @@
ft->l3proto = ctt->src.l3num;
ft->l4proto = ctt->dst.protonum;
- ft->src_port = ctt->src.u.tcp.port;
- ft->dst_port = ctt->dst.u.tcp.port;
+
+ switch (ctt->dst.protonum) {
+ case IPPROTO_TCP:
+ case IPPROTO_UDP:
+ ft->src_port = ctt->src.u.tcp.port;
+ ft->dst_port = ctt->dst.u.tcp.port;
+ break;
+ }
}
struct flow_offload *flow_offload_alloc(struct nf_conn *ct)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 889cf88..48e2f58 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -172,6 +172,7 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
struct flow_ports *ports;
unsigned int thoff;
struct iphdr *iph;
+ u8 ipproto;
if (!pskb_may_pull(skb, sizeof(*iph) + offset))
return -1;
@@ -185,13 +186,19 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
thoff += offset;
- switch (iph->protocol) {
+ ipproto = iph->protocol;
+ switch (ipproto) {
case IPPROTO_TCP:
*hdrsize = sizeof(struct tcphdr);
break;
case IPPROTO_UDP:
*hdrsize = sizeof(struct udphdr);
break;
+#ifdef CONFIG_NF_CT_PROTO_GRE
+ case IPPROTO_GRE:
+ *hdrsize = sizeof(struct gre_base_hdr);
+ break;
+#endif
default:
return -1;
}
@@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
if (!pskb_may_pull(skb, thoff + *hdrsize))
return -1;
+ if (ipproto == IPPROTO_GRE) {
+ struct gre_base_hdr *greh;
+
+ greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
+ if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
+ return -1;
+ }
+
iph = (struct iphdr *)(skb_network_header(skb) + offset);
- ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
tuple->src_v4.s_addr = iph->saddr;
tuple->dst_v4.s_addr = iph->daddr;
- tuple->src_port = ports->source;
- tuple->dst_port = ports->dest;
+ if (ipproto == IPPROTO_TCP || ipproto == IPPROTO_UDP) {
+ ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+ tuple->src_port = ports->source;
+ tuple->dst_port = ports->dest;
+ }
tuple->l3proto = AF_INET;
- tuple->l4proto = iph->protocol;
+ tuple->l4proto = ipproto;
tuple->iifidx = dev->ifindex;
nf_flow_tuple_encap(skb, tuple);
@@ -521,6 +538,7 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
struct flow_ports *ports;
struct ipv6hdr *ip6h;
unsigned int thoff;
+ u8 nexthdr;
thoff = sizeof(*ip6h) + offset;
if (!pskb_may_pull(skb, thoff))
@@ -528,13 +546,19 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
- switch (ip6h->nexthdr) {
+ nexthdr = ip6h->nexthdr;
+ switch (nexthdr) {
case IPPROTO_TCP:
*hdrsize = sizeof(struct tcphdr);
break;
case IPPROTO_UDP:
*hdrsize = sizeof(struct udphdr);
break;
+#ifdef CONFIG_NF_CT_PROTO_GRE
+ case IPPROTO_GRE:
+ *hdrsize = sizeof(struct gre_base_hdr);
+ break;
+#endif
default:
return -1;
}
@@ -545,15 +569,25 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
if (!pskb_may_pull(skb, thoff + *hdrsize))
return -1;
+ if (nexthdr == IPPROTO_GRE) {
+ struct gre_base_hdr *greh;
+
+ greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
+ if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
+ return -1;
+ }
+
ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
- ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
tuple->src_v6 = ip6h->saddr;
tuple->dst_v6 = ip6h->daddr;
- tuple->src_port = ports->source;
- tuple->dst_port = ports->dest;
+ if (nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP) {
+ ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+ tuple->src_port = ports->source;
+ tuple->dst_port = ports->dest;
+ }
tuple->l3proto = AF_INET6;
- tuple->l4proto = ip6h->nexthdr;
+ tuple->l4proto = nexthdr;
tuple->iifidx = dev->ifindex;
nf_flow_tuple_encap(skb, tuple);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index b561e0a..9b81080 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -170,6 +170,7 @@ static int nf_flow_rule_match(struct nf_flow_match *match,
match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_TCP);
break;
case IPPROTO_UDP:
+ case IPPROTO_GRE:
break;
default:
return -EOPNOTSUPP;
@@ -178,15 +179,19 @@ static int nf_flow_rule_match(struct nf_flow_match *match,
key->basic.ip_proto = tuple->l4proto;
mask->basic.ip_proto = 0xff;
- key->tp.src = tuple->src_port;
- mask->tp.src = 0xffff;
- key->tp.dst = tuple->dst_port;
- mask->tp.dst = 0xffff;
-
match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_META) |
BIT(FLOW_DISSECTOR_KEY_CONTROL) |
- BIT(FLOW_DISSECTOR_KEY_BASIC) |
- BIT(FLOW_DISSECTOR_KEY_PORTS);
+ BIT(FLOW_DISSECTOR_KEY_BASIC);
+
+ if (tuple->l4proto == IPPROTO_TCP || tuple->l4proto == IPPROTO_UDP) {
+ key->tp.src = tuple->src_port;
+ mask->tp.src = 0xffff;
+ key->tp.dst = tuple->dst_port;
+ mask->tp.dst = 0xffff;
+
+ match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_PORTS);
+ }
+
return 0;
}
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 0af34ad..731b5d8 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -298,6 +298,19 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
break;
case IPPROTO_UDP:
break;
+#ifdef CONFIG_NF_CT_PROTO_GRE
+ case IPPROTO_GRE: {
+ struct nf_conntrack_tuple *tuple;
+
+ if (ct->status & IPS_NAT_MASK)
+ goto out;
+ tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+ /* No support for GRE v1 */
+ if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
+ goto out;
+ break;
+ }
+#endif
default:
goto out;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net-next 2/3] act_ct: Support GRE offload
2022-02-03 11:59 [PATCH net-next 0/3] Conntrack GRE offload Toshiaki Makita
2022-02-03 11:59 ` [PATCH net-next 1/3] netfilter: flowtable: Support GRE Toshiaki Makita
@ 2022-02-03 11:59 ` Toshiaki Makita
2022-02-10 12:48 ` Paul Blakey
2022-02-03 11:59 ` [PATCH net-next 3/3] net/mlx5: Support GRE conntrack offload Toshiaki Makita
2 siblings, 1 reply; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-03 11:59 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal
Cc: Toshiaki Makita, netdev, netfilter-devel, coreteam, Paul Blakey
Support GREv0 without NAT.
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
net/sched/act_ct.c | 101 +++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 79 insertions(+), 22 deletions(-)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index f99247f..a5f47d5 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -421,6 +421,19 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
break;
case IPPROTO_UDP:
break;
+#ifdef CONFIG_NF_CT_PROTO_GRE
+ case IPPROTO_GRE: {
+ struct nf_conntrack_tuple *tuple;
+
+ if (ct->status & IPS_NAT_MASK)
+ return;
+ tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+ /* No support for GRE v1 */
+ if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
+ return;
+ break;
+ }
+#endif
default:
return;
}
@@ -440,6 +453,8 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
struct flow_ports *ports;
unsigned int thoff;
struct iphdr *iph;
+ size_t hdrsize;
+ u8 ipproto;
if (!pskb_network_may_pull(skb, sizeof(*iph)))
return false;
@@ -451,29 +466,49 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
unlikely(thoff != sizeof(struct iphdr)))
return false;
- if (iph->protocol != IPPROTO_TCP &&
- iph->protocol != IPPROTO_UDP)
+ ipproto = iph->protocol;
+ switch (ipproto) {
+ case IPPROTO_TCP:
+ hdrsize = sizeof(struct tcphdr);
+ break;
+ case IPPROTO_UDP:
+ hdrsize = sizeof(*ports);
+ break;
+#ifdef CONFIG_NF_CT_PROTO_GRE
+ case IPPROTO_GRE:
+ hdrsize = sizeof(struct gre_base_hdr);
+ break;
+#endif
+ default:
return false;
+ }
if (iph->ttl <= 1)
return false;
- if (!pskb_network_may_pull(skb, iph->protocol == IPPROTO_TCP ?
- thoff + sizeof(struct tcphdr) :
- thoff + sizeof(*ports)))
+ if (!pskb_network_may_pull(skb, thoff + hdrsize))
return false;
iph = ip_hdr(skb);
- if (iph->protocol == IPPROTO_TCP)
+ if (ipproto == IPPROTO_TCP) {
*tcph = (void *)(skb_network_header(skb) + thoff);
+ } else if (ipproto == IPPROTO_GRE) {
+ struct gre_base_hdr *greh;
+
+ greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
+ if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
+ return false;
+ }
- ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
tuple->src_v4.s_addr = iph->saddr;
tuple->dst_v4.s_addr = iph->daddr;
- tuple->src_port = ports->source;
- tuple->dst_port = ports->dest;
+ if (ipproto == IPPROTO_TCP || ipproto == IPPROTO_UDP) {
+ ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+ tuple->src_port = ports->source;
+ tuple->dst_port = ports->dest;
+ }
tuple->l3proto = AF_INET;
- tuple->l4proto = iph->protocol;
+ tuple->l4proto = ipproto;
return true;
}
@@ -486,36 +521,58 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
struct flow_ports *ports;
struct ipv6hdr *ip6h;
unsigned int thoff;
+ size_t hdrsize;
+ u8 nexthdr;
if (!pskb_network_may_pull(skb, sizeof(*ip6h)))
return false;
ip6h = ipv6_hdr(skb);
+ thoff = sizeof(*ip6h);
- if (ip6h->nexthdr != IPPROTO_TCP &&
- ip6h->nexthdr != IPPROTO_UDP)
- return false;
+ nexthdr = ip6h->nexthdr;
+ switch (nexthdr) {
+ case IPPROTO_TCP:
+ hdrsize = sizeof(struct tcphdr);
+ break;
+ case IPPROTO_UDP:
+ hdrsize = sizeof(*ports);
+ break;
+#ifdef CONFIG_NF_CT_PROTO_GRE
+ case IPPROTO_GRE:
+ hdrsize = sizeof(struct gre_base_hdr);
+ break;
+#endif
+ default:
+ return -1;
+ }
if (ip6h->hop_limit <= 1)
return false;
- thoff = sizeof(*ip6h);
- if (!pskb_network_may_pull(skb, ip6h->nexthdr == IPPROTO_TCP ?
- thoff + sizeof(struct tcphdr) :
- thoff + sizeof(*ports)))
+ if (!pskb_network_may_pull(skb, thoff + hdrsize))
return false;
ip6h = ipv6_hdr(skb);
- if (ip6h->nexthdr == IPPROTO_TCP)
+ if (nexthdr == IPPROTO_TCP) {
*tcph = (void *)(skb_network_header(skb) + thoff);
+ } else if (nexthdr == IPPROTO_GRE) {
+ struct gre_base_hdr *greh;
+
+ greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
+ if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
+ return false;
+ }
- ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
tuple->src_v6 = ip6h->saddr;
tuple->dst_v6 = ip6h->daddr;
- tuple->src_port = ports->source;
- tuple->dst_port = ports->dest;
+ if (nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP) {
+ ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+ tuple->src_port = ports->source;
+ tuple->dst_port = ports->dest;
+ }
tuple->l3proto = AF_INET6;
- tuple->l4proto = ip6h->nexthdr;
+ tuple->l4proto = nexthdr;
return true;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net-next 3/3] net/mlx5: Support GRE conntrack offload
2022-02-03 11:59 [PATCH net-next 0/3] Conntrack GRE offload Toshiaki Makita
2022-02-03 11:59 ` [PATCH net-next 1/3] netfilter: flowtable: Support GRE Toshiaki Makita
2022-02-03 11:59 ` [PATCH net-next 2/3] act_ct: Support GRE offload Toshiaki Makita
@ 2022-02-03 11:59 ` Toshiaki Makita
2022-02-10 12:44 ` Paul Blakey
2 siblings, 1 reply; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-03 11:59 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal
Cc: Toshiaki Makita, netdev, netfilter-devel, coreteam, Paul Blakey
Support GREv0 without NAT.
Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 0f4d3b9d..465643c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -258,7 +258,8 @@ struct mlx5_ct_entry {
return -EOPNOTSUPP;
}
} else {
- return -EOPNOTSUPP;
+ if (tuple->ip_proto != IPPROTO_GRE)
+ return -EOPNOTSUPP;
}
return 0;
@@ -807,7 +808,11 @@ struct mlx5_ct_entry {
attr->dest_chain = 0;
attr->dest_ft = mlx5e_tc_post_act_get_ft(ct_priv->post_act);
attr->ft = nat ? ct_priv->ct_nat : ct_priv->ct;
- attr->outer_match_level = MLX5_MATCH_L4;
+ if (entry->tuple.ip_proto == IPPROTO_TCP ||
+ entry->tuple.ip_proto == IPPROTO_UDP)
+ attr->outer_match_level = MLX5_MATCH_L4;
+ else
+ attr->outer_match_level = MLX5_MATCH_L3;
attr->counter = entry->counter->counter;
attr->flags |= MLX5_ATTR_FLAG_NO_IN_PORT;
if (ct_priv->ns_type == MLX5_FLOW_NAMESPACE_FDB)
@@ -1224,16 +1229,20 @@ static void mlx5_tc_ct_entry_del_work(struct work_struct *work)
struct flow_keys flow_keys;
skb_reset_network_header(skb);
- skb_flow_dissect_flow_keys(skb, &flow_keys, 0);
+ skb_flow_dissect_flow_keys(skb, &flow_keys, FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP);
tuple->zone = zone;
if (flow_keys.basic.ip_proto != IPPROTO_TCP &&
- flow_keys.basic.ip_proto != IPPROTO_UDP)
+ flow_keys.basic.ip_proto != IPPROTO_UDP &&
+ flow_keys.basic.ip_proto != IPPROTO_GRE)
return false;
- tuple->port.src = flow_keys.ports.src;
- tuple->port.dst = flow_keys.ports.dst;
+ if (flow_keys.basic.ip_proto == IPPROTO_TCP ||
+ flow_keys.basic.ip_proto == IPPROTO_UDP) {
+ tuple->port.src = flow_keys.ports.src;
+ tuple->port.dst = flow_keys.ports.dst;
+ }
tuple->n_proto = flow_keys.basic.n_proto;
tuple->ip_proto = flow_keys.basic.ip_proto;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-03 11:59 ` [PATCH net-next 1/3] netfilter: flowtable: Support GRE Toshiaki Makita
@ 2022-02-07 17:56 ` Pablo Neira Ayuso
2022-02-08 14:30 ` Toshiaki Makita
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2022-02-07 17:56 UTC (permalink / raw)
To: Toshiaki Makita
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Jozsef Kadlecsik,
Florian Westphal, netdev, netfilter-devel, coreteam, Paul Blakey
On Thu, Feb 03, 2022 at 08:59:39PM +0900, Toshiaki Makita wrote:
> Support GREv0 without NAT.
>
> Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
> ---
> net/netfilter/nf_flow_table_core.c | 10 +++++--
> net/netfilter/nf_flow_table_ip.c | 54 ++++++++++++++++++++++++++++-------
> net/netfilter/nf_flow_table_offload.c | 19 +++++++-----
> net/netfilter/nft_flow_offload.c | 13 +++++++++
> 4 files changed, 77 insertions(+), 19 deletions(-)
>
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index b90eca7..e66a375 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -39,8 +39,14 @@
>
> ft->l3proto = ctt->src.l3num;
> ft->l4proto = ctt->dst.protonum;
> - ft->src_port = ctt->src.u.tcp.port;
> - ft->dst_port = ctt->dst.u.tcp.port;
> +
> + switch (ctt->dst.protonum) {
> + case IPPROTO_TCP:
> + case IPPROTO_UDP:
> + ft->src_port = ctt->src.u.tcp.port;
> + ft->dst_port = ctt->dst.u.tcp.port;
> + break;
> + }
> }
>
> struct flow_offload *flow_offload_alloc(struct nf_conn *ct)
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 889cf88..48e2f58 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -172,6 +172,7 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
> struct flow_ports *ports;
> unsigned int thoff;
> struct iphdr *iph;
> + u8 ipproto;
>
> if (!pskb_may_pull(skb, sizeof(*iph) + offset))
> return -1;
> @@ -185,13 +186,19 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
>
> thoff += offset;
>
> - switch (iph->protocol) {
> + ipproto = iph->protocol;
> + switch (ipproto) {
> case IPPROTO_TCP:
> *hdrsize = sizeof(struct tcphdr);
> break;
> case IPPROTO_UDP:
> *hdrsize = sizeof(struct udphdr);
> break;
> +#ifdef CONFIG_NF_CT_PROTO_GRE
> + case IPPROTO_GRE:
> + *hdrsize = sizeof(struct gre_base_hdr);
> + break;
> +#endif
> default:
> return -1;
> }
> @@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
> if (!pskb_may_pull(skb, thoff + *hdrsize))
> return -1;
>
> + if (ipproto == IPPROTO_GRE) {
No ifdef here? Maybe remove these ifdef everywhere?
> + struct gre_base_hdr *greh;
> +
> + greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
> + if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
> + return -1;
> + }
> +
> iph = (struct iphdr *)(skb_network_header(skb) + offset);
> - ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
>
> tuple->src_v4.s_addr = iph->saddr;
> tuple->dst_v4.s_addr = iph->daddr;
> - tuple->src_port = ports->source;
> - tuple->dst_port = ports->dest;
> + if (ipproto == IPPROTO_TCP || ipproto == IPPROTO_UDP) {
> + ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
> + tuple->src_port = ports->source;
> + tuple->dst_port = ports->dest;
> + }
maybe:
switch (ipproto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
...
break;
case IPPROTO_GRE:
break;
}
?
> tuple->l3proto = AF_INET;
> - tuple->l4proto = iph->protocol;
> + tuple->l4proto = ipproto;
> tuple->iifidx = dev->ifindex;
> nf_flow_tuple_encap(skb, tuple);
>
> @@ -521,6 +538,7 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
> struct flow_ports *ports;
> struct ipv6hdr *ip6h;
> unsigned int thoff;
> + u8 nexthdr;
>
> thoff = sizeof(*ip6h) + offset;
> if (!pskb_may_pull(skb, thoff))
> @@ -528,13 +546,19 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
>
> ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
>
> - switch (ip6h->nexthdr) {
> + nexthdr = ip6h->nexthdr;
> + switch (nexthdr) {
> case IPPROTO_TCP:
> *hdrsize = sizeof(struct tcphdr);
> break;
> case IPPROTO_UDP:
> *hdrsize = sizeof(struct udphdr);
> break;
> +#ifdef CONFIG_NF_CT_PROTO_GRE
> + case IPPROTO_GRE:
> + *hdrsize = sizeof(struct gre_base_hdr);
> + break;
> +#endif
> default:
> return -1;
> }
> @@ -545,15 +569,25 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
> if (!pskb_may_pull(skb, thoff + *hdrsize))
> return -1;
>
> + if (nexthdr == IPPROTO_GRE) {
> + struct gre_base_hdr *greh;
> +
> + greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
> + if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
> + return -1;
> + }
> +
> ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
> - ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
>
> tuple->src_v6 = ip6h->saddr;
> tuple->dst_v6 = ip6h->daddr;
> - tuple->src_port = ports->source;
> - tuple->dst_port = ports->dest;
> + if (nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP) {
> + ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
> + tuple->src_port = ports->source;
> + tuple->dst_port = ports->dest;
> + }
> tuple->l3proto = AF_INET6;
> - tuple->l4proto = ip6h->nexthdr;
> + tuple->l4proto = nexthdr;
> tuple->iifidx = dev->ifindex;
> nf_flow_tuple_encap(skb, tuple);
>
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index b561e0a..9b81080 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -170,6 +170,7 @@ static int nf_flow_rule_match(struct nf_flow_match *match,
> match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_TCP);
> break;
> case IPPROTO_UDP:
> + case IPPROTO_GRE:
> break;
> default:
> return -EOPNOTSUPP;
> @@ -178,15 +179,19 @@ static int nf_flow_rule_match(struct nf_flow_match *match,
> key->basic.ip_proto = tuple->l4proto;
> mask->basic.ip_proto = 0xff;
>
> - key->tp.src = tuple->src_port;
> - mask->tp.src = 0xffff;
> - key->tp.dst = tuple->dst_port;
> - mask->tp.dst = 0xffff;
> -
> match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_META) |
> BIT(FLOW_DISSECTOR_KEY_CONTROL) |
> - BIT(FLOW_DISSECTOR_KEY_BASIC) |
> - BIT(FLOW_DISSECTOR_KEY_PORTS);
> + BIT(FLOW_DISSECTOR_KEY_BASIC);
> +
> + if (tuple->l4proto == IPPROTO_TCP || tuple->l4proto == IPPROTO_UDP) {
> + key->tp.src = tuple->src_port;
> + mask->tp.src = 0xffff;
> + key->tp.dst = tuple->dst_port;
> + mask->tp.dst = 0xffff;
> +
> + match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_PORTS);
> + }
> +
> return 0;
> }
>
> diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> index 0af34ad..731b5d8 100644
> --- a/net/netfilter/nft_flow_offload.c
> +++ b/net/netfilter/nft_flow_offload.c
> @@ -298,6 +298,19 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> break;
> case IPPROTO_UDP:
> break;
> +#ifdef CONFIG_NF_CT_PROTO_GRE
> + case IPPROTO_GRE: {
> + struct nf_conntrack_tuple *tuple;
> +
> + if (ct->status & IPS_NAT_MASK)
> + goto out;
Why this NAT check?
> + tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> + /* No support for GRE v1 */
> + if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
> + goto out;
> + break;
> + }
> +#endif
> default:
> goto out;
> }
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-07 17:56 ` Pablo Neira Ayuso
@ 2022-02-08 14:30 ` Toshiaki Makita
2022-02-09 10:01 ` Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-08 14:30 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Jozsef Kadlecsik,
Florian Westphal, netdev, netfilter-devel, coreteam, Paul Blakey
On 2022/02/08 2:56, Pablo Neira Ayuso wrote:
> On Thu, Feb 03, 2022 at 08:59:39PM +0900, Toshiaki Makita wrote:
>> Support GREv0 without NAT.
>>
>> Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
>> ---
>> net/netfilter/nf_flow_table_core.c | 10 +++++--
>> net/netfilter/nf_flow_table_ip.c | 54 ++++++++++++++++++++++++++++-------
>> net/netfilter/nf_flow_table_offload.c | 19 +++++++-----
>> net/netfilter/nft_flow_offload.c | 13 +++++++++
>> 4 files changed, 77 insertions(+), 19 deletions(-)
>>
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index b90eca7..e66a375 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -39,8 +39,14 @@
>>
>> ft->l3proto = ctt->src.l3num;
>> ft->l4proto = ctt->dst.protonum;
>> - ft->src_port = ctt->src.u.tcp.port;
>> - ft->dst_port = ctt->dst.u.tcp.port;
>> +
>> + switch (ctt->dst.protonum) {
>> + case IPPROTO_TCP:
>> + case IPPROTO_UDP:
>> + ft->src_port = ctt->src.u.tcp.port;
>> + ft->dst_port = ctt->dst.u.tcp.port;
>> + break;
>> + }
>> }
>>
>> struct flow_offload *flow_offload_alloc(struct nf_conn *ct)
>> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
>> index 889cf88..48e2f58 100644
>> --- a/net/netfilter/nf_flow_table_ip.c
>> +++ b/net/netfilter/nf_flow_table_ip.c
>> @@ -172,6 +172,7 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
>> struct flow_ports *ports;
>> unsigned int thoff;
>> struct iphdr *iph;
>> + u8 ipproto;
>>
>> if (!pskb_may_pull(skb, sizeof(*iph) + offset))
>> return -1;
>> @@ -185,13 +186,19 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
>>
>> thoff += offset;
>>
>> - switch (iph->protocol) {
>> + ipproto = iph->protocol;
>> + switch (ipproto) {
>> case IPPROTO_TCP:
>> *hdrsize = sizeof(struct tcphdr);
>> break;
>> case IPPROTO_UDP:
>> *hdrsize = sizeof(struct udphdr);
>> break;
>> +#ifdef CONFIG_NF_CT_PROTO_GRE
>> + case IPPROTO_GRE:
>> + *hdrsize = sizeof(struct gre_base_hdr);
>> + break;
>> +#endif
>> default:
>> return -1;
>> }
>> @@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
>> if (!pskb_may_pull(skb, thoff + *hdrsize))
>> return -1;
>>
>> + if (ipproto == IPPROTO_GRE) {
>
> No ifdef here? Maybe remove these ifdef everywhere?
I wanted to avoid adding many ifdefs and I expect this to be compiled out when
CONFIG_NF_CT_PROTO_GRE=n as this block is unreachable anyway. It rather may have
been unintuitive though.
Removing all of these ifdefs will cause inconsistent behavior between
CONFIG_NF_CT_PROTO_GRE=n/y.
When CONFIG_NF_CT_PROTO_GRE=n, conntrack cannot determine GRE version, thus it will
track GREv1 without key infomation, and the flow will be offloaded.
When CONFIG_NF_CT_PROTO_GRE=y, GREv1 will have key information and will not be
offloaded.
I wanted to just refuse offloading of GRE to avoid this inconsistency.
Anyway this kind of inconsistency seems to happen in software conntrack, so if you'd
like to remove ifdefs, I will do.
>> + struct gre_base_hdr *greh;
>> +
>> + greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
>> + if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
>> + return -1;
>> + }
>> +
>> iph = (struct iphdr *)(skb_network_header(skb) + offset);
>> - ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
>>
>> tuple->src_v4.s_addr = iph->saddr;
>> tuple->dst_v4.s_addr = iph->daddr;
>> - tuple->src_port = ports->source;
>> - tuple->dst_port = ports->dest;
>> + if (ipproto == IPPROTO_TCP || ipproto == IPPROTO_UDP) {
>> + ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
>> + tuple->src_port = ports->source;
>> + tuple->dst_port = ports->dest;
>> + }
>
> maybe:
>
> switch (ipproto) {
> case IPPROTO_TCP:
> case IPPROTO_UDP:
> ...
> break;
> case IPPROTO_GRE:
> break;
> }
>
> ?
Sure, will fix.
>> tuple->l3proto = AF_INET;
>> - tuple->l4proto = iph->protocol;
>> + tuple->l4proto = ipproto;
>> tuple->iifidx = dev->ifindex;
>> nf_flow_tuple_encap(skb, tuple);
>>
>> @@ -521,6 +538,7 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
>> struct flow_ports *ports;
>> struct ipv6hdr *ip6h;
>> unsigned int thoff;
>> + u8 nexthdr;
>>
>> thoff = sizeof(*ip6h) + offset;
>> if (!pskb_may_pull(skb, thoff))
>> @@ -528,13 +546,19 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
>>
>> ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
>>
>> - switch (ip6h->nexthdr) {
>> + nexthdr = ip6h->nexthdr;
>> + switch (nexthdr) {
>> case IPPROTO_TCP:
>> *hdrsize = sizeof(struct tcphdr);
>> break;
>> case IPPROTO_UDP:
>> *hdrsize = sizeof(struct udphdr);
>> break;
>> +#ifdef CONFIG_NF_CT_PROTO_GRE
>> + case IPPROTO_GRE:
>> + *hdrsize = sizeof(struct gre_base_hdr);
>> + break;
>> +#endif
>> default:
>> return -1;
>> }
>> @@ -545,15 +569,25 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
>> if (!pskb_may_pull(skb, thoff + *hdrsize))
>> return -1;
>>
>> + if (nexthdr == IPPROTO_GRE) {
>> + struct gre_base_hdr *greh;
>> +
>> + greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
>> + if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
>> + return -1;
>> + }
>> +
>> ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
>> - ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
>>
>> tuple->src_v6 = ip6h->saddr;
>> tuple->dst_v6 = ip6h->daddr;
>> - tuple->src_port = ports->source;
>> - tuple->dst_port = ports->dest;
>> + if (nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP) {
>> + ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
>> + tuple->src_port = ports->source;
>> + tuple->dst_port = ports->dest;
>> + }
>> tuple->l3proto = AF_INET6;
>> - tuple->l4proto = ip6h->nexthdr;
>> + tuple->l4proto = nexthdr;
>> tuple->iifidx = dev->ifindex;
>> nf_flow_tuple_encap(skb, tuple);
>>
>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>> index b561e0a..9b81080 100644
>> --- a/net/netfilter/nf_flow_table_offload.c
>> +++ b/net/netfilter/nf_flow_table_offload.c
>> @@ -170,6 +170,7 @@ static int nf_flow_rule_match(struct nf_flow_match *match,
>> match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_TCP);
>> break;
>> case IPPROTO_UDP:
>> + case IPPROTO_GRE:
>> break;
>> default:
>> return -EOPNOTSUPP;
>> @@ -178,15 +179,19 @@ static int nf_flow_rule_match(struct nf_flow_match *match,
>> key->basic.ip_proto = tuple->l4proto;
>> mask->basic.ip_proto = 0xff;
>>
>> - key->tp.src = tuple->src_port;
>> - mask->tp.src = 0xffff;
>> - key->tp.dst = tuple->dst_port;
>> - mask->tp.dst = 0xffff;
>> -
>> match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_META) |
>> BIT(FLOW_DISSECTOR_KEY_CONTROL) |
>> - BIT(FLOW_DISSECTOR_KEY_BASIC) |
>> - BIT(FLOW_DISSECTOR_KEY_PORTS);
>> + BIT(FLOW_DISSECTOR_KEY_BASIC);
>> +
>> + if (tuple->l4proto == IPPROTO_TCP || tuple->l4proto == IPPROTO_UDP) {
>> + key->tp.src = tuple->src_port;
>> + mask->tp.src = 0xffff;
>> + key->tp.dst = tuple->dst_port;
>> + mask->tp.dst = 0xffff;
>> +
>> + match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_PORTS);
>> + }
>> +
>> return 0;
>> }
>>
>> diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
>> index 0af34ad..731b5d8 100644
>> --- a/net/netfilter/nft_flow_offload.c
>> +++ b/net/netfilter/nft_flow_offload.c
>> @@ -298,6 +298,19 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
>> break;
>> case IPPROTO_UDP:
>> break;
>> +#ifdef CONFIG_NF_CT_PROTO_GRE
>> + case IPPROTO_GRE: {
>> + struct nf_conntrack_tuple *tuple;
>> +
>> + if (ct->status & IPS_NAT_MASK)
>> + goto out;
>
> Why this NAT check?
NAT requires more work. I'd like to start with a minimal GRE support.
Maybe we can add NAT support later.
Toshiaki Makita
>> + tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
>> + /* No support for GRE v1 */
>> + if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
>> + goto out;
>> + break;
>> + }
>> +#endif
>> default:
>> goto out;
>> }
>> --
>> 1.8.3.1
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-08 14:30 ` Toshiaki Makita
@ 2022-02-09 10:01 ` Pablo Neira Ayuso
2022-02-12 1:54 ` Toshiaki Makita
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2022-02-09 10:01 UTC (permalink / raw)
To: Toshiaki Makita
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Jozsef Kadlecsik,
Florian Westphal, netdev, netfilter-devel, coreteam, Paul Blakey
On Tue, Feb 08, 2022 at 11:30:03PM +0900, Toshiaki Makita wrote:
> On 2022/02/08 2:56, Pablo Neira Ayuso wrote:
> > On Thu, Feb 03, 2022 at 08:59:39PM +0900, Toshiaki Makita wrote:
[...]
> > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > index 889cf88..48e2f58 100644
> > > --- a/net/netfilter/nf_flow_table_ip.c
> > > +++ b/net/netfilter/nf_flow_table_ip.c
[...]
> > > @@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
> > > if (!pskb_may_pull(skb, thoff + *hdrsize))
> > > return -1;
> > > + if (ipproto == IPPROTO_GRE) {
> >
> > No ifdef here? Maybe remove these ifdef everywhere?
>
> I wanted to avoid adding many ifdefs and I expect this to be compiled out
> when CONFIG_NF_CT_PROTO_GRE=n as this block is unreachable anyway. It rather
> may have been unintuitive though.
>
> Removing all of these ifdefs will cause inconsistent behavior between
> CONFIG_NF_CT_PROTO_GRE=n/y.
> When CONFIG_NF_CT_PROTO_GRE=n, conntrack cannot determine GRE version, thus
> it will track GREv1 without key infomation, and the flow will be offloaded.
> When CONFIG_NF_CT_PROTO_GRE=y, GREv1 will have key information and will not
> be offloaded.
> I wanted to just refuse offloading of GRE to avoid this inconsistency.
> Anyway this kind of inconsistency seems to happen in software conntrack, so
> if you'd like to remove ifdefs, I will do.
Good point, thanks for explaining. LGTM.
[...]
> > > diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> > > index 0af34ad..731b5d8 100644
> > > --- a/net/netfilter/nft_flow_offload.c
> > > +++ b/net/netfilter/nft_flow_offload.c
> > > @@ -298,6 +298,19 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> > > break;
> > > case IPPROTO_UDP:
> > > break;
> > > +#ifdef CONFIG_NF_CT_PROTO_GRE
> > > + case IPPROTO_GRE: {
> > > + struct nf_conntrack_tuple *tuple;
> > > +
> > > + if (ct->status & IPS_NAT_MASK)
> > > + goto out;
> >
> > Why this NAT check?
>
> NAT requires more work. I'd like to start with a minimal GRE support.
> Maybe we can add NAT support later.
Oh well, yes, no problem.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 3/3] net/mlx5: Support GRE conntrack offload
2022-02-03 11:59 ` [PATCH net-next 3/3] net/mlx5: Support GRE conntrack offload Toshiaki Makita
@ 2022-02-10 12:44 ` Paul Blakey
0 siblings, 0 replies; 12+ messages in thread
From: Paul Blakey @ 2022-02-10 12:44 UTC (permalink / raw)
To: Toshiaki Makita
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal, netdev, netfilter-devel,
coreteam
On Thu, 3 Feb 2022, Toshiaki Makita wrote:
> Support GREv0 without NAT.
>
> Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> index 0f4d3b9d..465643c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
> @@ -258,7 +258,8 @@ struct mlx5_ct_entry {
> return -EOPNOTSUPP;
> }
> } else {
> - return -EOPNOTSUPP;
> + if (tuple->ip_proto != IPPROTO_GRE)
> + return -EOPNOTSUPP;
> }
>
> return 0;
> @@ -807,7 +808,11 @@ struct mlx5_ct_entry {
> attr->dest_chain = 0;
> attr->dest_ft = mlx5e_tc_post_act_get_ft(ct_priv->post_act);
> attr->ft = nat ? ct_priv->ct_nat : ct_priv->ct;
> - attr->outer_match_level = MLX5_MATCH_L4;
> + if (entry->tuple.ip_proto == IPPROTO_TCP ||
> + entry->tuple.ip_proto == IPPROTO_UDP)
> + attr->outer_match_level = MLX5_MATCH_L4;
> + else
> + attr->outer_match_level = MLX5_MATCH_L3;
> attr->counter = entry->counter->counter;
> attr->flags |= MLX5_ATTR_FLAG_NO_IN_PORT;
> if (ct_priv->ns_type == MLX5_FLOW_NAMESPACE_FDB)
> @@ -1224,16 +1229,20 @@ static void mlx5_tc_ct_entry_del_work(struct work_struct *work)
> struct flow_keys flow_keys;
>
> skb_reset_network_header(skb);
> - skb_flow_dissect_flow_keys(skb, &flow_keys, 0);
> + skb_flow_dissect_flow_keys(skb, &flow_keys, FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP);
>
> tuple->zone = zone;
>
> if (flow_keys.basic.ip_proto != IPPROTO_TCP &&
> - flow_keys.basic.ip_proto != IPPROTO_UDP)
> + flow_keys.basic.ip_proto != IPPROTO_UDP &&
> + flow_keys.basic.ip_proto != IPPROTO_GRE)
> return false;
>
> - tuple->port.src = flow_keys.ports.src;
> - tuple->port.dst = flow_keys.ports.dst;
> + if (flow_keys.basic.ip_proto == IPPROTO_TCP ||
> + flow_keys.basic.ip_proto == IPPROTO_UDP) {
> + tuple->port.src = flow_keys.ports.src;
> + tuple->port.dst = flow_keys.ports.dst;
> + }
> tuple->n_proto = flow_keys.basic.n_proto;
> tuple->ip_proto = flow_keys.basic.ip_proto;
>
> --
> 1.8.3.1
>
>
Acked-by: Paul Blakey <paulb@nvidia.com>
Looks good to me.
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 2/3] act_ct: Support GRE offload
2022-02-03 11:59 ` [PATCH net-next 2/3] act_ct: Support GRE offload Toshiaki Makita
@ 2022-02-10 12:48 ` Paul Blakey
0 siblings, 0 replies; 12+ messages in thread
From: Paul Blakey @ 2022-02-10 12:48 UTC (permalink / raw)
To: Toshiaki Makita
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Pablo Neira Ayuso,
Jozsef Kadlecsik, Florian Westphal, netdev, netfilter-devel,
coreteam
On Thu, 3 Feb 2022, Toshiaki Makita wrote:
> Support GREv0 without NAT.
>
> Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
> ---
> net/sched/act_ct.c | 101 +++++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 79 insertions(+), 22 deletions(-)
>
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index f99247f..a5f47d5 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -421,6 +421,19 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> break;
> case IPPROTO_UDP:
> break;
> +#ifdef CONFIG_NF_CT_PROTO_GRE
> + case IPPROTO_GRE: {
> + struct nf_conntrack_tuple *tuple;
> +
> + if (ct->status & IPS_NAT_MASK)
> + return;
> + tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> + /* No support for GRE v1 */
> + if (tuple->src.u.gre.key || tuple->dst.u.gre.key)
> + return;
> + break;
> + }
> +#endif
> default:
> return;
> }
> @@ -440,6 +453,8 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> struct flow_ports *ports;
> unsigned int thoff;
> struct iphdr *iph;
> + size_t hdrsize;
> + u8 ipproto;
>
> if (!pskb_network_may_pull(skb, sizeof(*iph)))
> return false;
> @@ -451,29 +466,49 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> unlikely(thoff != sizeof(struct iphdr)))
> return false;
>
> - if (iph->protocol != IPPROTO_TCP &&
> - iph->protocol != IPPROTO_UDP)
> + ipproto = iph->protocol;
> + switch (ipproto) {
> + case IPPROTO_TCP:
> + hdrsize = sizeof(struct tcphdr);
> + break;
> + case IPPROTO_UDP:
> + hdrsize = sizeof(*ports);
> + break;
> +#ifdef CONFIG_NF_CT_PROTO_GRE
> + case IPPROTO_GRE:
> + hdrsize = sizeof(struct gre_base_hdr);
> + break;
> +#endif
> + default:
> return false;
> + }
>
> if (iph->ttl <= 1)
> return false;
>
> - if (!pskb_network_may_pull(skb, iph->protocol == IPPROTO_TCP ?
> - thoff + sizeof(struct tcphdr) :
> - thoff + sizeof(*ports)))
> + if (!pskb_network_may_pull(skb, thoff + hdrsize))
> return false;
>
> iph = ip_hdr(skb);
> - if (iph->protocol == IPPROTO_TCP)
> + if (ipproto == IPPROTO_TCP) {
> *tcph = (void *)(skb_network_header(skb) + thoff);
> + } else if (ipproto == IPPROTO_GRE) {
> + struct gre_base_hdr *greh;
> +
> + greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
> + if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
> + return false;
> + }
>
> - ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
> tuple->src_v4.s_addr = iph->saddr;
> tuple->dst_v4.s_addr = iph->daddr;
> - tuple->src_port = ports->source;
> - tuple->dst_port = ports->dest;
> + if (ipproto == IPPROTO_TCP || ipproto == IPPROTO_UDP) {
> + ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
> + tuple->src_port = ports->source;
> + tuple->dst_port = ports->dest;
> + }
> tuple->l3proto = AF_INET;
> - tuple->l4proto = iph->protocol;
> + tuple->l4proto = ipproto;
>
> return true;
> }
> @@ -486,36 +521,58 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft,
> struct flow_ports *ports;
> struct ipv6hdr *ip6h;
> unsigned int thoff;
> + size_t hdrsize;
> + u8 nexthdr;
>
> if (!pskb_network_may_pull(skb, sizeof(*ip6h)))
> return false;
>
> ip6h = ipv6_hdr(skb);
> + thoff = sizeof(*ip6h);
>
> - if (ip6h->nexthdr != IPPROTO_TCP &&
> - ip6h->nexthdr != IPPROTO_UDP)
> - return false;
> + nexthdr = ip6h->nexthdr;
> + switch (nexthdr) {
> + case IPPROTO_TCP:
> + hdrsize = sizeof(struct tcphdr);
> + break;
> + case IPPROTO_UDP:
> + hdrsize = sizeof(*ports);
> + break;
> +#ifdef CONFIG_NF_CT_PROTO_GRE
> + case IPPROTO_GRE:
> + hdrsize = sizeof(struct gre_base_hdr);
> + break;
> +#endif
> + default:
> + return -1;
> + }
>
> if (ip6h->hop_limit <= 1)
> return false;
>
> - thoff = sizeof(*ip6h);
> - if (!pskb_network_may_pull(skb, ip6h->nexthdr == IPPROTO_TCP ?
> - thoff + sizeof(struct tcphdr) :
> - thoff + sizeof(*ports)))
> + if (!pskb_network_may_pull(skb, thoff + hdrsize))
> return false;
>
> ip6h = ipv6_hdr(skb);
> - if (ip6h->nexthdr == IPPROTO_TCP)
> + if (nexthdr == IPPROTO_TCP) {
> *tcph = (void *)(skb_network_header(skb) + thoff);
> + } else if (nexthdr == IPPROTO_GRE) {
> + struct gre_base_hdr *greh;
> +
> + greh = (struct gre_base_hdr *)(skb_network_header(skb) + thoff);
> + if ((greh->flags & GRE_VERSION) != GRE_VERSION_0)
> + return false;
> + }
>
> - ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
> tuple->src_v6 = ip6h->saddr;
> tuple->dst_v6 = ip6h->daddr;
> - tuple->src_port = ports->source;
> - tuple->dst_port = ports->dest;
> + if (nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP) {
> + ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
> + tuple->src_port = ports->source;
> + tuple->dst_port = ports->dest;
> + }
> tuple->l3proto = AF_INET6;
> - tuple->l4proto = ip6h->nexthdr;
> + tuple->l4proto = nexthdr;
>
> return true;
> }
> --
> 1.8.3.1
>
>
The GRE ifdefs I assume are for the same reason you mentioned in other
patch, If so, looks good to me.
Acked-by: Paul Blakey <paulb@nvidia.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-09 10:01 ` Pablo Neira Ayuso
@ 2022-02-12 1:54 ` Toshiaki Makita
2022-02-19 12:51 ` Toshiaki Makita
0 siblings, 1 reply; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-12 1:54 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Jozsef Kadlecsik,
Florian Westphal, netdev, netfilter-devel, coreteam, Paul Blakey
On 2022/02/09 19:01, Pablo Neira Ayuso wrote:
> On Tue, Feb 08, 2022 at 11:30:03PM +0900, Toshiaki Makita wrote:
>> On 2022/02/08 2:56, Pablo Neira Ayuso wrote:
>>> On Thu, Feb 03, 2022 at 08:59:39PM +0900, Toshiaki Makita wrote:
> [...]
>>>> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
>>>> index 889cf88..48e2f58 100644
>>>> --- a/net/netfilter/nf_flow_table_ip.c
>>>> +++ b/net/netfilter/nf_flow_table_ip.c
> [...]
>>>> @@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
>>>> if (!pskb_may_pull(skb, thoff + *hdrsize))
>>>> return -1;
>>>> + if (ipproto == IPPROTO_GRE) {
>>>
>>> No ifdef here? Maybe remove these ifdef everywhere?
>>
>> I wanted to avoid adding many ifdefs and I expect this to be compiled out
>> when CONFIG_NF_CT_PROTO_GRE=n as this block is unreachable anyway. It rather
>> may have been unintuitive though.
>>
>> Removing all of these ifdefs will cause inconsistent behavior between
>> CONFIG_NF_CT_PROTO_GRE=n/y.
>> When CONFIG_NF_CT_PROTO_GRE=n, conntrack cannot determine GRE version, thus
>> it will track GREv1 without key infomation, and the flow will be offloaded.
>> When CONFIG_NF_CT_PROTO_GRE=y, GREv1 will have key information and will not
>> be offloaded.
>> I wanted to just refuse offloading of GRE to avoid this inconsistency.
>> Anyway this kind of inconsistency seems to happen in software conntrack, so
>> if you'd like to remove ifdefs, I will do.
>
> Good point, thanks for explaining. LGTM.
Let me confirm, did you agree to keep ifdefs, or delete them?
Toshiaki Makita
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-12 1:54 ` Toshiaki Makita
@ 2022-02-19 12:51 ` Toshiaki Makita
2022-02-23 10:44 ` Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Toshiaki Makita @ 2022-02-19 12:51 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Jozsef Kadlecsik,
Florian Westphal, netdev, netfilter-devel, coreteam, Paul Blakey
Ping, Pablo?
On 2022/02/12 10:54, Toshiaki Makita wrote:
> On 2022/02/09 19:01, Pablo Neira Ayuso wrote:
>> On Tue, Feb 08, 2022 at 11:30:03PM +0900, Toshiaki Makita wrote:
>>> On 2022/02/08 2:56, Pablo Neira Ayuso wrote:
>>>> On Thu, Feb 03, 2022 at 08:59:39PM +0900, Toshiaki Makita wrote:
>> [...]
>>>>> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
>>>>> index 889cf88..48e2f58 100644
>>>>> --- a/net/netfilter/nf_flow_table_ip.c
>>>>> +++ b/net/netfilter/nf_flow_table_ip.c
>> [...]
>>>>> @@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const
>>>>> struct net_device *dev,
>>>>> if (!pskb_may_pull(skb, thoff + *hdrsize))
>>>>> return -1;
>>>>> + if (ipproto == IPPROTO_GRE) {
>>>>
>>>> No ifdef here? Maybe remove these ifdef everywhere?
>>>
>>> I wanted to avoid adding many ifdefs and I expect this to be compiled out
>>> when CONFIG_NF_CT_PROTO_GRE=n as this block is unreachable anyway. It rather
>>> may have been unintuitive though.
>>>
>>> Removing all of these ifdefs will cause inconsistent behavior between
>>> CONFIG_NF_CT_PROTO_GRE=n/y.
>>> When CONFIG_NF_CT_PROTO_GRE=n, conntrack cannot determine GRE version, thus
>>> it will track GREv1 without key infomation, and the flow will be offloaded.
>>> When CONFIG_NF_CT_PROTO_GRE=y, GREv1 will have key information and will not
>>> be offloaded.
>>> I wanted to just refuse offloading of GRE to avoid this inconsistency.
>>> Anyway this kind of inconsistency seems to happen in software conntrack, so
>>> if you'd like to remove ifdefs, I will do.
>>
>> Good point, thanks for explaining. LGTM.
>
> Let me confirm, did you agree to keep ifdefs, or delete them?
>
> Toshiaki Makita
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/3] netfilter: flowtable: Support GRE
2022-02-19 12:51 ` Toshiaki Makita
@ 2022-02-23 10:44 ` Pablo Neira Ayuso
0 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2022-02-23 10:44 UTC (permalink / raw)
To: Toshiaki Makita
Cc: David S. Miller, Jakub Kicinski, Saeed Mahameed,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Jozsef Kadlecsik,
Florian Westphal, netdev, netfilter-devel, coreteam, Paul Blakey
On Sat, Feb 19, 2022 at 09:51:37PM +0900, Toshiaki Makita wrote:
> Ping, Pablo?
Please go ahead, these #ifdef can go away later on.
> On 2022/02/12 10:54, Toshiaki Makita wrote:
> > On 2022/02/09 19:01, Pablo Neira Ayuso wrote:
> > > On Tue, Feb 08, 2022 at 11:30:03PM +0900, Toshiaki Makita wrote:
> > > > On 2022/02/08 2:56, Pablo Neira Ayuso wrote:
> > > > > On Thu, Feb 03, 2022 at 08:59:39PM +0900, Toshiaki Makita wrote:
> > > [...]
> > > > > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > > > > index 889cf88..48e2f58 100644
> > > > > > --- a/net/netfilter/nf_flow_table_ip.c
> > > > > > +++ b/net/netfilter/nf_flow_table_ip.c
> > > [...]
> > > > > > @@ -202,15 +209,25 @@ static int nf_flow_tuple_ip(struct
> > > > > > sk_buff *skb, const struct net_device *dev,
> > > > > > if (!pskb_may_pull(skb, thoff + *hdrsize))
> > > > > > return -1;
> > > > > > + if (ipproto == IPPROTO_GRE) {
> > > > >
> > > > > No ifdef here? Maybe remove these ifdef everywhere?
> > > >
> > > > I wanted to avoid adding many ifdefs and I expect this to be compiled out
> > > > when CONFIG_NF_CT_PROTO_GRE=n as this block is unreachable anyway. It rather
> > > > may have been unintuitive though.
> > > >
> > > > Removing all of these ifdefs will cause inconsistent behavior between
> > > > CONFIG_NF_CT_PROTO_GRE=n/y.
> > > > When CONFIG_NF_CT_PROTO_GRE=n, conntrack cannot determine GRE version, thus
> > > > it will track GREv1 without key infomation, and the flow will be offloaded.
> > > > When CONFIG_NF_CT_PROTO_GRE=y, GREv1 will have key information and will not
> > > > be offloaded.
> > > > I wanted to just refuse offloading of GRE to avoid this inconsistency.
> > > > Anyway this kind of inconsistency seems to happen in software conntrack, so
> > > > if you'd like to remove ifdefs, I will do.
> > >
> > > Good point, thanks for explaining. LGTM.
> >
> > Let me confirm, did you agree to keep ifdefs, or delete them?
> >
> > Toshiaki Makita
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-02-23 10:44 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-03 11:59 [PATCH net-next 0/3] Conntrack GRE offload Toshiaki Makita
2022-02-03 11:59 ` [PATCH net-next 1/3] netfilter: flowtable: Support GRE Toshiaki Makita
2022-02-07 17:56 ` Pablo Neira Ayuso
2022-02-08 14:30 ` Toshiaki Makita
2022-02-09 10:01 ` Pablo Neira Ayuso
2022-02-12 1:54 ` Toshiaki Makita
2022-02-19 12:51 ` Toshiaki Makita
2022-02-23 10:44 ` Pablo Neira Ayuso
2022-02-03 11:59 ` [PATCH net-next 2/3] act_ct: Support GRE offload Toshiaki Makita
2022-02-10 12:48 ` Paul Blakey
2022-02-03 11:59 ` [PATCH net-next 3/3] net/mlx5: Support GRE conntrack offload Toshiaki Makita
2022-02-10 12:44 ` Paul Blakey
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.