netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement
@ 2020-10-06 16:02 Jesper Dangaard Brouer
  2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:02 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

This patchset drops all the MTU checks in TC BPF-helpers that limits
growing the packet size. This is done because these BPF-helpers doesn't
take redirect into account, which can result in their MTU check being done
against the wrong netdev.

The new approach is to give BPF-programs knowledge about the MTU on a
netdev (via ifindex) and fib route lookup level. Meaning some BPF-helpers
are added and extended to make it possible to do MTU checks in the
BPF-code. If BPF-prog doesn't comply with the MTU this is enforced on the
kernel side.

Realizing MTU should only apply to transmitted packets, the MTU
enforcement is now done after the TC egress hook. This gives TC-BPF
programs most flexibility and allows to shrink packet size again in egress
hook prior to transmit.

This patchset is primarily focused on TC-BPF, but I've made sure that the
MTU BPF-helpers also works for XDP BPF-programs.

---

Jesper Dangaard Brouer (6):
      bpf: Remove MTU check in __bpf_skb_max_len
      bpf: bpf_fib_lookup return MTU value as output when looked up
      bpf: add BPF-helper for reading MTU from net_device via ifindex
      bpf: make it possible to identify BPF redirected SKBs
      bpf: Add MTU check for TC-BPF packets after egress hook
      bpf: drop MTU check when doing TC-BPF redirect to ingress


 include/linux/netdevice.h |    5 ++-
 include/uapi/linux/bpf.h  |   24 +++++++++++-
 net/core/dev.c            |   24 +++++++++++-
 net/core/filter.c         |   88 ++++++++++++++++++++++++++++++++++++++++-----
 net/sched/Kconfig         |    1 +
 5 files changed, 126 insertions(+), 16 deletions(-)

--


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len
  2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
@ 2020-10-06 16:02 ` Jesper Dangaard Brouer
  2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:02 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

Multiple BPF-helpers that can manipulate/increase the size of the SKB uses
__bpf_skb_max_len() as the max-length. This function limit size against
the current net_device MTU (skb->dev->mtu).

When a BPF-prog grow the packet size, then it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size. Further more, current MTU check in
__bpf_skb_max_len uses the MTU from ingress/current net_device, which in
case of redirects uses the wrong net_device.

Keep a sanity max limit of IP_MAX_MTU which is 64KiB.

In later patches we will enforce the MTU limitation when transmitting
packets.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
(imported from commit 37f8552786cf46588af52b77829b730dd14524d3)
---
 net/core/filter.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 05df73780dd3..fed239e77bdc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3476,8 +3476,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 
 static u32 __bpf_skb_max_len(const struct sk_buff *skb)
 {
-	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
-			  SKB_MAX_ALLOC;
+	return IP_MAX_MTU;
 }
 
 BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up
  2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
  2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
@ 2020-10-06 16:02 ` Jesper Dangaard Brouer
  2020-10-07  1:34   ` Maciej Żenczykowski
  2020-10-07  7:28   ` kernel test robot
  2020-10-06 16:03 ` [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:02 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED.  The BPF-prog
don't know the MTU value that caused this rejection.

If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
need to know this MTU value for the ICMP packet.

Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
value as output via a union with 'tot_len' as this is the value used for
the MTU lookup.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |   11 +++++++++--
 net/core/filter.c        |   17 ++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c446394135be..50ce65e37b16 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2216,6 +2216,9 @@ union bpf_attr {
  *		* > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
  *		  packet is not forwarded or needs assist from full stack
  *
+ *		If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then the MTU
+ *		was exceeded and result params->mtu contains the MTU.
+ *
  * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
  *	Description
  *		Add an entry to, or update a sockhash *map* referencing sockets.
@@ -4844,9 +4847,13 @@ struct bpf_fib_lookup {
 	__be16	sport;
 	__be16	dport;
 
-	/* total length of packet from network header - used for MTU check */
-	__u16	tot_len;
+	union {	/* used for MTU check */
+		/* input to lookup */
+		__u16	tot_len; /* total length of packet from network hdr */
 
+		/* output: MTU value (if requested check_mtu) */
+		__u16	mtu;
+	};
 	/* input: L3 device index for lookup
 	 * output: device index from FIB lookup
 	 */
diff --git a/net/core/filter.c b/net/core/filter.c
index fed239e77bdc..d84723f347c0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5185,13 +5185,14 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
 static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
 				  const struct neighbour *neigh,
-				  const struct net_device *dev)
+				  const struct net_device *dev, u32 mtu)
 {
 	memcpy(params->dmac, neigh->ha, ETH_ALEN);
 	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
 	params->h_vlan_TCI = 0;
 	params->h_vlan_proto = 0;
 	params->ifindex = dev->ifindex;
+	params->mtu = mtu;
 
 	return 0;
 }
@@ -5275,8 +5276,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	if (check_mtu) {
 		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
-		if (params->tot_len > mtu)
+		if (params->tot_len > mtu) {
+			params->mtu = mtu; /* union with tot_len */
 			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
+		}
 	}
 
 	nhc = res.nhc;
@@ -5309,7 +5312,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (!neigh)
 		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return bpf_fib_set_fwd_params(params, neigh, dev);
+	return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
 }
 #endif
 
@@ -5401,8 +5404,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 	if (check_mtu) {
 		mtu = ipv6_stub->ip6_mtu_from_fib6(&res, dst, src);
-		if (params->tot_len > mtu)
+		if (params->tot_len > mtu) {
+			params->mtu = mtu; /* union with tot_len */
 			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
+		}
 	}
 
 	if (res.nh->fib_nh_lws)
@@ -5421,7 +5426,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (!neigh)
 		return BPF_FIB_LKUP_RET_NO_NEIGH;
 
-	return bpf_fib_set_fwd_params(params, neigh, dev);
+	return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
 }
 #endif
 
@@ -5490,6 +5495,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 		dev = dev_get_by_index_rcu(net, params->ifindex);
 		if (!is_skb_forwardable(dev, skb))
 			rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
+
+		params->mtu = dev->mtu; /* union with tot_len */
 	}
 
 	return rc;



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
  2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
  2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
@ 2020-10-06 16:03 ` Jesper Dangaard Brouer
  2020-10-06 16:33   ` Jesper Dangaard Brouer
  2020-10-06 16:03 ` [PATCH bpf-next V1 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

FIXME: add description.

FIXME: IMHO we can create a better BPF-helper named bpf_mtu_check()
instead of bpf_mtu_lookup(), because a flag can be used for requesting
GRO segment size checking.  The ret value of bpf_mtu_check() says
if MTU was violoated, but also return MTU via pointer arg to allow
BPF-progs to do own logic.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |   13 +++++++++++
 net/core/filter.c        |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 50ce65e37b16..29b335cb96ef 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3718,6 +3718,18 @@ union bpf_attr {
  *		never return NULL.
  *	Return
  *		A pointer pointing to the kernel percpu variable on this cpu.
+ *
+ * int bpf_mtu_lookup(void *ctx, u32 ifindex, u64 flags)
+ *	Description
+ *		Lookup MTU of net device based on ifindex.  The Linux kernel
+ *		route table can configure MTUs on a more specific per route
+ *		level, which is not provided by this helper. For route level
+ *		MTU checks use the **bpf_fib_lookup**\ () helper.
+ *
+ *		*ctx* is either **struct xdp_md** for XDP programs or
+ *		**struct sk_buff** tc cls_act programs.
+ *	Return
+ *		On success, MTU size is returned. On error, a negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3875,6 +3887,7 @@ union bpf_attr {
 	FN(redirect_neigh),		\
 	FN(bpf_per_cpu_ptr),            \
 	FN(bpf_this_cpu_ptr),		\
+	FN(mtu_lookup),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index d84723f347c0..49ae3b80027b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5512,6 +5512,58 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+static int bpf_mtu_lookup(struct net *netns, u32 ifindex, u64 flags)
+{
+	struct net_device *dev;
+
+	// XXX: Do we even need flags?
+	// Flag idea: get ctx dev->mtu for XDP_TX or redir out-same-dev
+	if (flags)
+		return -EINVAL;
+
+	dev = dev_get_by_index_rcu(netns, ifindex);
+	if (!dev)
+		return -ENODEV;
+
+	return dev->mtu;
+}
+
+BPF_CALL_3(bpf_skb_mtu_lookup, struct sk_buff *, skb,
+	   u32, ifindex, u64, flags)
+{
+	struct net *netns = dev_net(skb->dev);
+
+	return bpf_mtu_lookup(netns, ifindex, flags);
+}
+
+BPF_CALL_3(bpf_xdp_mtu_lookup, struct xdp_buff *, xdp,
+	   u32, ifindex, u64, flags)
+{
+	struct net *netns = dev_net(xdp->rxq->dev);
+	// XXX: Handle if this runs in devmap prog (then is rxq invalid?)
+
+	return bpf_mtu_lookup(netns, ifindex, flags);
+}
+
+static const struct bpf_func_proto bpf_skb_mtu_lookup_proto = {
+	.func		= bpf_skb_mtu_lookup,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+	.arg3_type      = ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto bpf_xdp_mtu_lookup_proto = {
+	.func		= bpf_xdp_mtu_lookup,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_ANYTHING,
+	.arg3_type      = ARG_ANYTHING,
+};
+
+
 #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
 static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
 {
@@ -7075,6 +7127,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_uid_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_skb_fib_lookup_proto;
+	case BPF_FUNC_mtu_lookup:
+		return &bpf_skb_mtu_lookup_proto;
 	case BPF_FUNC_sk_fullsock:
 		return &bpf_sk_fullsock_proto;
 	case BPF_FUNC_sk_storage_get:
@@ -7144,6 +7198,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
+	case BPF_FUNC_mtu_lookup:
+		return &bpf_xdp_mtu_lookup_proto;
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_udp:
 		return &bpf_xdp_sk_lookup_udp_proto;



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next V1 4/6] bpf: make it possible to identify BPF redirected SKBs
  2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2020-10-06 16:03 ` [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex Jesper Dangaard Brouer
@ 2020-10-06 16:03 ` Jesper Dangaard Brouer
  2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
  2020-10-06 16:03 ` [PATCH bpf-next V1 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
  5 siblings, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

This change makes it possible to identify SKBs that have been redirected
by TC-BPF (cls_act). This is needed for a number of cases.

(1) For collaborating with driver ifb net_devices.
(2) For avoiding starting generic-XDP prog on TC ingress redirect.
(3) Next MTU check patches need ability to identify redirected SKBs.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c    |    2 ++
 net/sched/Kconfig |    1 +
 2 files changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9d55bf5d1a65..b433098896b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3885,6 +3885,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
+		skb_set_redirected(skb, false);
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
@@ -4974,6 +4975,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
+		skb_set_redirected(skb, true);
 		skb_do_redirect(skb);
 		return NULL;
 	case TC_ACT_CONSUMED:
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d88800e..a1bbaa8fd054 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -384,6 +384,7 @@ config NET_SCH_INGRESS
 	depends on NET_CLS_ACT
 	select NET_INGRESS
 	select NET_EGRESS
+	select NET_REDIRECT
 	help
 	  Say Y here if you want to use classifiers for incoming and/or outgoing
 	  packets. This qdisc doesn't do anything else besides running classifiers,



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
  2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2020-10-06 16:03 ` [PATCH bpf-next V1 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
@ 2020-10-06 16:03 ` Jesper Dangaard Brouer
  2020-10-06 20:09   ` kernel test robot
  2020-10-07  0:26   ` kernel test robot
  2020-10-06 16:03 ` [PATCH bpf-next V1 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
  5 siblings, 2 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

The MTU should only apply to transmitted packets.

When TC-ingress redirect packet to egress on another netdev, then the
normal netstack MTU checks are skipped (and driver level will not catch
any MTU violation, checked ixgbe).

This patch choose not to add MTU check in the egress code path of
skb_do_redirect() prior to calling dev_queue_xmit(), because it is still
possible to run another BPF egress program that will shrink/consume
headers, which will make packet comply with netdev MTU. This use-case
might already be in production use (if ingress MTU is larger than egress).

Instead do the MTU check after sch_handle_egress() step, for the cases
that require this.

The cases need a bit explaining. Ingress to egress redirected packets
could be detected via skb->tc_at_ingress bit, but it is not reliable,
because sch_handle_egress() could steal the packet and redirect this
(again) to another egress netdev, which will then have the
skb->tc_at_ingress cleared. There is also the case of TC-egress prog
increase packet size and then redirect it egress. Thus, it is more
reliable to do the MTU check for any redirected packet (both ingress and
egress), which is available via skb_is_redirected() in earlier patch.
Also handle case where egress BPF-prog increased size.

One advantage of this approach is that it ingress-to-egress BPF-prog can
send information via packet data. With the MTU checks removed in the
helpers, and also not done in skb_do_redirect() call, this allows for an
ingress BPF-prog to communicate with an egress BPF-prog via packet data,
as long as egress BPF-prog remove this prior to transmitting packet.

Troubleshooting: MTU violations are recorded in TX dropped counter, and
kprobe on dev_queue_xmit() have retval -EMSGSIZE.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b433098896b2..33529022b30d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+		*ret = NET_XMIT_SUCCESS;
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
@@ -4064,9 +4065,10 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 {
 	struct net_device *dev = skb->dev;
 	struct netdev_queue *txq;
+	bool mtu_check = false;
+	bool again = false;
 	struct Qdisc *q;
 	int rc = -ENOMEM;
-	bool again = false;
 
 	skb_reset_mac_header(skb);
 
@@ -4082,14 +4084,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
+	mtu_check = skb_is_redirected(skb);
 	skb->tc_at_ingress = 0;
 # ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
+		unsigned int len_orig = skb->len;
+
 		skb = sch_handle_egress(skb, &rc, dev);
 		if (!skb)
 			goto out;
+		/* BPF-prog ran and could have changed packet size beyond MTU */
+		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
+			mtu_check = true;
 	}
 # endif
+	/* MTU-check only happens on "last" net_device in a redirect sequence
+	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
+	 * either ingress or egress to another device).
+	 */
+	if (mtu_check && !is_skb_forwardable(dev, skb)) {
+		rc = -EMSGSIZE;
+		goto drop;
+	}
 #endif
 	/* If device/qdisc don't need skb->dst, release it right now while
 	 * its hot in this cpu cache.
@@ -4157,7 +4173,7 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 
 	rc = -ENETDOWN;
 	rcu_read_unlock_bh();
-
+drop:
 	atomic_long_inc(&dev->tx_dropped);
 	kfree_skb_list(skb);
 	return rc;



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next V1 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress
  2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
@ 2020-10-06 16:03 ` Jesper Dangaard Brouer
  5 siblings, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

The use-case for dropping the MTU check when TC-BPF ingress redirecting a
packet, is described by Eyal Birger in email[0]. The summary is the
ability to increase packet size (e.g. with IPv6 headers for NAT64) and
ingress redirect packet and let normal netstack fragment packet as needed.

[0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/netdevice.h |    5 +++--
 net/core/dev.c            |    2 +-
 net/core/filter.c         |   12 ++++++++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 28cfa53daf72..58fb7b4869ba 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3866,10 +3866,11 @@ bool is_skb_forwardable(const struct net_device *dev,
 			const struct sk_buff *skb);
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
-					       struct sk_buff *skb)
+					       struct sk_buff *skb,
+					       const bool mtu_check)
 {
 	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
+	    (mtu_check && unlikely(!is_skb_forwardable(dev, skb)))) {
 		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/core/dev.c b/net/core/dev.c
index 33529022b30d..154c342bc061 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2209,7 +2209,7 @@ EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, true);
 
 	if (likely(!ret)) {
 		skb->protocol = eth_type_trans(skb, dev);
diff --git a/net/core/filter.c b/net/core/filter.c
index 49ae3b80027b..69cd6de41f64 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2083,13 +2083,21 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	return dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, false);
+
+	if (likely(!ret)) {
+		skb->protocol = eth_type_trans(skb, dev);
+		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+		ret = netif_rx(skb);
+	}
+
+	return ret;
 }
 
 static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
 				      struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, false);
 
 	if (likely(!ret)) {
 		skb->dev = dev;



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-06 16:03 ` [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex Jesper Dangaard Brouer
@ 2020-10-06 16:33   ` Jesper Dangaard Brouer
  2020-10-07  1:18     ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:33 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Daniel Borkmann, Alexei Starovoitov, maze, lmb, shaun,
	Lorenzo Bianconi, marek, John Fastabend, Jakub Kicinski, brouer

On Tue, 06 Oct 2020 18:03:01 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> FIXME: add description.

Ups, I will obviously send a V2.

I still want feedback on whether I should implement another BPF-helper
as sketched below:

> FIXME: IMHO we can create a better BPF-helper named bpf_mtu_check()
> instead of bpf_mtu_lookup(), because a flag can be used for requesting
> GRO segment size checking.  The ret value of bpf_mtu_check() says
> if MTU was violoated, but also return MTU via pointer arg to allow
> BPF-progs to do own logic.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/uapi/linux/bpf.h |   13 +++++++++++
>  net/core/filter.c        |   56 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 50ce65e37b16..29b335cb96ef 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3718,6 +3718,18 @@ union bpf_attr {
>   *		never return NULL.
>   *	Return
>   *		A pointer pointing to the kernel percpu variable on this cpu.
> + *
> + * int bpf_mtu_lookup(void *ctx, u32 ifindex, u64 flags)
> + *	Description
> + *		Lookup MTU of net device based on ifindex.  The Linux kernel
> + *		route table can configure MTUs on a more specific per route
> + *		level, which is not provided by this helper. For route level
> + *		MTU checks use the **bpf_fib_lookup**\ () helper.
> + *
> + *		*ctx* is either **struct xdp_md** for XDP programs or
> + *		**struct sk_buff** tc cls_act programs.
> + *	Return
> + *		On success, MTU size is returned. On error, a negative value.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -3875,6 +3887,7 @@ union bpf_attr {
>  	FN(redirect_neigh),		\
>  	FN(bpf_per_cpu_ptr),            \
>  	FN(bpf_this_cpu_ptr),		\
> +	FN(mtu_lookup),			\
>  	/* */
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/net/core/filter.c b/net/core/filter.c
> index d84723f347c0..49ae3b80027b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5512,6 +5512,58 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
>  	.arg4_type	= ARG_ANYTHING,
>  };
>  
> +static int bpf_mtu_lookup(struct net *netns, u32 ifindex, u64 flags)
> +{
> +	struct net_device *dev;
> +
> +	// XXX: Do we even need flags?
> +	// Flag idea: get ctx dev->mtu for XDP_TX or redir out-same-dev
> +	if (flags)
> +		return -EINVAL;
> +
> +	dev = dev_get_by_index_rcu(netns, ifindex);
> +	if (!dev)
> +		return -ENODEV;
> +
> +	return dev->mtu;
> +}
> +
> +BPF_CALL_3(bpf_skb_mtu_lookup, struct sk_buff *, skb,
> +	   u32, ifindex, u64, flags)
> +{
> +	struct net *netns = dev_net(skb->dev);
> +
> +	return bpf_mtu_lookup(netns, ifindex, flags);
> +}
> +
> +BPF_CALL_3(bpf_xdp_mtu_lookup, struct xdp_buff *, xdp,
> +	   u32, ifindex, u64, flags)
> +{
> +	struct net *netns = dev_net(xdp->rxq->dev);
> +	// XXX: Handle if this runs in devmap prog (then is rxq invalid?)
> +
> +	return bpf_mtu_lookup(netns, ifindex, flags);
> +}
> +
> +static const struct bpf_func_proto bpf_skb_mtu_lookup_proto = {
> +	.func		= bpf_skb_mtu_lookup,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type      = ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_ANYTHING,
> +	.arg3_type      = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto bpf_xdp_mtu_lookup_proto = {
> +	.func		= bpf_xdp_mtu_lookup,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type      = ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_ANYTHING,
> +	.arg3_type      = ARG_ANYTHING,
> +};
> +
> +
>  #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
>  static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
>  {
> @@ -7075,6 +7127,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_get_socket_uid_proto;
>  	case BPF_FUNC_fib_lookup:
>  		return &bpf_skb_fib_lookup_proto;
> +	case BPF_FUNC_mtu_lookup:
> +		return &bpf_skb_mtu_lookup_proto;
>  	case BPF_FUNC_sk_fullsock:
>  		return &bpf_sk_fullsock_proto;
>  	case BPF_FUNC_sk_storage_get:
> @@ -7144,6 +7198,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_xdp_adjust_tail_proto;
>  	case BPF_FUNC_fib_lookup:
>  		return &bpf_xdp_fib_lookup_proto;
> +	case BPF_FUNC_mtu_lookup:
> +		return &bpf_xdp_mtu_lookup_proto;
>  #ifdef CONFIG_INET
>  	case BPF_FUNC_sk_lookup_udp:
>  		return &bpf_xdp_sk_lookup_udp_proto;
> 
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
  2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
@ 2020-10-06 20:09   ` kernel test robot
  2020-10-07  0:26   ` kernel test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-10-06 20:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf
  Cc: kbuild-all, clang-built-linux, Jesper Dangaard Brouer, netdev,
	Daniel Borkmann, Alexei Starovoitov, maze, lmb, shaun,
	Lorenzo Bianconi, marek

[-- Attachment #1: Type: text/plain, Size: 26387 bytes --]

Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: s390-randconfig-r026-20201006 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1127662c6dc2a276839c75a42238b11a3ad00f32)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/2065cee7d6b74c8f1dabae4e4e15999a841e3349
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
        git checkout 2065cee7d6b74c8f1dabae4e4e15999a841e3349
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x0000ff00UL) <<  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x00ff0000UL) >>  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0xff000000UL) >> 24)))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
           __fswab32(x))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   net/core/dev.c:4068:7: warning: unused variable 'mtu_check' [-Wunused-variable]
           bool mtu_check = false;
                ^
>> net/core/dev.c:4176:1: warning: unused label 'drop' [-Wunused-label]
   drop:
   ^~~~~
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress' [-Wunused-function]
   sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress' [-Wunused-function]
   static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
                     ^
   In file included from net/core/dev.c:71:
   In file included from include/linux/uaccess.h:6:
   In file included from include/linux/sched.h:14:
   In file included from include/linux/pid.h:5:
   In file included from include/linux/rculist.h:10:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:12:
   In file included from include/linux/bitops.h:29:
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   24 warnings and 14 errors generated.
--
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x0000ff00UL) <<  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x00ff0000UL) >>  8) |            \
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0xff000000UL) >> 24)))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
           __fswab32(x))
                     ^
   In file included from net/core/dev.c:89:
   In file included from include/linux/if_ether.h:19:
   In file included from include/linux/skbuff.h:31:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> net/core/dev.c:4176:1: warning: unused label 'drop' [-Wunused-label]
   drop:
   ^~~~~
   net/core/dev.c:4068:7: warning: unused variable 'mtu_check' [-Wunused-variable]
           bool mtu_check = false;
                ^
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress' [-Wunused-function]
   sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress' [-Wunused-function]
   static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
                     ^
   In file included from net/core/dev.c:71:
   In file included from include/linux/uaccess.h:6:
   In file included from include/linux/sched.h:14:
   In file included from include/linux/pid.h:5:
   In file included from include/linux/rculist.h:10:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:12:
   In file included from include/linux/bitops.h:29:
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
                           "oi     %0,%b1\n"
                           ^
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:69:4: error: invalid operand in inline asm: 'oi	$0,${1:b}'
   arch/s390/include/asm/bitops.h:90:4: error: invalid operand in inline asm: 'ni	$0,${1:b}'
                           "ni     %0,%b1\n"
                           ^
   24 warnings and 14 errors generated.

vim +/drop +4176 net/core/dev.c

  4037	
  4038	/**
  4039	 *	__dev_queue_xmit - transmit a buffer
  4040	 *	@skb: buffer to transmit
  4041	 *	@sb_dev: suboordinate device used for L2 forwarding offload
  4042	 *
  4043	 *	Queue a buffer for transmission to a network device. The caller must
  4044	 *	have set the device and priority and built the buffer before calling
  4045	 *	this function. The function can be called from an interrupt.
  4046	 *
  4047	 *	A negative errno code is returned on a failure. A success does not
  4048	 *	guarantee the frame will be transmitted as it may be dropped due
  4049	 *	to congestion or traffic shaping.
  4050	 *
  4051	 * -----------------------------------------------------------------------------------
  4052	 *      I notice this method can also return errors from the queue disciplines,
  4053	 *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
  4054	 *      be positive.
  4055	 *
  4056	 *      Regardless of the return value, the skb is consumed, so it is currently
  4057	 *      difficult to retry a send to this method.  (You can bump the ref count
  4058	 *      before sending to hold a reference for retry if you are careful.)
  4059	 *
  4060	 *      When calling this method, interrupts MUST be enabled.  This is because
  4061	 *      the BH enable code must have IRQs enabled so that it will not deadlock.
  4062	 *          --BLG
  4063	 */
  4064	static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
  4065	{
  4066		struct net_device *dev = skb->dev;
  4067		struct netdev_queue *txq;
  4068		bool mtu_check = false;
  4069		bool again = false;
  4070		struct Qdisc *q;
  4071		int rc = -ENOMEM;
  4072	
  4073		skb_reset_mac_header(skb);
  4074	
  4075		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
  4076			__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
  4077	
  4078		/* Disable soft irqs for various locks below. Also
  4079		 * stops preemption for RCU.
  4080		 */
  4081		rcu_read_lock_bh();
  4082	
  4083		skb_update_prio(skb);
  4084	
  4085		qdisc_pkt_len_init(skb);
  4086	#ifdef CONFIG_NET_CLS_ACT
  4087		mtu_check = skb_is_redirected(skb);
  4088		skb->tc_at_ingress = 0;
  4089	# ifdef CONFIG_NET_EGRESS
  4090		if (static_branch_unlikely(&egress_needed_key)) {
  4091			unsigned int len_orig = skb->len;
  4092	
  4093			skb = sch_handle_egress(skb, &rc, dev);
  4094			if (!skb)
  4095				goto out;
  4096			/* BPF-prog ran and could have changed packet size beyond MTU */
  4097			if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
  4098				mtu_check = true;
  4099		}
  4100	# endif
  4101		/* MTU-check only happens on "last" net_device in a redirect sequence
  4102		 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
  4103		 * either ingress or egress to another device).
  4104		 */
  4105		if (mtu_check && !is_skb_forwardable(dev, skb)) {
  4106			rc = -EMSGSIZE;
  4107			goto drop;
  4108		}
  4109	#endif
  4110		/* If device/qdisc don't need skb->dst, release it right now while
  4111		 * its hot in this cpu cache.
  4112		 */
  4113		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
  4114			skb_dst_drop(skb);
  4115		else
  4116			skb_dst_force(skb);
  4117	
  4118		txq = netdev_core_pick_tx(dev, skb, sb_dev);
  4119		q = rcu_dereference_bh(txq->qdisc);
  4120	
  4121		trace_net_dev_queue(skb);
  4122		if (q->enqueue) {
  4123			rc = __dev_xmit_skb(skb, q, dev, txq);
  4124			goto out;
  4125		}
  4126	
  4127		/* The device has no queue. Common case for software devices:
  4128		 * loopback, all the sorts of tunnels...
  4129	
  4130		 * Really, it is unlikely that netif_tx_lock protection is necessary
  4131		 * here.  (f.e. loopback and IP tunnels are clean ignoring statistics
  4132		 * counters.)
  4133		 * However, it is possible, that they rely on protection
  4134		 * made by us here.
  4135	
  4136		 * Check this and shot the lock. It is not prone from deadlocks.
  4137		 *Either shot noqueue qdisc, it is even simpler 8)
  4138		 */
  4139		if (dev->flags & IFF_UP) {
  4140			int cpu = smp_processor_id(); /* ok because BHs are off */
  4141	
  4142			if (txq->xmit_lock_owner != cpu) {
  4143				if (dev_xmit_recursion())
  4144					goto recursion_alert;
  4145	
  4146				skb = validate_xmit_skb(skb, dev, &again);
  4147				if (!skb)
  4148					goto out;
  4149	
  4150				HARD_TX_LOCK(dev, txq, cpu);
  4151	
  4152				if (!netif_xmit_stopped(txq)) {
  4153					dev_xmit_recursion_inc();
  4154					skb = dev_hard_start_xmit(skb, dev, txq, &rc);
  4155					dev_xmit_recursion_dec();
  4156					if (dev_xmit_complete(rc)) {
  4157						HARD_TX_UNLOCK(dev, txq);
  4158						goto out;
  4159					}
  4160				}
  4161				HARD_TX_UNLOCK(dev, txq);
  4162				net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
  4163						     dev->name);
  4164			} else {
  4165				/* Recursion is detected! It is possible,
  4166				 * unfortunately
  4167				 */
  4168	recursion_alert:
  4169				net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
  4170						     dev->name);
  4171			}
  4172		}
  4173	
  4174		rc = -ENETDOWN;
  4175		rcu_read_unlock_bh();
> 4176	drop:
  4177		atomic_long_inc(&dev->tx_dropped);
  4178		kfree_skb_list(skb);
  4179		return rc;
  4180	out:
  4181		rcu_read_unlock_bh();
  4182		return rc;
  4183	}
  4184	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27068 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
  2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
  2020-10-06 20:09   ` kernel test robot
@ 2020-10-07  0:26   ` kernel test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-10-07  0:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf
  Cc: kbuild-all, clang-built-linux, Jesper Dangaard Brouer, netdev,
	Daniel Borkmann, Alexei Starovoitov, maze, lmb, shaun,
	Lorenzo Bianconi, marek

[-- Attachment #1: Type: text/plain, Size: 10654 bytes --]

Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: mips-randconfig-r024-20201005 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1127662c6dc2a276839c75a42238b11a3ad00f32)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # https://github.com/0day-ci/linux/commit/2065cee7d6b74c8f1dabae4e4e15999a841e3349
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
        git checkout 2065cee7d6b74c8f1dabae4e4e15999a841e3349
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   net/core/dev.c:4176:1: warning: unused label 'drop'
   drop:
   ^~~~~
>> net/core/dev.c:4068:7: warning: unused variable 'mtu_check'
   bool mtu_check = false;
   ^
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress'
   sch_handle_ingress(struct sk_buff struct packet_type int
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress'
   static inline int nf_ingress(struct sk_buff struct packet_type
   ^
   fatal error: error in backend: Nested variants found in inline asm string: ' .set push
   .set noat
   .set push
   .set arch=r4000
   .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   1: ll $0, $2 # __cmpxchg_asm
   bne $0, ${3:z}, 2f
   .set pop
   move $$1, ${4:z}
   .set arch=r4000
   sc $$1, $1
   beqz $$1, 1b
   .set pop
   2: .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   '
   clang-12: error: clang frontend command failed with exit code 70 (use -v to see invocation)
   clang version 12.0.0 (git://gitmirror/llvm_project 1127662c6dc2a276839c75a42238b11a3ad00f32)
   Target: mipsel-unknown-linux-gnu
   Thread model: posix
   InstalledDir: /opt/cross/clang-1127662c6d/bin
   clang-12: note: diagnostic msg:
   Makefile arch drivers include kernel net scripts source usr
--
>> net/core/dev.c:4068:7: warning: unused variable 'mtu_check'
   bool mtu_check = false;
   ^
   net/core/dev.c:4176:1: warning: unused label 'drop'
   drop:
   ^~~~~
   net/core/dev.c:4949:1: warning: unused function 'sch_handle_ingress'
   sch_handle_ingress(struct sk_buff struct packet_type int
   ^
   net/core/dev.c:5094:19: warning: unused function 'nf_ingress'
   static inline int nf_ingress(struct sk_buff struct packet_type
   ^
   fatal error: error in backend: Nested variants found in inline asm string: ' .set push
   .set noat
   .set push
   .set arch=r4000
   .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   1: ll $0, $2 # __cmpxchg_asm
   bne $0, ${3:z}, 2f
   .set pop
   move $$1, ${4:z}
   .set arch=r4000
   sc $$1, $1
   beqz $$1, 1b
   .set pop
   2: .if ( 0x00 ) != -1)) 0x00 ) != -1)) : ($( static struct ftrace_branch_data __attribute__((__aligned__(4))) __attribute__((__section__("_ftrace_branch"))) __if_trace = $( .func = __func__, .file = "arch/mips/include/asm/cmpxchg.h", .line = 163, $); 0x00 ) != -1)) : $))) ) && ( 0 ); .set push; .set mips64r2; .rept 1; sync 0x00; .endr; .set pop; .else; ; .endif
   '
   clang-12: error: clang frontend command failed with exit code 70 (use -v to see invocation)
   clang version 12.0.0 (git://gitmirror/llvm_project 1127662c6dc2a276839c75a42238b11a3ad00f32)
   Target: mipsel-unknown-linux-gnu
   Thread model: posix
   InstalledDir: /opt/cross/clang-1127662c6d/bin
   clang-12: note: diagnostic msg:
   Makefile arch drivers include kernel net scripts source usr

vim +/mtu_check +4068 net/core/dev.c

  4037	
  4038	/**
  4039	 *	__dev_queue_xmit - transmit a buffer
  4040	 *	@skb: buffer to transmit
  4041	 *	@sb_dev: suboordinate device used for L2 forwarding offload
  4042	 *
  4043	 *	Queue a buffer for transmission to a network device. The caller must
  4044	 *	have set the device and priority and built the buffer before calling
  4045	 *	this function. The function can be called from an interrupt.
  4046	 *
  4047	 *	A negative errno code is returned on a failure. A success does not
  4048	 *	guarantee the frame will be transmitted as it may be dropped due
  4049	 *	to congestion or traffic shaping.
  4050	 *
  4051	 * -----------------------------------------------------------------------------------
  4052	 *      I notice this method can also return errors from the queue disciplines,
  4053	 *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
  4054	 *      be positive.
  4055	 *
  4056	 *      Regardless of the return value, the skb is consumed, so it is currently
  4057	 *      difficult to retry a send to this method.  (You can bump the ref count
  4058	 *      before sending to hold a reference for retry if you are careful.)
  4059	 *
  4060	 *      When calling this method, interrupts MUST be enabled.  This is because
  4061	 *      the BH enable code must have IRQs enabled so that it will not deadlock.
  4062	 *          --BLG
  4063	 */
  4064	static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
  4065	{
  4066		struct net_device *dev = skb->dev;
  4067		struct netdev_queue *txq;
> 4068		bool mtu_check = false;
  4069		bool again = false;
  4070		struct Qdisc *q;
  4071		int rc = -ENOMEM;
  4072	
  4073		skb_reset_mac_header(skb);
  4074	
  4075		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
  4076			__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
  4077	
  4078		/* Disable soft irqs for various locks below. Also
  4079		 * stops preemption for RCU.
  4080		 */
  4081		rcu_read_lock_bh();
  4082	
  4083		skb_update_prio(skb);
  4084	
  4085		qdisc_pkt_len_init(skb);
  4086	#ifdef CONFIG_NET_CLS_ACT
  4087		mtu_check = skb_is_redirected(skb);
  4088		skb->tc_at_ingress = 0;
  4089	# ifdef CONFIG_NET_EGRESS
  4090		if (static_branch_unlikely(&egress_needed_key)) {
  4091			unsigned int len_orig = skb->len;
  4092	
  4093			skb = sch_handle_egress(skb, &rc, dev);
  4094			if (!skb)
  4095				goto out;
  4096			/* BPF-prog ran and could have changed packet size beyond MTU */
  4097			if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
  4098				mtu_check = true;
  4099		}
  4100	# endif
  4101		/* MTU-check only happens on "last" net_device in a redirect sequence
  4102		 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
  4103		 * either ingress or egress to another device).
  4104		 */
  4105		if (mtu_check && !is_skb_forwardable(dev, skb)) {
  4106			rc = -EMSGSIZE;
  4107			goto drop;
  4108		}
  4109	#endif
  4110		/* If device/qdisc don't need skb->dst, release it right now while
  4111		 * its hot in this cpu cache.
  4112		 */
  4113		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
  4114			skb_dst_drop(skb);
  4115		else
  4116			skb_dst_force(skb);
  4117	
  4118		txq = netdev_core_pick_tx(dev, skb, sb_dev);
  4119		q = rcu_dereference_bh(txq->qdisc);
  4120	
  4121		trace_net_dev_queue(skb);
  4122		if (q->enqueue) {
  4123			rc = __dev_xmit_skb(skb, q, dev, txq);
  4124			goto out;
  4125		}
  4126	
  4127		/* The device has no queue. Common case for software devices:
  4128		 * loopback, all the sorts of tunnels...
  4129	
  4130		 * Really, it is unlikely that netif_tx_lock protection is necessary
  4131		 * here.  (f.e. loopback and IP tunnels are clean ignoring statistics
  4132		 * counters.)
  4133		 * However, it is possible, that they rely on protection
  4134		 * made by us here.
  4135	
  4136		 * Check this and shot the lock. It is not prone from deadlocks.
  4137		 *Either shot noqueue qdisc, it is even simpler 8)
  4138		 */
  4139		if (dev->flags & IFF_UP) {
  4140			int cpu = smp_processor_id(); /* ok because BHs are off */
  4141	
  4142			if (txq->xmit_lock_owner != cpu) {
  4143				if (dev_xmit_recursion())
  4144					goto recursion_alert;
  4145	
  4146				skb = validate_xmit_skb(skb, dev, &again);
  4147				if (!skb)
  4148					goto out;
  4149	
  4150				HARD_TX_LOCK(dev, txq, cpu);
  4151	
  4152				if (!netif_xmit_stopped(txq)) {
  4153					dev_xmit_recursion_inc();
  4154					skb = dev_hard_start_xmit(skb, dev, txq, &rc);
  4155					dev_xmit_recursion_dec();
  4156					if (dev_xmit_complete(rc)) {
  4157						HARD_TX_UNLOCK(dev, txq);
  4158						goto out;
  4159					}
  4160				}
  4161				HARD_TX_UNLOCK(dev, txq);
  4162				net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
  4163						     dev->name);
  4164			} else {
  4165				/* Recursion is detected! It is possible,
  4166				 * unfortunately
  4167				 */
  4168	recursion_alert:
  4169				net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
  4170						     dev->name);
  4171			}
  4172		}
  4173	
  4174		rc = -ENETDOWN;
  4175		rcu_read_unlock_bh();
  4176	drop:
  4177		atomic_long_inc(&dev->tx_dropped);
  4178		kfree_skb_list(skb);
  4179		return rc;
  4180	out:
  4181		rcu_read_unlock_bh();
  4182		return rc;
  4183	}
  4184	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30698 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-06 16:33   ` Jesper Dangaard Brouer
@ 2020-10-07  1:18     ` Jakub Kicinski
  2020-10-07  1:24       ` Maciej Żenczykowski
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2020-10-07  1:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov, maze, lmb,
	shaun, Lorenzo Bianconi, marek, John Fastabend

On Tue, 6 Oct 2020 18:33:02 +0200 Jesper Dangaard Brouer wrote:
> > +static const struct bpf_func_proto bpf_xdp_mtu_lookup_proto = {
> > +	.func		= bpf_xdp_mtu_lookup,
> > +	.gpl_only	= true,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type      = ARG_PTR_TO_CTX,
> > +	.arg2_type      = ARG_ANYTHING,
> > +	.arg3_type      = ARG_ANYTHING,
> > +};
> > +
> > +

FWIW

CHECK: Please don't use multiple blank lines
#112: FILE: net/core/filter.c:5566:
+
+

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-07  1:18     ` Jakub Kicinski
@ 2020-10-07  1:24       ` Maciej Żenczykowski
  2020-10-07  7:53         ` Jesper Dangaard Brouer
  2020-10-07 16:35         ` David Ahern
  0 siblings, 2 replies; 19+ messages in thread
From: Maciej Żenczykowski @ 2020-10-07  1:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, Lorenz Bauer, Shaun Crampton,
	Lorenzo Bianconi, Marek Majkowski, John Fastabend

On Tue, Oct 6, 2020 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 6 Oct 2020 18:33:02 +0200 Jesper Dangaard Brouer wrote:
> > > +static const struct bpf_func_proto bpf_xdp_mtu_lookup_proto = {
> > > +   .func           = bpf_xdp_mtu_lookup,
> > > +   .gpl_only       = true,
> > > +   .ret_type       = RET_INTEGER,
> > > +   .arg1_type      = ARG_PTR_TO_CTX,
> > > +   .arg2_type      = ARG_ANYTHING,
> > > +   .arg3_type      = ARG_ANYTHING,
> > > +};
> > > +
> > > +
>
> FWIW
>
> CHECK: Please don't use multiple blank lines
> #112: FILE: net/core/filter.c:5566:

FYI: It would be nice to have a similar function to return a device's
L2 header size (ie. 14 for ethernet) and/or hwtype.

Also, should this be restricted to gpl only?

[I'm not actually sure, I'm actually fed up with non-gpl code atm, and
wouldn't be against all bpf code needing to be gpl'ed...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up
  2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
@ 2020-10-07  1:34   ` Maciej Żenczykowski
  2020-10-07  7:42     ` Jesper Dangaard Brouer
  2020-10-07  7:28   ` kernel test robot
  1 sibling, 1 reply; 19+ messages in thread
From: Maciej Żenczykowski @ 2020-10-07  1:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, Linux NetDev, Daniel Borkmann, Alexei Starovoitov,
	Lorenz Bauer, Shaun Crampton, Lorenzo Bianconi, Marek Majkowski,
	John Fastabend, Jakub Kicinski

On Tue, Oct 6, 2020 at 9:03 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
> can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED.  The BPF-prog
> don't know the MTU value that caused this rejection.
>
> If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
> need to know this MTU value for the ICMP packet.
>
> Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
> value as output via a union with 'tot_len' as this is the value used for
> the MTU lookup.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/uapi/linux/bpf.h |   11 +++++++++--
>  net/core/filter.c        |   17 ++++++++++++-----
>  2 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c446394135be..50ce65e37b16 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2216,6 +2216,9 @@ union bpf_attr {
>   *             * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
>   *               packet is not forwarded or needs assist from full stack
>   *
> + *             If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then the MTU
> + *             was exceeded and result params->mtu contains the MTU.
> + *
>   * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
>   *     Description
>   *             Add an entry to, or update a sockhash *map* referencing sockets.
> @@ -4844,9 +4847,13 @@ struct bpf_fib_lookup {
>         __be16  sport;
>         __be16  dport;
>
> -       /* total length of packet from network header - used for MTU check */
> -       __u16   tot_len;
> +       union { /* used for MTU check */
> +               /* input to lookup */
> +               __u16   tot_len; /* total length of packet from network hdr */
>
> +               /* output: MTU value (if requested check_mtu) */
> +               __u16   mtu;
> +       };
>         /* input: L3 device index for lookup
>          * output: device index from FIB lookup
>          */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index fed239e77bdc..d84723f347c0 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5185,13 +5185,14 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>  #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
>  static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
>                                   const struct neighbour *neigh,
> -                                 const struct net_device *dev)
> +                                 const struct net_device *dev, u32 mtu)
>  {
>         memcpy(params->dmac, neigh->ha, ETH_ALEN);
>         memcpy(params->smac, dev->dev_addr, ETH_ALEN);
>         params->h_vlan_TCI = 0;
>         params->h_vlan_proto = 0;
>         params->ifindex = dev->ifindex;
> +       params->mtu = mtu;
>
>         return 0;
>  }
> @@ -5275,8 +5276,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>
>         if (check_mtu) {
>                 mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
> -               if (params->tot_len > mtu)
> +               if (params->tot_len > mtu) {
> +                       params->mtu = mtu; /* union with tot_len */
>                         return BPF_FIB_LKUP_RET_FRAG_NEEDED;
> +               }
>         }
>
>         nhc = res.nhc;
> @@ -5309,7 +5312,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>         if (!neigh)
>                 return BPF_FIB_LKUP_RET_NO_NEIGH;
>
> -       return bpf_fib_set_fwd_params(params, neigh, dev);
> +       return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
>  }
>  #endif
>
> @@ -5401,8 +5404,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>
>         if (check_mtu) {
>                 mtu = ipv6_stub->ip6_mtu_from_fib6(&res, dst, src);
> -               if (params->tot_len > mtu)
> +               if (params->tot_len > mtu) {
> +                       params->mtu = mtu; /* union with tot_len */
>                         return BPF_FIB_LKUP_RET_FRAG_NEEDED;
> +               }
>         }
>
>         if (res.nh->fib_nh_lws)
> @@ -5421,7 +5426,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>         if (!neigh)
>                 return BPF_FIB_LKUP_RET_NO_NEIGH;
>
> -       return bpf_fib_set_fwd_params(params, neigh, dev);
> +       return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
>  }
>  #endif
>
> @@ -5490,6 +5495,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
>                 dev = dev_get_by_index_rcu(net, params->ifindex);
>                 if (!is_skb_forwardable(dev, skb))
>                         rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
> +
> +               params->mtu = dev->mtu; /* union with tot_len */
>         }
>
>         return rc;
>
>

It would be beneficial to be able to fetch the route advmss, initcwnd,
etc as well...
But I take it the struct can't be extended?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up
  2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
  2020-10-07  1:34   ` Maciej Żenczykowski
@ 2020-10-07  7:28   ` kernel test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2020-10-07  7:28 UTC (permalink / raw)
  To: kbuild, Jesper Dangaard Brouer, bpf
  Cc: lkp, Dan Carpenter, kbuild-all, Jesper Dangaard Brouer, netdev,
	Daniel Borkmann, Alexei Starovoitov, maze, lmb, shaun,
	Lorenzo Bianconi, marek

[-- Attachment #1: Type: text/plain, Size: 10264 bytes --]

Hi Jesper,

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: s390-randconfig-m031-20201002 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
net/core/filter.c:5315 bpf_ipv4_fib_lookup() error: uninitialized symbol 'mtu'.

vim +/mtu +5315 net/core/filter.c

87f5fc7e48dd31 David Ahern            2018-05-09  5202  static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
4f74fede40df8d David Ahern            2018-05-21  5203  			       u32 flags, bool check_mtu)
87f5fc7e48dd31 David Ahern            2018-05-09  5204  {
eba618abacade7 David Ahern            2019-04-02  5205  	struct fib_nh_common *nhc;
87f5fc7e48dd31 David Ahern            2018-05-09  5206  	struct in_device *in_dev;
87f5fc7e48dd31 David Ahern            2018-05-09  5207  	struct neighbour *neigh;
87f5fc7e48dd31 David Ahern            2018-05-09  5208  	struct net_device *dev;
87f5fc7e48dd31 David Ahern            2018-05-09  5209  	struct fib_result res;
87f5fc7e48dd31 David Ahern            2018-05-09  5210  	struct flowi4 fl4;
87f5fc7e48dd31 David Ahern            2018-05-09  5211  	int err;
4f74fede40df8d David Ahern            2018-05-21  5212  	u32 mtu;
87f5fc7e48dd31 David Ahern            2018-05-09  5213  
87f5fc7e48dd31 David Ahern            2018-05-09  5214  	dev = dev_get_by_index_rcu(net, params->ifindex);
87f5fc7e48dd31 David Ahern            2018-05-09  5215  	if (unlikely(!dev))
87f5fc7e48dd31 David Ahern            2018-05-09  5216  		return -ENODEV;
87f5fc7e48dd31 David Ahern            2018-05-09  5217  
87f5fc7e48dd31 David Ahern            2018-05-09  5218  	/* verify forwarding is enabled on this interface */
87f5fc7e48dd31 David Ahern            2018-05-09  5219  	in_dev = __in_dev_get_rcu(dev);
87f5fc7e48dd31 David Ahern            2018-05-09  5220  	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
4c79579b44b187 David Ahern            2018-06-26  5221  		return BPF_FIB_LKUP_RET_FWD_DISABLED;
87f5fc7e48dd31 David Ahern            2018-05-09  5222  
87f5fc7e48dd31 David Ahern            2018-05-09  5223  	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
87f5fc7e48dd31 David Ahern            2018-05-09  5224  		fl4.flowi4_iif = 1;
87f5fc7e48dd31 David Ahern            2018-05-09  5225  		fl4.flowi4_oif = params->ifindex;
87f5fc7e48dd31 David Ahern            2018-05-09  5226  	} else {
87f5fc7e48dd31 David Ahern            2018-05-09  5227  		fl4.flowi4_iif = params->ifindex;
87f5fc7e48dd31 David Ahern            2018-05-09  5228  		fl4.flowi4_oif = 0;
87f5fc7e48dd31 David Ahern            2018-05-09  5229  	}
87f5fc7e48dd31 David Ahern            2018-05-09  5230  	fl4.flowi4_tos = params->tos & IPTOS_RT_MASK;
87f5fc7e48dd31 David Ahern            2018-05-09  5231  	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
87f5fc7e48dd31 David Ahern            2018-05-09  5232  	fl4.flowi4_flags = 0;
87f5fc7e48dd31 David Ahern            2018-05-09  5233  
87f5fc7e48dd31 David Ahern            2018-05-09  5234  	fl4.flowi4_proto = params->l4_protocol;
87f5fc7e48dd31 David Ahern            2018-05-09  5235  	fl4.daddr = params->ipv4_dst;
87f5fc7e48dd31 David Ahern            2018-05-09  5236  	fl4.saddr = params->ipv4_src;
87f5fc7e48dd31 David Ahern            2018-05-09  5237  	fl4.fl4_sport = params->sport;
87f5fc7e48dd31 David Ahern            2018-05-09  5238  	fl4.fl4_dport = params->dport;
1869e226a7b3ef David Ahern            2020-09-13  5239  	fl4.flowi4_multipath_hash = 0;
87f5fc7e48dd31 David Ahern            2018-05-09  5240  
87f5fc7e48dd31 David Ahern            2018-05-09  5241  	if (flags & BPF_FIB_LOOKUP_DIRECT) {
87f5fc7e48dd31 David Ahern            2018-05-09  5242  		u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
87f5fc7e48dd31 David Ahern            2018-05-09  5243  		struct fib_table *tb;
87f5fc7e48dd31 David Ahern            2018-05-09  5244  
87f5fc7e48dd31 David Ahern            2018-05-09  5245  		tb = fib_get_table(net, tbid);
87f5fc7e48dd31 David Ahern            2018-05-09  5246  		if (unlikely(!tb))
4c79579b44b187 David Ahern            2018-06-26  5247  			return BPF_FIB_LKUP_RET_NOT_FWDED;
87f5fc7e48dd31 David Ahern            2018-05-09  5248  
87f5fc7e48dd31 David Ahern            2018-05-09  5249  		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
87f5fc7e48dd31 David Ahern            2018-05-09  5250  	} else {
87f5fc7e48dd31 David Ahern            2018-05-09  5251  		fl4.flowi4_mark = 0;
87f5fc7e48dd31 David Ahern            2018-05-09  5252  		fl4.flowi4_secid = 0;
87f5fc7e48dd31 David Ahern            2018-05-09  5253  		fl4.flowi4_tun_key.tun_id = 0;
87f5fc7e48dd31 David Ahern            2018-05-09  5254  		fl4.flowi4_uid = sock_net_uid(net, NULL);
87f5fc7e48dd31 David Ahern            2018-05-09  5255  
87f5fc7e48dd31 David Ahern            2018-05-09  5256  		err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
87f5fc7e48dd31 David Ahern            2018-05-09  5257  	}
87f5fc7e48dd31 David Ahern            2018-05-09  5258  
4c79579b44b187 David Ahern            2018-06-26  5259  	if (err) {
4c79579b44b187 David Ahern            2018-06-26  5260  		/* map fib lookup errors to RTN_ type */
4c79579b44b187 David Ahern            2018-06-26  5261  		if (err == -EINVAL)
4c79579b44b187 David Ahern            2018-06-26  5262  			return BPF_FIB_LKUP_RET_BLACKHOLE;
4c79579b44b187 David Ahern            2018-06-26  5263  		if (err == -EHOSTUNREACH)
4c79579b44b187 David Ahern            2018-06-26  5264  			return BPF_FIB_LKUP_RET_UNREACHABLE;
4c79579b44b187 David Ahern            2018-06-26  5265  		if (err == -EACCES)
4c79579b44b187 David Ahern            2018-06-26  5266  			return BPF_FIB_LKUP_RET_PROHIBIT;
4c79579b44b187 David Ahern            2018-06-26  5267  
4c79579b44b187 David Ahern            2018-06-26  5268  		return BPF_FIB_LKUP_RET_NOT_FWDED;
4c79579b44b187 David Ahern            2018-06-26  5269  	}
4c79579b44b187 David Ahern            2018-06-26  5270  
4c79579b44b187 David Ahern            2018-06-26  5271  	if (res.type != RTN_UNICAST)
4c79579b44b187 David Ahern            2018-06-26  5272  		return BPF_FIB_LKUP_RET_NOT_FWDED;
87f5fc7e48dd31 David Ahern            2018-05-09  5273  
5481d73f81549e David Ahern            2019-06-03  5274  	if (fib_info_num_path(res.fi) > 1)
87f5fc7e48dd31 David Ahern            2018-05-09  5275  		fib_select_path(net, &res, &fl4, NULL);
87f5fc7e48dd31 David Ahern            2018-05-09  5276  
4f74fede40df8d David Ahern            2018-05-21  5277  	if (check_mtu) {
4f74fede40df8d David Ahern            2018-05-21  5278  		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
ab61fc7ee5c482 Jesper Dangaard Brouer 2020-10-06  5279  		if (params->tot_len > mtu) {
ab61fc7ee5c482 Jesper Dangaard Brouer 2020-10-06  5280  			params->mtu = mtu; /* union with tot_len */
4c79579b44b187 David Ahern            2018-06-26  5281  			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
4f74fede40df8d David Ahern            2018-05-21  5282  		}
ab61fc7ee5c482 Jesper Dangaard Brouer 2020-10-06  5283  	}

"mtu" is not initialized on else path.

4f74fede40df8d David Ahern            2018-05-21  5284  
eba618abacade7 David Ahern            2019-04-02  5285  	nhc = res.nhc;
87f5fc7e48dd31 David Ahern            2018-05-09  5286  
87f5fc7e48dd31 David Ahern            2018-05-09  5287  	/* do not handle lwt encaps right now */
eba618abacade7 David Ahern            2019-04-02  5288  	if (nhc->nhc_lwtstate)
4c79579b44b187 David Ahern            2018-06-26  5289  		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
87f5fc7e48dd31 David Ahern            2018-05-09  5290  
eba618abacade7 David Ahern            2019-04-02  5291  	dev = nhc->nhc_dev;
87f5fc7e48dd31 David Ahern            2018-05-09  5292  
87f5fc7e48dd31 David Ahern            2018-05-09  5293  	params->rt_metric = res.fi->fib_priority;
87f5fc7e48dd31 David Ahern            2018-05-09  5294  
87f5fc7e48dd31 David Ahern            2018-05-09  5295  	/* xdp and cls_bpf programs are run in RCU-bh so
87f5fc7e48dd31 David Ahern            2018-05-09  5296  	 * rcu_read_lock_bh is not needed here
87f5fc7e48dd31 David Ahern            2018-05-09  5297  	 */
6f5f68d05ec0f6 David Ahern            2019-04-05  5298  	if (likely(nhc->nhc_gw_family != AF_INET6)) {
6f5f68d05ec0f6 David Ahern            2019-04-05  5299  		if (nhc->nhc_gw_family)
6f5f68d05ec0f6 David Ahern            2019-04-05  5300  			params->ipv4_dst = nhc->nhc_gw.ipv4;
6f5f68d05ec0f6 David Ahern            2019-04-05  5301  
6f5f68d05ec0f6 David Ahern            2019-04-05  5302  		neigh = __ipv4_neigh_lookup_noref(dev,
6f5f68d05ec0f6 David Ahern            2019-04-05  5303  						 (__force u32)params->ipv4_dst);
6f5f68d05ec0f6 David Ahern            2019-04-05  5304  	} else {
6f5f68d05ec0f6 David Ahern            2019-04-05  5305  		struct in6_addr *dst = (struct in6_addr *)params->ipv6_dst;
6f5f68d05ec0f6 David Ahern            2019-04-05  5306  
6f5f68d05ec0f6 David Ahern            2019-04-05  5307  		params->family = AF_INET6;
6f5f68d05ec0f6 David Ahern            2019-04-05  5308  		*dst = nhc->nhc_gw.ipv6;
6f5f68d05ec0f6 David Ahern            2019-04-05  5309  		neigh = __ipv6_neigh_lookup_noref_stub(dev, dst);
6f5f68d05ec0f6 David Ahern            2019-04-05  5310  	}
6f5f68d05ec0f6 David Ahern            2019-04-05  5311  
4c79579b44b187 David Ahern            2018-06-26  5312  	if (!neigh)
4c79579b44b187 David Ahern            2018-06-26  5313  		return BPF_FIB_LKUP_RET_NO_NEIGH;
87f5fc7e48dd31 David Ahern            2018-05-09  5314  
ab61fc7ee5c482 Jesper Dangaard Brouer 2020-10-06 @5315  	return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
                                                                                                                  ^^^
Uninitialized.

87f5fc7e48dd31 David Ahern            2018-05-09  5316  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 20726 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up
  2020-10-07  1:34   ` Maciej Żenczykowski
@ 2020-10-07  7:42     ` Jesper Dangaard Brouer
  2020-10-07 16:38       ` David Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-07  7:42 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: bpf, Linux NetDev, Daniel Borkmann, Alexei Starovoitov,
	Lorenz Bauer, Shaun Crampton, Lorenzo Bianconi, Marek Majkowski,
	John Fastabend, Jakub Kicinski, brouer, David Ahern

On Tue, 6 Oct 2020 18:34:50 -0700
Maciej Żenczykowski <maze@google.com> wrote:

> On Tue, Oct 6, 2020 at 9:03 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
> > can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED.  The BPF-prog
> > don't know the MTU value that caused this rejection.
> >
> > If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
> > need to know this MTU value for the ICMP packet.
> >
> > Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
> > value as output via a union with 'tot_len' as this is the value used for
> > the MTU lookup.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  include/uapi/linux/bpf.h |   11 +++++++++--
> >  net/core/filter.c        |   17 ++++++++++++-----
> >  2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c446394135be..50ce65e37b16 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
[...]
> > @@ -4844,9 +4847,13 @@ struct bpf_fib_lookup {
> >         __be16  sport;
> >         __be16  dport;
> >
> > -       /* total length of packet from network header - used for MTU check */
> > -       __u16   tot_len;
> > +       union { /* used for MTU check */
> > +               /* input to lookup */
> > +               __u16   tot_len; /* total length of packet from network hdr */
> >
> > +               /* output: MTU value (if requested check_mtu) */
> > +               __u16   mtu;
> > +       };
> >         /* input: L3 device index for lookup
> >          * output: device index from FIB lookup
> >          */
[...]
> 
> It would be beneficial to be able to fetch the route advmss, initcwnd,
> etc as well...
> But I take it the struct can't be extended?

The struct bpf_fib_lookup is exactly 1 cache-line (64 bytes) for
performance reasons.  I do believe that it can be extended, as Ahern
designed the BPF-helper API cleverly via a plen (detail below signature).

For accessing other route metric information like advmss and initcwnd,
I would expect Daniel to suggest to use BTF to access info from
dst_entry, or actually dst->_metrics. But looking at the details for
accessing dst->_metrics is complicated by macros, thus I expect BTF
would have a hard time.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)

$ pahole -C bpf_fib_lookup
struct bpf_fib_lookup {
	__u8                       family;               /*     0     1 */
	__u8                       l4_protocol;          /*     1     1 */
	__be16                     sport;                /*     2     2 */
	__be16                     dport;                /*     4     2 */
	union {
		__u16              tot_len;              /*     6     2 */
		__u16              mtu;                  /*     6     2 */
	};                                               /*     6     2 */
	__u32                      ifindex;              /*     8     4 */
	union {
		__u8               tos;                  /*    12     1 */
		__be32             flowinfo;             /*    12     4 */
		__u32              rt_metric;            /*    12     4 */
	};                                               /*    12     4 */
	union {
		__be32             ipv4_src;             /*    16     4 */
		__u32              ipv6_src[4];          /*    16    16 */
	};                                               /*    16    16 */
	union {
		__be32             ipv4_dst;             /*    32     4 */
		__u32              ipv6_dst[4];          /*    32    16 */
	};                                               /*    32    16 */
	__be16                     h_vlan_proto;         /*    48     2 */
	__be16                     h_vlan_TCI;           /*    50     2 */
	__u8                       smac[6];              /*    52     6 */
	__u8                       dmac[6];              /*    58     6 */

	/* size: 64, cachelines: 1, members: 13 */
};


struct dst_metrics {
	u32		metrics[RTAX_MAX];
	refcount_t	refcnt;
} __aligned(4);		/* Low pointer bits contain DST_METRICS_FLAGS */

extern const struct dst_metrics dst_default_metrics;

#define DST_METRICS_READ_ONLY		0x1UL
#define DST_METRICS_REFCOUNTED		0x2UL
#define DST_METRICS_FLAGS		0x3UL
#define __DST_METRICS_PTR(Y)	\
	((u32 *)((Y) & ~DST_METRICS_FLAGS))
#define DST_METRICS_PTR(X)	__DST_METRICS_PTR((X)->_metrics)


static inline u32
dst_metric_raw(const struct dst_entry *dst, const int metric)
{
	u32 *p = DST_METRICS_PTR(dst);

	return p[metric-1];
}

static inline u32
dst_metric_advmss(const struct dst_entry *dst)
{
	u32 advmss = dst_metric_raw(dst, RTAX_ADVMSS);

	if (!advmss)
		advmss = dst->ops->default_advmss(dst);

	return advmss;
}


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-07  1:24       ` Maciej Żenczykowski
@ 2020-10-07  7:53         ` Jesper Dangaard Brouer
  2020-10-07 16:35         ` David Ahern
  1 sibling, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-07  7:53 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Jakub Kicinski, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, Lorenz Bauer, Shaun Crampton,
	Lorenzo Bianconi, Marek Majkowski, John Fastabend, brouer

On Tue, 6 Oct 2020 18:24:28 -0700
Maciej Żenczykowski <maze@google.com> wrote:

> On Tue, Oct 6, 2020 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 6 Oct 2020 18:33:02 +0200 Jesper Dangaard Brouer wrote:  
> > > > +static const struct bpf_func_proto bpf_xdp_mtu_lookup_proto = {
> > > > +   .func           = bpf_xdp_mtu_lookup,
> > > > +   .gpl_only       = true,
> > > > +   .ret_type       = RET_INTEGER,
> > > > +   .arg1_type      = ARG_PTR_TO_CTX,
> > > > +   .arg2_type      = ARG_ANYTHING,
> > > > +   .arg3_type      = ARG_ANYTHING,
> > > > +};
> > > > +
> > > > +  
> >
> > FWIW
> >
> > CHECK: Please don't use multiple blank lines
> > #112: FILE: net/core/filter.c:5566:  
> 
> FYI: It would be nice to have a similar function to return a device's
> L2 header size (ie. 14 for ethernet) and/or hwtype.
> 
> Also, should this be restricted to gpl only?

That is mostly because I copy-pasted it from the fib lookup helper,
which with good reason is GPL.  I would prefer it to be GPL, but given
how simple it is I shouldn't.  Maybe I could argue that my new mtu_check
could be GPL as it does more work.

> [I'm not actually sure, I'm actually fed up with non-gpl code atm, and
> wouldn't be against all bpf code needing to be gpl'ed...]

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-07  1:24       ` Maciej Żenczykowski
  2020-10-07  7:53         ` Jesper Dangaard Brouer
@ 2020-10-07 16:35         ` David Ahern
  2020-10-07 17:44           ` Maciej Żenczykowski
  1 sibling, 1 reply; 19+ messages in thread
From: David Ahern @ 2020-10-07 16:35 UTC (permalink / raw)
  To: Maciej Żenczykowski, Jakub Kicinski
  Cc: Jesper Dangaard Brouer, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, Lorenz Bauer, Shaun Crampton,
	Lorenzo Bianconi, Marek Majkowski, John Fastabend

On 10/6/20 6:24 PM, Maciej Żenczykowski wrote:
> 
> FYI: It would be nice to have a similar function to return a device's
> L2 header size (ie. 14 for ethernet) and/or hwtype.

Why does that need to be looked up via a helper? It's a static number
for a device and can plumbed to a program in a number of ways.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up
  2020-10-07  7:42     ` Jesper Dangaard Brouer
@ 2020-10-07 16:38       ` David Ahern
  0 siblings, 0 replies; 19+ messages in thread
From: David Ahern @ 2020-10-07 16:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Maciej Żenczykowski
  Cc: bpf, Linux NetDev, Daniel Borkmann, Alexei Starovoitov,
	Lorenz Bauer, Shaun Crampton, Lorenzo Bianconi, Marek Majkowski,
	John Fastabend, Jakub Kicinski, David Ahern

On 10/7/20 12:42 AM, Jesper Dangaard Brouer wrote:
> 
> The struct bpf_fib_lookup is exactly 1 cache-line (64 bytes) for
> performance reasons.  I do believe that it can be extended, as Ahern
> designed the BPF-helper API cleverly via a plen (detail below signature).

Yes, I kept it to 64B for performance reasons which is why most fields
have 1 value on input and another on output.

Technically it can be extended, but any cost in doing so should be
abosrbed by the new feature(s). Meaning, users just doing a fib lookup
based on current API should not take a hit with the extra size.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex
  2020-10-07 16:35         ` David Ahern
@ 2020-10-07 17:44           ` Maciej Żenczykowski
  0 siblings, 0 replies; 19+ messages in thread
From: Maciej Żenczykowski @ 2020-10-07 17:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, Jesper Dangaard Brouer, bpf, Linux NetDev,
	Daniel Borkmann, Alexei Starovoitov, Lorenz Bauer,
	Shaun Crampton, Lorenzo Bianconi, Marek Majkowski,
	John Fastabend

> > FYI: It would be nice to have a similar function to return a device's
> > L2 header size (ie. 14 for ethernet) and/or hwtype.
>
> Why does that need to be looked up via a helper? It's a static number
> for a device and can plumbed to a program in a number of ways.

Fair enough.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-10-07 17:44 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-07  1:34   ` Maciej Żenczykowski
2020-10-07  7:42     ` Jesper Dangaard Brouer
2020-10-07 16:38       ` David Ahern
2020-10-07  7:28   ` kernel test robot
2020-10-06 16:03 ` [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex Jesper Dangaard Brouer
2020-10-06 16:33   ` Jesper Dangaard Brouer
2020-10-07  1:18     ` Jakub Kicinski
2020-10-07  1:24       ` Maciej Żenczykowski
2020-10-07  7:53         ` Jesper Dangaard Brouer
2020-10-07 16:35         ` David Ahern
2020-10-07 17:44           ` Maciej Żenczykowski
2020-10-06 16:03 ` [PATCH bpf-next V1 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
2020-10-06 20:09   ` kernel test robot
2020-10-07  0:26   ` kernel test robot
2020-10-06 16:03 ` [PATCH bpf-next V1 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).