* [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement
@ 2020-10-06 16:02 Jesper Dangaard Brouer
  2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Jesper Dangaard Brouer @ 2020-10-06 16:02 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, maze, lmb, shaun, Lorenzo Bianconi, marek,
	John Fastabend, Jakub Kicinski

This patchset drops all the MTU checks in TC BPF-helpers that limit
growing the packet size. This is done because these BPF-helpers don't
take redirect into account, which can result in their MTU check being
done against the wrong netdev.

The new approach is to give BPF-programs knowledge about the MTU of a
netdev (via ifindex) and at the fib route lookup level. To that end, some
BPF-helpers are added and extended to make it possible to do MTU checks
in the BPF-code. If the BPF-prog doesn't comply with the MTU, this is
enforced on the kernel side.
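
To make the intended usage pattern concrete, below is a minimal TC-BPF
sketch of a program-side MTU check before a redirect. The helper name
bpf_mtu_lookup() and its prototype are placeholders for illustration only;
the real helper for reading the MTU via ifindex is introduced in patch 3/6
and its name and signature may differ from this sketch.

/* SPDX-License-Identifier: GPL-2.0 */
/* Hedged sketch only: bpf_mtu_lookup() below is a HYPOTHETICAL helper
 * standing in for the "read MTU from net_device via ifindex" helper
 * described above; its name, helper ID and signature are assumptions,
 * not the patchset's actual API.
 */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define TARGET_IFINDEX 4	/* example redirect target, assumed */

/* Hypothetical helper prototype (placeholder helper ID). */
static long (*bpf_mtu_lookup)(struct __sk_buff *skb, __u32 ifindex,
			      __u64 flags) = (void *)999;

SEC("classifier")
int tc_mtu_aware_redirect(struct __sk_buff *skb)
{
	long mtu;

	/* Ask the kernel for the MTU of the device we intend to redirect
	 * to, instead of relying on a helper-internal check against the
	 * current device.
	 */
	mtu = bpf_mtu_lookup(skb, TARGET_IFINDEX, 0);
	if (mtu < 0)
		return TC_ACT_SHOT;

	/* The program, not the helper, decides how to handle a packet
	 * that would exceed the target device's MTU.
	 */
	if (skb->len > (__u32)mtu)
		return TC_ACT_SHOT;

	return bpf_redirect(TARGET_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";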

Realizing that MTU should only apply to transmitted packets, the MTU
enforcement is now done after the TC egress hook. This gives TC-BPF
programs the most flexibility and allows them to shrink the packet size
again in the egress hook prior to transmit.
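
As an illustration of that last point, a TC egress program could shrink a
packet it had previously grown back under the MTU right before transmit.
The sketch below uses the existing bpf_skb_change_tail() helper with a
hard-coded 1500-byte MTU purely for illustration; a real program would
derive the MTU via the new helpers and would only remove data it had
itself appended (e.g. a custom trailer), not arbitrary payload.

/* SPDX-License-Identifier: GPL-2.0 */
/* Hedged sketch: shrink an oversized packet at TC egress before transmit.
 * ASSUMED_MTU is a hard-coded stand-in for an MTU the program would
 * normally look up via the new helpers.
 */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define ASSUMED_MTU 1500

SEC("classifier")
int tc_egress_trim(struct __sk_buff *skb)
{
	if (skb->len > ASSUMED_MTU) {
		/* bpf_skb_change_tail() resizes the packet to the given
		 * length, dropping the excess tail bytes here.
		 */
		if (bpf_skb_change_tail(skb, ASSUMED_MTU, 0))
			return TC_ACT_SHOT;
	}
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";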

This patchset is primarily focused on TC-BPF, but I've made sure that the
MTU BPF-helpers also work for XDP BPF-programs.

---

Jesper Dangaard Brouer (6):
      bpf: Remove MTU check in __bpf_skb_max_len
      bpf: bpf_fib_lookup return MTU value as output when looked up
      bpf: add BPF-helper for reading MTU from net_device via ifindex
      bpf: make it possible to identify BPF redirected SKBs
      bpf: Add MTU check for TC-BPF packets after egress hook
      bpf: drop MTU check when doing TC-BPF redirect to ingress


 include/linux/netdevice.h |    5 ++-
 include/uapi/linux/bpf.h  |   24 +++++++++++-
 net/core/dev.c            |   24 +++++++++++-
 net/core/filter.c         |   88 ++++++++++++++++++++++++++++++++++++++++-----
 net/sched/Kconfig         |    1 +
 5 files changed, 126 insertions(+), 16 deletions(-)

--


* Re: [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
@ 2020-10-06 18:26 kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-10-06 18:26 UTC (permalink / raw)
  To: kbuild


CC: kbuild-all@lists.01.org
In-Reply-To: <160200019184.719143.17780588544420986957.stgit@firesoul>
References: <160200019184.719143.17780588544420986957.stgit@firesoul>
TO: Jesper Dangaard Brouer <brouer@redhat.com>
TO: bpf@vger.kernel.org
CC: Jesper Dangaard Brouer <brouer@redhat.com>
CC: netdev@vger.kernel.org
CC: Daniel Borkmann <borkmann@iogearbox.net>
CC: Alexei Starovoitov <alexei.starovoitov@gmail.com>
CC: maze@google.com
CC: lmb@cloudflare.com
CC: shaun@tigera.io
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: marek@cloudflare.com

Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
:::::: branch date: 2 hours ago
:::::: commit date: 2 hours ago
config: x86_64-randconfig-s021-20201006 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-201-g24bdaac6-dirty
        # https://github.com/0day-ci/linux/commit/2065cee7d6b74c8f1dabae4e4e15999a841e3349
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jesper-Dangaard-Brouer/bpf-New-approach-for-BPF-MTU-handling-and-enforcement/20201007-000903
        git checkout 2065cee7d6b74c8f1dabae4e4e15999a841e3349
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

sparse warnings: (new ones prefixed by >>)
>> net/core/dev.c:4176:1: sparse: sparse: unused label 'drop'
   net/core/dev.c:3271:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3271:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3271:23: sparse:     got unsigned int
   net/core/dev.c:3271:23: sparse: sparse: cast from restricted __wsum
   net/core/dev.c:3753:26: sparse: sparse: context imbalance in '__dev_queue_xmit' - different lock contexts for basic block
   net/core/dev.c:4934:44: sparse: sparse: context imbalance in 'net_tx_action' - unexpected unlock

vim +/drop +4176 net/core/dev.c

638b2a699fd3ec9 Jiri Pirko             2015-05-12  4037  
d29f749e252bcdb Dave Jones             2008-07-22  4038  /**
9d08dd3d320fab4 Jason Wang             2014-01-20  4039   *	__dev_queue_xmit - transmit a buffer
d29f749e252bcdb Dave Jones             2008-07-22  4040   *	@skb: buffer to transmit
eadec877ce9ca46 Alexander Duyck        2018-07-09  4041   *	@sb_dev: suboordinate device used for L2 forwarding offload
d29f749e252bcdb Dave Jones             2008-07-22  4042   *
d29f749e252bcdb Dave Jones             2008-07-22  4043   *	Queue a buffer for transmission to a network device. The caller must
d29f749e252bcdb Dave Jones             2008-07-22  4044   *	have set the device and priority and built the buffer before calling
d29f749e252bcdb Dave Jones             2008-07-22  4045   *	this function. The function can be called from an interrupt.
d29f749e252bcdb Dave Jones             2008-07-22  4046   *
d29f749e252bcdb Dave Jones             2008-07-22  4047   *	A negative errno code is returned on a failure. A success does not
d29f749e252bcdb Dave Jones             2008-07-22  4048   *	guarantee the frame will be transmitted as it may be dropped due
d29f749e252bcdb Dave Jones             2008-07-22  4049   *	to congestion or traffic shaping.
d29f749e252bcdb Dave Jones             2008-07-22  4050   *
d29f749e252bcdb Dave Jones             2008-07-22  4051   * -----------------------------------------------------------------------------------
d29f749e252bcdb Dave Jones             2008-07-22  4052   *      I notice this method can also return errors from the queue disciplines,
d29f749e252bcdb Dave Jones             2008-07-22  4053   *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
d29f749e252bcdb Dave Jones             2008-07-22  4054   *      be positive.
d29f749e252bcdb Dave Jones             2008-07-22  4055   *
d29f749e252bcdb Dave Jones             2008-07-22  4056   *      Regardless of the return value, the skb is consumed, so it is currently
d29f749e252bcdb Dave Jones             2008-07-22  4057   *      difficult to retry a send to this method.  (You can bump the ref count
d29f749e252bcdb Dave Jones             2008-07-22  4058   *      before sending to hold a reference for retry if you are careful.)
d29f749e252bcdb Dave Jones             2008-07-22  4059   *
d29f749e252bcdb Dave Jones             2008-07-22  4060   *      When calling this method, interrupts MUST be enabled.  This is because
d29f749e252bcdb Dave Jones             2008-07-22  4061   *      the BH enable code must have IRQs enabled so that it will not deadlock.
d29f749e252bcdb Dave Jones             2008-07-22  4062   *          --BLG
d29f749e252bcdb Dave Jones             2008-07-22  4063   */
eadec877ce9ca46 Alexander Duyck        2018-07-09  4064  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
^1da177e4c3f415 Linus Torvalds         2005-04-16  4065  {
^1da177e4c3f415 Linus Torvalds         2005-04-16  4066  	struct net_device *dev = skb->dev;
dc2b48475a0a36f David S. Miller        2008-07-08  4067  	struct netdev_queue *txq;
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4068  	bool mtu_check = false;
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4069  	bool again = false;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4070  	struct Qdisc *q;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4071  	int rc = -ENOMEM;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4072  
6d1ccff62780682 Eric Dumazet           2013-02-05  4073  	skb_reset_mac_header(skb);
6d1ccff62780682 Eric Dumazet           2013-02-05  4074  
e7fd2885385157d Willem de Bruijn       2014-08-04  4075  	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
e7fd2885385157d Willem de Bruijn       2014-08-04  4076  		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
e7fd2885385157d Willem de Bruijn       2014-08-04  4077  
^1da177e4c3f415 Linus Torvalds         2005-04-16  4078  	/* Disable soft irqs for various locks below. Also
^1da177e4c3f415 Linus Torvalds         2005-04-16  4079  	 * stops preemption for RCU.
^1da177e4c3f415 Linus Torvalds         2005-04-16  4080  	 */
d4828d85d188dc7 Herbert Xu             2006-06-22  4081  	rcu_read_lock_bh();
^1da177e4c3f415 Linus Torvalds         2005-04-16  4082  
5bc1421e34ecfe0 Neil Horman            2011-11-22  4083  	skb_update_prio(skb);
5bc1421e34ecfe0 Neil Horman            2011-11-22  4084  
1f211a1b929c804 Daniel Borkmann        2016-01-07  4085  	qdisc_pkt_len_init(skb);
1f211a1b929c804 Daniel Borkmann        2016-01-07  4086  #ifdef CONFIG_NET_CLS_ACT
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4087  	mtu_check = skb_is_redirected(skb);
8dc07fdbf2054f1 Willem de Bruijn       2017-01-07  4088  	skb->tc_at_ingress = 0;
1f211a1b929c804 Daniel Borkmann        2016-01-07  4089  # ifdef CONFIG_NET_EGRESS
aabf6772cc745f9 Davidlohr Bueso        2018-05-08  4090  	if (static_branch_unlikely(&egress_needed_key)) {
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4091  		unsigned int len_orig = skb->len;
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4092  
1f211a1b929c804 Daniel Borkmann        2016-01-07  4093  		skb = sch_handle_egress(skb, &rc, dev);
1f211a1b929c804 Daniel Borkmann        2016-01-07  4094  		if (!skb)
1f211a1b929c804 Daniel Borkmann        2016-01-07  4095  			goto out;
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4096  		/* BPF-prog ran and could have changed packet size beyond MTU */
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4097  		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4098  			mtu_check = true;
1f211a1b929c804 Daniel Borkmann        2016-01-07  4099  	}
357b6cc5834eabc Daniel Borkmann        2020-03-18  4100  # endif
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4101  	/* MTU-check only happens on "last" net_device in a redirect sequence
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4102  	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4103  	 * either ingress or egress to another device).
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4104  	 */
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4105  	if (mtu_check && !is_skb_forwardable(dev, skb)) {
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4106  		rc = -EMSGSIZE;
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4107  		goto drop;
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06  4108  	}
1f211a1b929c804 Daniel Borkmann        2016-01-07  4109  #endif
0287587884b1504 Eric Dumazet           2014-10-05  4110  	/* If device/qdisc don't need skb->dst, release it right now while
0287587884b1504 Eric Dumazet           2014-10-05  4111  	 * its hot in this cpu cache.
0287587884b1504 Eric Dumazet           2014-10-05  4112  	 */
0287587884b1504 Eric Dumazet           2014-10-05  4113  	if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
0287587884b1504 Eric Dumazet           2014-10-05  4114  		skb_dst_drop(skb);
0287587884b1504 Eric Dumazet           2014-10-05  4115  	else
0287587884b1504 Eric Dumazet           2014-10-05  4116  		skb_dst_force(skb);
0287587884b1504 Eric Dumazet           2014-10-05  4117  
4bd97d51a5e602e Paolo Abeni            2019-03-20  4118  	txq = netdev_core_pick_tx(dev, skb, sb_dev);
a898def29e4119b Paul E. McKenney       2010-02-22  4119  	q = rcu_dereference_bh(txq->qdisc);
37437bb2e1ae8af David S. Miller        2008-07-16  4120  
cf66ba58b5cb8b1 Koki Sanagi            2010-08-23  4121  	trace_net_dev_queue(skb);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4122  	if (q->enqueue) {
bbd8a0d3a3b65d3 Krishna Kumar          2009-08-06  4123  		rc = __dev_xmit_skb(skb, q, dev, txq);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4124  		goto out;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4125  	}
^1da177e4c3f415 Linus Torvalds         2005-04-16  4126  
^1da177e4c3f415 Linus Torvalds         2005-04-16  4127  	/* The device has no queue. Common case for software devices:
eb13da1a103a808 tcharding              2017-02-09  4128  	 * loopback, all the sorts of tunnels...
^1da177e4c3f415 Linus Torvalds         2005-04-16  4129  
eb13da1a103a808 tcharding              2017-02-09  4130  	 * Really, it is unlikely that netif_tx_lock protection is necessary
eb13da1a103a808 tcharding              2017-02-09  4131  	 * here.  (f.e. loopback and IP tunnels are clean ignoring statistics
eb13da1a103a808 tcharding              2017-02-09  4132  	 * counters.)
eb13da1a103a808 tcharding              2017-02-09  4133  	 * However, it is possible, that they rely on protection
eb13da1a103a808 tcharding              2017-02-09  4134  	 * made by us here.
^1da177e4c3f415 Linus Torvalds         2005-04-16  4135  
eb13da1a103a808 tcharding              2017-02-09  4136  	 * Check this and shot the lock. It is not prone from deadlocks.
eb13da1a103a808 tcharding              2017-02-09  4137  	 *Either shot noqueue qdisc, it is even simpler 8)
^1da177e4c3f415 Linus Torvalds         2005-04-16  4138  	 */
^1da177e4c3f415 Linus Torvalds         2005-04-16  4139  	if (dev->flags & IFF_UP) {
^1da177e4c3f415 Linus Torvalds         2005-04-16  4140  		int cpu = smp_processor_id(); /* ok because BHs are off */
^1da177e4c3f415 Linus Torvalds         2005-04-16  4141  
c773e847ea8f681 David S. Miller        2008-07-08  4142  		if (txq->xmit_lock_owner != cpu) {
97cdcf37b57e3f2 Florian Westphal       2019-04-01  4143  			if (dev_xmit_recursion())
745e20f1b626b1b Eric Dumazet           2010-09-29  4144  				goto recursion_alert;
745e20f1b626b1b Eric Dumazet           2010-09-29  4145  
f53c723902d1ac5 Steffen Klassert       2017-12-20  4146  			skb = validate_xmit_skb(skb, dev, &again);
1f59533f9ca5634 Jesper Dangaard Brouer 2014-09-03  4147  			if (!skb)
d21fd63ea385620 Eric Dumazet           2016-04-12  4148  				goto out;
1f59533f9ca5634 Jesper Dangaard Brouer 2014-09-03  4149  
c773e847ea8f681 David S. Miller        2008-07-08  4150  			HARD_TX_LOCK(dev, txq, cpu);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4151  
7346649826382b7 Tom Herbert            2011-11-28  4152  			if (!netif_xmit_stopped(txq)) {
97cdcf37b57e3f2 Florian Westphal       2019-04-01  4153  				dev_xmit_recursion_inc();
ce93718fb7cdbc0 David S. Miller        2014-08-30  4154  				skb = dev_hard_start_xmit(skb, dev, txq, &rc);
97cdcf37b57e3f2 Florian Westphal       2019-04-01  4155  				dev_xmit_recursion_dec();
572a9d7b6fc7f20 Patrick McHardy        2009-11-10  4156  				if (dev_xmit_complete(rc)) {
c773e847ea8f681 David S. Miller        2008-07-08  4157  					HARD_TX_UNLOCK(dev, txq);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4158  					goto out;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4159  				}
^1da177e4c3f415 Linus Torvalds         2005-04-16  4160  			}
c773e847ea8f681 David S. Miller        2008-07-08  4161  			HARD_TX_UNLOCK(dev, txq);
e87cc4728f0e2fb Joe Perches            2012-05-13  4162  			net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
7b6cd1ce72176e2 Joe Perches            2012-02-01  4163  					     dev->name);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4164  		} else {
^1da177e4c3f415 Linus Torvalds         2005-04-16  4165  			/* Recursion is detected! It is possible,
745e20f1b626b1b Eric Dumazet           2010-09-29  4166  			 * unfortunately
745e20f1b626b1b Eric Dumazet           2010-09-29  4167  			 */
745e20f1b626b1b Eric Dumazet           2010-09-29  4168  recursion_alert:
e87cc4728f0e2fb Joe Perches            2012-05-13  4169  			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
7b6cd1ce72176e2 Joe Perches            2012-02-01  4170  					     dev->name);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4171  		}
^1da177e4c3f415 Linus Torvalds         2005-04-16  4172  	}
^1da177e4c3f415 Linus Torvalds         2005-04-16  4173  
^1da177e4c3f415 Linus Torvalds         2005-04-16  4174  	rc = -ENETDOWN;
d4828d85d188dc7 Herbert Xu             2006-06-22  4175  	rcu_read_unlock_bh();
2065cee7d6b74c8 Jesper Dangaard Brouer 2020-10-06 @4176  drop:
015f0688f57ca4d Eric Dumazet           2014-03-27  4177  	atomic_long_inc(&dev->tx_dropped);
1f59533f9ca5634 Jesper Dangaard Brouer 2014-09-03  4178  	kfree_skb_list(skb);
^1da177e4c3f415 Linus Torvalds         2005-04-16  4179  	return rc;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4180  out:
d4828d85d188dc7 Herbert Xu             2006-06-22  4181  	rcu_read_unlock_bh();
^1da177e4c3f415 Linus Torvalds         2005-04-16  4182  	return rc;
^1da177e4c3f415 Linus Torvalds         2005-04-16  4183  }
f663dd9aaf9ed12 Jason Wang             2014-01-10  4184  
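
For context on the warning flagged above: the only 'goto drop' added by
the patch sits inside the CONFIG_NET_CLS_ACT block, so with that option
disabled the new 'drop' label at line 4176 has no users, which is what
sparse reports. One possible way to silence the warning is to guard the
label the same way as its only caller, as sketched below (only the tail
of __dev_queue_xmit() is shown); this is an assumption, not necessarily
the fix that will be used.

	rc = -ENETDOWN;
	rcu_read_unlock_bh();
#ifdef CONFIG_NET_CLS_ACT
drop:
#endif
	atomic_long_inc(&dev->tx_dropped);
	kfree_skb_list(skb);
	return rc;
out:
	rcu_read_unlock_bh();
	return rc;
}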

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37207 bytes --]


Thread overview: 22+ messages
2020-10-06 16:02 [PATCH bpf-next V1 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
2020-10-06 16:02 ` [PATCH bpf-next V1 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-06 16:02 ` [PATCH bpf-next V1 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-07  1:34   ` Maciej Żenczykowski
2020-10-07  7:42     ` Jesper Dangaard Brouer
2020-10-07 16:38       ` David Ahern
2020-10-07  7:28   ` kernel test robot
2020-10-06 16:03 ` [PATCH bpf-next V1 3/6] bpf: add BPF-helper for reading MTU from net_device via ifindex Jesper Dangaard Brouer
2020-10-06 16:33   ` Jesper Dangaard Brouer
2020-10-07  1:18     ` Jakub Kicinski
2020-10-07  1:24       ` Maciej Żenczykowski
2020-10-07  7:53         ` Jesper Dangaard Brouer
2020-10-07 16:35         ` David Ahern
2020-10-07 17:44           ` Maciej Żenczykowski
2020-10-06 16:03 ` [PATCH bpf-next V1 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
2020-10-06 16:03 ` [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
2020-10-06 20:09   ` kernel test robot
2020-10-06 20:09     ` kernel test robot
2020-10-07  0:26   ` kernel test robot
2020-10-07  0:26     ` kernel test robot
2020-10-06 16:03 ` [PATCH bpf-next V1 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
2020-10-06 18:26 [PATCH bpf-next V1 5/6] bpf: Add MTU check for TC-BPF packets after egress hook kernel test robot
