* [RFC PATCH] fix xfrm MTU regression
@ 2021-04-29 17:02 Jiri Bohac
  2021-04-29 19:48 ` Sabrina Dubroca
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jiri Bohac @ 2021-04-29 17:02 UTC
  To: Mike Maloney, Eric Dumazet, davem; +Cc: netdev, Steffen Klassert, Herbert Xu

Hi,

Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
sendmsg crash caused by too small MTU") breaks PMTU for xfrm.

A Packet Too Big ICMPv6 message received in response to an ESP
packet will prevent all further communication through the tunnel
if the reported MTU minus the ESP overhead is smaller than 1280.

E.g. for tunnel-mode ESP with sha256/aes the overhead is 92
bytes, so receiving a PTB with an MTU of 1371 or less
(1371 - 92 = 1279 < 1280) results in all further packets in the
tunnel being dropped. A ping through the tunnel fails with
"ping: sendmsg: Invalid argument".

Apparently the MTU on the xfrm route is smaller than 1280 and
fails the check inside ip6_setup_cork() added by 749439bf.
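
As a rough illustration of the arithmetic, the following sketch
subtracts the tunnel overhead from the reported MTU and applies a
check of the same shape as the one in ip6_setup_cork(). The helper
names are hypothetical, and the plain subtraction is a
simplification: the real xfrm MTU calculation also accounts for
padding and alignment.

```c
#include <stdbool.h>

#define IPV6_MIN_MTU 1280u	/* minimum IPv6 MTU, same value as the kernel constant */

/* Hypothetical helper: MTU left for the inner packet once the tunnel
 * overhead is subtracted (simplified, no padding or alignment).
 */
static unsigned int inner_mtu(unsigned int ptb_mtu, unsigned int overhead)
{
	return ptb_mtu > overhead ? ptb_mtu - overhead : 0;
}

/* Same shape as the check added by 749439bf: true means the send
 * would be rejected with -EINVAL.
 */
static bool cork_mtu_rejected(unsigned int mtu)
{
	return mtu < IPV6_MIN_MTU;
}
```

With the 92-byte overhead from the example, a reported MTU of 1371
leaves 1279 and is rejected, while 1372 leaves exactly 1280 and
passes.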

We found this by debugging USGv6/ipv6ready failures. Failing
tests are: "Phase-2 Interoperability Test Scenario IPsec" /
5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).

Below is my attempt to fix the situation by dropping the MTU
check and instead checking for the underflows described in the
749439bf commit message (without much understanding of the
details!). Does this make sense?:
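
To illustrate the kind of underflow involved (a standalone sketch,
not part of the patch): all the quantities are unsigned, so an MTU
smaller than fragheaderlen wraps around in the maxfraglen
computation. The function names below are hypothetical and
FRAG_HDR_LEN stands in for sizeof(struct frag_hdr).

```c
#include <stdbool.h>

#define FRAG_HDR_LEN 8u	/* stands in for sizeof(struct frag_hdr) */

/* The maxfraglen expression without any guard. */
static unsigned int maxfraglen_unchecked(unsigned int mtu,
					 unsigned int fragheaderlen)
{
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen - FRAG_HDR_LEN;
}

/* The same expression behind a guard shaped like the one in the
 * patch; returns false where the patch would jump to emsgsize.
 */
static bool maxfraglen_checked(unsigned int mtu, unsigned int fragheaderlen,
			       unsigned int *maxfraglen)
{
	if (mtu < fragheaderlen ||
	    ((mtu - fragheaderlen) & ~7u) + fragheaderlen < FRAG_HDR_LEN)
		return false;

	*maxfraglen = maxfraglen_unchecked(mtu, fragheaderlen);
	return true;
}
```

For mtu = 39 and fragheaderlen = 40 the unguarded expression wraps
twice and yields 24, which is smaller than fragheaderlen, so a later
"maxfraglen - fragheaderlen" would underflow as well. The guarded
variant rejects that input and returns 1488 for mtu = 1500.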

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..8af6adb42c85 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-	if (mtu < IPV6_MIN_MTU)
-		return -EINVAL;
 	cork->base.fragsize = mtu;
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
@@ -1465,6 +1463,11 @@ static int __ip6_append_data(struct sock *sk,
 
 	fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
 			(opt ? opt->opt_nflen : 0);
+
+	if (mtu < fragheaderlen ||
+	    ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+		goto emsgsize;
+
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
 		     sizeof(struct frag_hdr);
 

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia



* Re: [RFC PATCH] fix xfrm MTU regression
  2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
@ 2021-04-29 19:48 ` Sabrina Dubroca
  2021-04-29 20:25   ` Jiri Bohac
  2021-04-29 20:37 ` kernel test robot
  2021-04-30  5:36 ` [RFC PATCH v2] " Jiri Bohac
  2 siblings, 1 reply; 7+ messages in thread
From: Sabrina Dubroca @ 2021-04-29 19:48 UTC
  To: Jiri Bohac
  Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert, Herbert Xu

2021-04-29, 19:02:54 +0200, Jiri Bohac wrote:
> Hi,
> 
> Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
> sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
> 
> A Packet Too Big ICMPv6 message received in response to an ESP
> packet will prevent all further communication through the tunnel
> if the reported MTU minus the ESP overhead is smaller than 1280.
> 
> E.g. for tunnel-mode ESP with sha256/aes the overhead is 92
> bytes, so receiving a PTB with an MTU of 1371 or less
> (1371 - 92 = 1279 < 1280) results in all further packets in the
> tunnel being dropped. A ping through the tunnel fails with
> "ping: sendmsg: Invalid argument".
> 
> Apparently the MTU on the xfrm route is smaller than 1280 and
> fails the check inside ip6_setup_cork() added by 749439bf.
> 
> We found this by debugging USGv6/ipv6ready failures. Failing
> tests are: "Phase-2 Interoperability Test Scenario IPsec" /
> 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).

That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6"), currently in Steffen's ipsec
tree:
https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276

-- 
Sabrina



* Re: [RFC PATCH] fix xfrm MTU regression
  2021-04-29 19:48 ` Sabrina Dubroca
@ 2021-04-29 20:25   ` Jiri Bohac
  2021-05-01 10:23     ` Sabrina Dubroca
  0 siblings, 1 reply; 7+ messages in thread
From: Jiri Bohac @ 2021-04-29 20:25 UTC
  To: Sabrina Dubroca
  Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert, Herbert Xu

On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote:
> That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
> should return at least 1280 for ipv6"), currently in Steffen's ipsec
> tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276

Thanks, that is interesting! The patch makes my large (-s 1400) pings inside
ESP pass through a 1280-MTU link on an intermediary router, but in a suboptimal,
double-fragmented way. tcpdump on the router shows:

	22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232
	22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100)
	22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276

I.e. the ping is fragmented into two ESP packets, and the first ESP packet is
then fragmented again.

The same pings with my patch come through in two fragments:

	22:13:22.072934 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x28), length 1236
	22:13:22.073039 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x29), length 356

I can do more tests if needed.

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia



* Re: [RFC PATCH] fix xfrm MTU regression
  2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
  2021-04-29 19:48 ` Sabrina Dubroca
@ 2021-04-29 20:37 ` kernel test robot
  2021-04-30  5:36 ` [RFC PATCH v2] " Jiri Bohac
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-29 20:37 UTC
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 16600 bytes --]

Hi Jiri,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on linus/master]
[also build test WARNING on v5.12 next-20210429]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiri-Bohac/fix-xfrm-MTU-regression/20210430-010412
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d72cd4ad4174cfd2257c426ad51e4f53bcfde9c9
config: x86_64-randconfig-a015-20210429 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9131a078901b00e68248a27a4f8c4b11bb1db1ae)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f556543e005a1eb6567fc299e60f7d92dc508f88
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiri-Bohac/fix-xfrm-MTU-regression/20210430-010412
        git checkout f556543e005a1eb6567fc299e60f7d92dc508f88
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/ipv6/ip6_output.c:1467:6: warning: variable 'headersize' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (mtu < fragheaderlen ||
               ^~~~~~~~~~~~~~~~~~~~~~
   net/ipv6/ip6_output.c:1501:27: note: uninitialized use occurs here
                   pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
                                           ^~~~~~~~~~
   include/linux/minmax.h:118:48: note: expanded from macro 'max_t'
   #define max_t(type, x, y)       __careful_cmp((type)(x), (type)(y), >)
                                                        ^
   include/linux/minmax.h:44:14: note: expanded from macro '__careful_cmp'
                   __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
                              ^
   include/linux/minmax.h:37:25: note: expanded from macro '__cmp_once'
                   typeof(x) unique_x = (x);               \
                                         ^
   net/ipv6/ip6_output.c:1467:2: note: remove the 'if' if its condition is always false
           if (mtu < fragheaderlen ||
           ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> net/ipv6/ip6_output.c:1467:6: warning: variable 'headersize' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
           if (mtu < fragheaderlen ||
               ^~~~~~~~~~~~~~~~~~~
   net/ipv6/ip6_output.c:1501:27: note: uninitialized use occurs here
                   pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
                                           ^~~~~~~~~~
   include/linux/minmax.h:118:48: note: expanded from macro 'max_t'
   #define max_t(type, x, y)       __careful_cmp((type)(x), (type)(y), >)
                                                        ^
   include/linux/minmax.h:44:14: note: expanded from macro '__careful_cmp'
                   __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
                              ^
   include/linux/minmax.h:37:25: note: expanded from macro '__cmp_once'
                   typeof(x) unique_x = (x);               \
                                         ^
   net/ipv6/ip6_output.c:1467:6: note: remove the '||' if its condition is always false
           if (mtu < fragheaderlen ||
               ^~~~~~~~~~~~~~~~~~~~~~
   net/ipv6/ip6_output.c:1444:41: note: initialize the variable 'headersize' to silence this warning
           unsigned int maxnonfragsize, headersize;
                                                  ^
                                                   = 0
   2 warnings generated.


vim +1467 net/ipv6/ip6_output.c

  1419	
  1420	static int __ip6_append_data(struct sock *sk,
  1421				     struct flowi6 *fl6,
  1422				     struct sk_buff_head *queue,
  1423				     struct inet_cork *cork,
  1424				     struct inet6_cork *v6_cork,
  1425				     struct page_frag *pfrag,
  1426				     int getfrag(void *from, char *to, int offset,
  1427						 int len, int odd, struct sk_buff *skb),
  1428				     void *from, int length, int transhdrlen,
  1429				     unsigned int flags, struct ipcm6_cookie *ipc6)
  1430	{
  1431		struct sk_buff *skb, *skb_prev = NULL;
  1432		unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
  1433		struct ubuf_info *uarg = NULL;
  1434		int exthdrlen = 0;
  1435		int dst_exthdrlen = 0;
  1436		int hh_len;
  1437		int copy;
  1438		int err;
  1439		int offset = 0;
  1440		u32 tskey = 0;
  1441		struct rt6_info *rt = (struct rt6_info *)cork->dst;
  1442		struct ipv6_txoptions *opt = v6_cork->opt;
  1443		int csummode = CHECKSUM_NONE;
  1444		unsigned int maxnonfragsize, headersize;
  1445		unsigned int wmem_alloc_delta = 0;
  1446		bool paged, extra_uref = false;
  1447	
  1448		skb = skb_peek_tail(queue);
  1449		if (!skb) {
  1450			exthdrlen = opt ? opt->opt_flen : 0;
  1451			dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
  1452		}
  1453	
  1454		paged = !!cork->gso_size;
  1455		mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
  1456		orig_mtu = mtu;
  1457	
  1458		if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
  1459		    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
  1460			tskey = sk->sk_tskey++;
  1461	
  1462		hh_len = LL_RESERVED_SPACE(rt->dst.dev);
  1463	
  1464		fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
  1465				(opt ? opt->opt_nflen : 0);
  1466	
> 1467		if (mtu < fragheaderlen ||
  1468		    ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
  1469			goto emsgsize;
  1470	
  1471		maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
  1472			     sizeof(struct frag_hdr);
  1473	
  1474		headersize = sizeof(struct ipv6hdr) +
  1475			     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
  1476			     (dst_allfrag(&rt->dst) ?
  1477			      sizeof(struct frag_hdr) : 0) +
  1478			     rt->rt6i_nfheader_len;
  1479	
  1480		/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
  1481		 * the first fragment
  1482		 */
  1483		if (headersize + transhdrlen > mtu)
  1484			goto emsgsize;
  1485	
  1486		if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
  1487		    (sk->sk_protocol == IPPROTO_UDP ||
  1488		     sk->sk_protocol == IPPROTO_RAW)) {
  1489			ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
  1490					sizeof(struct ipv6hdr));
  1491			goto emsgsize;
  1492		}
  1493	
  1494		if (ip6_sk_ignore_df(sk))
  1495			maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
  1496		else
  1497			maxnonfragsize = mtu;
  1498	
  1499		if (cork->length + length > maxnonfragsize - headersize) {
  1500	emsgsize:
  1501			pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
  1502			ipv6_local_error(sk, EMSGSIZE, fl6, pmtu);
  1503			return -EMSGSIZE;
  1504		}
  1505	
  1506		/* CHECKSUM_PARTIAL only with no extension headers and when
  1507		 * we are not going to fragment
  1508		 */
  1509		if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
  1510		    headersize == sizeof(struct ipv6hdr) &&
  1511		    length <= mtu - headersize &&
  1512		    (!(flags & MSG_MORE) || cork->gso_size) &&
  1513		    rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
  1514			csummode = CHECKSUM_PARTIAL;
  1515	
  1516		if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
  1517			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
  1518			if (!uarg)
  1519				return -ENOBUFS;
  1520			extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
  1521			if (rt->dst.dev->features & NETIF_F_SG &&
  1522			    csummode == CHECKSUM_PARTIAL) {
  1523				paged = true;
  1524			} else {
  1525				uarg->zerocopy = 0;
  1526				skb_zcopy_set(skb, uarg, &extra_uref);
  1527			}
  1528		}
  1529	
  1530		/*
  1531		 * Let's try using as much space as possible.
  1532		 * Use MTU if total length of the message fits into the MTU.
  1533		 * Otherwise, we need to reserve fragment header and
  1534		 * fragment alignment (= 8-15 octects, in total).
  1535		 *
  1536		 * Note that we may need to "move" the data from the tail
  1537		 * of the buffer to the new fragment when we split
  1538		 * the message.
  1539		 *
  1540		 * FIXME: It may be fragmented into multiple chunks
  1541		 *        at once if non-fragmentable extension headers
  1542		 *        are too large.
  1543		 * --yoshfuji
  1544		 */
  1545	
  1546		cork->length += length;
  1547		if (!skb)
  1548			goto alloc_new_skb;
  1549	
  1550		while (length > 0) {
  1551			/* Check if the remaining data fits into current packet. */
  1552			copy = (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - skb->len;
  1553			if (copy < length)
  1554				copy = maxfraglen - skb->len;
  1555	
  1556			if (copy <= 0) {
  1557				char *data;
  1558				unsigned int datalen;
  1559				unsigned int fraglen;
  1560				unsigned int fraggap;
  1561				unsigned int alloclen;
  1562				unsigned int pagedlen;
  1563	alloc_new_skb:
  1564				/* There's no room in the current skb */
  1565				if (skb)
  1566					fraggap = skb->len - maxfraglen;
  1567				else
  1568					fraggap = 0;
  1569				/* update mtu and maxfraglen if necessary */
  1570				if (!skb || !skb_prev)
  1571					ip6_append_data_mtu(&mtu, &maxfraglen,
  1572							    fragheaderlen, skb, rt,
  1573							    orig_mtu);
  1574	
  1575				skb_prev = skb;
  1576	
  1577				/*
  1578				 * If remaining data exceeds the mtu,
  1579				 * we know we need more fragment(s).
  1580				 */
  1581				datalen = length + fraggap;
  1582	
  1583				if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen)
  1584					datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
  1585				fraglen = datalen + fragheaderlen;
  1586				pagedlen = 0;
  1587	
  1588				if ((flags & MSG_MORE) &&
  1589				    !(rt->dst.dev->features&NETIF_F_SG))
  1590					alloclen = mtu;
  1591				else if (!paged)
  1592					alloclen = fraglen;
  1593				else {
  1594					alloclen = min_t(int, fraglen, MAX_HEADER);
  1595					pagedlen = fraglen - alloclen;
  1596				}
  1597	
  1598				alloclen += dst_exthdrlen;
  1599	
  1600				if (datalen != length + fraggap) {
  1601					/*
  1602					 * this is not the last fragment, the trailer
  1603					 * space is regarded as data space.
  1604					 */
  1605					datalen += rt->dst.trailer_len;
  1606				}
  1607	
  1608				alloclen += rt->dst.trailer_len;
  1609				fraglen = datalen + fragheaderlen;
  1610	
  1611				/*
  1612				 * We just reserve space for fragment header.
  1613				 * Note: this may be overallocation if the message
  1614				 * (without MSG_MORE) fits into the MTU.
  1615				 */
  1616				alloclen += sizeof(struct frag_hdr);
  1617	
  1618				copy = datalen - transhdrlen - fraggap - pagedlen;
  1619				if (copy < 0) {
  1620					err = -EINVAL;
  1621					goto error;
  1622				}
  1623				if (transhdrlen) {
  1624					skb = sock_alloc_send_skb(sk,
  1625							alloclen + hh_len,
  1626							(flags & MSG_DONTWAIT), &err);
  1627				} else {
  1628					skb = NULL;
  1629					if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
  1630					    2 * sk->sk_sndbuf)
  1631						skb = alloc_skb(alloclen + hh_len,
  1632								sk->sk_allocation);
  1633					if (unlikely(!skb))
  1634						err = -ENOBUFS;
  1635				}
  1636				if (!skb)
  1637					goto error;
  1638				/*
  1639				 *	Fill in the control structures
  1640				 */
  1641				skb->protocol = htons(ETH_P_IPV6);
  1642				skb->ip_summed = csummode;
  1643				skb->csum = 0;
  1644				/* reserve for fragmentation and ipsec header */
  1645				skb_reserve(skb, hh_len + sizeof(struct frag_hdr) +
  1646					    dst_exthdrlen);
  1647	
  1648				/*
  1649				 *	Find where to start putting bytes
  1650				 */
  1651				data = skb_put(skb, fraglen - pagedlen);
  1652				skb_set_network_header(skb, exthdrlen);
  1653				data += fragheaderlen;
  1654				skb->transport_header = (skb->network_header +
  1655							 fragheaderlen);
  1656				if (fraggap) {
  1657					skb->csum = skb_copy_and_csum_bits(
  1658						skb_prev, maxfraglen,
  1659						data + transhdrlen, fraggap);
  1660					skb_prev->csum = csum_sub(skb_prev->csum,
  1661								  skb->csum);
  1662					data += fraggap;
  1663					pskb_trim_unique(skb_prev, maxfraglen);
  1664				}
  1665				if (copy > 0 &&
  1666				    getfrag(from, data + transhdrlen, offset,
  1667					    copy, fraggap, skb) < 0) {
  1668					err = -EFAULT;
  1669					kfree_skb(skb);
  1670					goto error;
  1671				}
  1672	
  1673				offset += copy;
  1674				length -= copy + transhdrlen;
  1675				transhdrlen = 0;
  1676				exthdrlen = 0;
  1677				dst_exthdrlen = 0;
  1678	
  1679				/* Only the initial fragment is time stamped */
  1680				skb_shinfo(skb)->tx_flags = cork->tx_flags;
  1681				cork->tx_flags = 0;
  1682				skb_shinfo(skb)->tskey = tskey;
  1683				tskey = 0;
  1684				skb_zcopy_set(skb, uarg, &extra_uref);
  1685	
  1686				if ((flags & MSG_CONFIRM) && !skb_prev)
  1687					skb_set_dst_pending_confirm(skb, 1);
  1688	
  1689				/*
  1690				 * Put the packet on the pending queue
  1691				 */
  1692				if (!skb->destructor) {
  1693					skb->destructor = sock_wfree;
  1694					skb->sk = sk;
  1695					wmem_alloc_delta += skb->truesize;
  1696				}
  1697				__skb_queue_tail(queue, skb);
  1698				continue;
  1699			}
  1700	
  1701			if (copy > length)
  1702				copy = length;
  1703	
  1704			if (!(rt->dst.dev->features&NETIF_F_SG) &&
  1705			    skb_tailroom(skb) >= copy) {
  1706				unsigned int off;
  1707	
  1708				off = skb->len;
  1709				if (getfrag(from, skb_put(skb, copy),
  1710							offset, copy, off, skb) < 0) {
  1711					__skb_trim(skb, off);
  1712					err = -EFAULT;
  1713					goto error;
  1714				}
  1715			} else if (!uarg || !uarg->zerocopy) {
  1716				int i = skb_shinfo(skb)->nr_frags;
  1717	
  1718				err = -ENOMEM;
  1719				if (!sk_page_frag_refill(sk, pfrag))
  1720					goto error;
  1721	
  1722				if (!skb_can_coalesce(skb, i, pfrag->page,
  1723						      pfrag->offset)) {
  1724					err = -EMSGSIZE;
  1725					if (i == MAX_SKB_FRAGS)
  1726						goto error;
  1727	
  1728					__skb_fill_page_desc(skb, i, pfrag->page,
  1729							     pfrag->offset, 0);
  1730					skb_shinfo(skb)->nr_frags = ++i;
  1731					get_page(pfrag->page);
  1732				}
  1733				copy = min_t(int, copy, pfrag->size - pfrag->offset);
  1734				if (getfrag(from,
  1735					    page_address(pfrag->page) + pfrag->offset,
  1736					    offset, copy, skb->len, skb) < 0)
  1737					goto error_efault;
  1738	
  1739				pfrag->offset += copy;
  1740				skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
  1741				skb->len += copy;
  1742				skb->data_len += copy;
  1743				skb->truesize += copy;
  1744				wmem_alloc_delta += copy;
  1745			} else {
  1746				err = skb_zerocopy_iter_dgram(skb, from, copy);
  1747				if (err < 0)
  1748					goto error;
  1749			}
  1750			offset += copy;
  1751			length -= copy;
  1752		}
  1753	
  1754		if (wmem_alloc_delta)
  1755			refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
  1756		return 0;
  1757	
  1758	error_efault:
  1759		err = -EFAULT;
  1760	error:
  1761		net_zcopy_put_abort(uarg, extra_uref);
  1762		cork->length -= length;
  1763		IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
  1764		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
  1765		return err;
  1766	}
  1767	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33314 bytes --]


* Re: [RFC PATCH v2] fix xfrm MTU regression
  2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
  2021-04-29 19:48 ` Sabrina Dubroca
  2021-04-29 20:37 ` kernel test robot
@ 2021-04-30  5:36 ` Jiri Bohac
  2 siblings, 0 replies; 7+ messages in thread
From: Jiri Bohac @ 2021-04-30  5:36 UTC
  To: Mike Maloney, Eric Dumazet, davem; +Cc: netdev, Steffen Klassert, Herbert Xu

On Thu, Apr 29, 2021 at 07:02:55PM +0200, Jiri Bohac wrote:
> Below is my attempt to fix the situation by dropping the MTU
> check and instead checking for the underflows described in the
> 749439bf commit message (without much understanding of the
> details!). Does this make sense?:

The first version left headersize uninitialized in the error
path (as the kernel test robot reported); v2 below fixes this by
moving the new check after the headersize computation.
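
A reduced sketch of why the ordering matters, assuming a simplified
header-size model: the emsgsize path reads headersize to compute the
reported PMTU, so headersize has to be assigned before the first
jump to that label. The function name and the "extra" parameter
(standing in for the remaining header lengths) are hypothetical.

```c
#include <errno.h>

#define IPV6_HDR_LEN 40	/* stands in for sizeof(struct ipv6hdr) */

static int check_sizes(unsigned int mtu, unsigned int fragheaderlen,
		       unsigned int extra, unsigned int *pmtu)
{
	unsigned int headersize;
	long v;

	/* v2 ordering: headersize is assigned before any goto emsgsize. */
	headersize = fragheaderlen + extra;

	if (mtu < fragheaderlen)
		goto emsgsize;
	if (headersize > mtu)
		goto emsgsize;
	return 0;

emsgsize:
	/* Simplified stand-in for the max_t(int, ..., 0) clamp in the
	 * kernel's error path.
	 */
	v = (long)mtu - (long)headersize + IPV6_HDR_LEN;
	*pmtu = v > 0 ? (unsigned int)v : 0;
	return -EMSGSIZE;
}
```

Placing the first check above the headersize assignment, as v1 did,
would make the emsgsize path read an indeterminate value, which is
what both the clang and smatch reports point at.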

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..171eb4ec1e67 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-	if (mtu < IPV6_MIN_MTU)
-		return -EINVAL;
 	cork->base.fragsize = mtu;
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
@@ -1465,8 +1463,6 @@ static int __ip6_append_data(struct sock *sk,
 
 	fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
 			(opt ? opt->opt_nflen : 0);
-	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
-		     sizeof(struct frag_hdr);
 
 	headersize = sizeof(struct ipv6hdr) +
 		     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
@@ -1474,6 +1470,13 @@ static int __ip6_append_data(struct sock *sk,
 		      sizeof(struct frag_hdr) : 0) +
 		     rt->rt6i_nfheader_len;
 
+	if (mtu < fragheaderlen ||
+	    ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+		goto emsgsize;
+
+	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
+		     sizeof(struct frag_hdr);
+
 	/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
 	 * the first fragment
 	 */

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia



* Re: [RFC PATCH] fix xfrm MTU regression
  2021-04-29 20:25   ` Jiri Bohac
@ 2021-05-01 10:23     ` Sabrina Dubroca
  0 siblings, 0 replies; 7+ messages in thread
From: Sabrina Dubroca @ 2021-05-01 10:23 UTC
  To: Jiri Bohac
  Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert, Herbert Xu

2021-04-29, 22:25:29 +0200, Jiri Bohac wrote:
> On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote:
> > That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
> > should return at least 1280 for ipv6"), currently in Steffen's ipsec
> > tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
> 
> Thanks, that is interesting! The patch makes my large (-s 1400) pings inside
> ESP pass through a 1280-MTU link on an intermediary router, but in a suboptimal,
> double-fragmented way. tcpdump on the router shows:
> 
> 	22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232
> 	22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100)
> 	22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276
> 
> I.e. the ping is fragmented into two ESP packets, and the first ESP packet is
> then fragmented again.

It's a bit ugly, but I don't think we can do any better. We're going
through the stack twice in tunnel mode. The first pass (before xfrm)
we fragment according to the PMTU (adjusted to IPV6_MIN_MTU, because
MTUs lower than that are illegal in IPv6). The second time (after
xfrm), the first ESP packet is too big so we fragment it. This
behavior is consistent with a vti device running over a network with
MTU=1280 (which doesn't seem to work without my patch).

In transport mode, we're only going through the stack once, so we
don't see this double fragmentation.
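
The fragment sizes in the tcpdump output can be reproduced with a
small sketch of the second pass. It assumes that the 92-byte
overhead quoted earlier in the thread includes the outer 40-byte
IPv6 header, that the first pass produces an inner packet filling
the clamped 1280-byte MTU, and that non-final fragments carry a
payload rounded down to a multiple of 8. The struct and function
names are hypothetical.

```c
#define IPV6_MIN_MTU 1280u
#define IPV6_HDR_LEN 40u	/* fixed IPv6 header */
#define FRAG_HDR_LEN 8u		/* IPv6 fragment header */

struct frag_plan {
	unsigned int count;	/* number of packets on the wire */
	unsigned int first;	/* payload bytes in the first one */
	unsigned int last;	/* payload bytes in the last one */
};

/* Split "payload" bytes (everything after the IPv6 header) for a
 * link with the given MTU. count == 0 means the MTU is too small.
 */
static struct frag_plan plan_fragments(unsigned int payload,
				       unsigned int link_mtu)
{
	struct frag_plan p = { 1, payload, payload };
	unsigned int per_frag;

	if (link_mtu < IPV6_HDR_LEN + FRAG_HDR_LEN + 8) {
		p.count = 0;
		return p;
	}
	if (payload + IPV6_HDR_LEN <= link_mtu)
		return p;	/* fits without fragmentation */

	per_frag = (link_mtu - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;
	p.count = (payload + per_frag - 1) / per_frag;
	p.first = per_frag;
	p.last = payload - (p.count - 1) * per_frag;
	return p;
}

/* Second pass: the inner MTU is clamped to IPV6_MIN_MTU, the tunnel
 * overhead is added, and the result has to cross a link with "pmtu".
 */
static struct frag_plan second_pass(unsigned int pmtu, unsigned int overhead)
{
	unsigned int inner = pmtu > overhead ? pmtu - overhead : 0;

	if (inner < IPV6_MIN_MTU)
		inner = IPV6_MIN_MTU;
	return plan_fragments(inner + overhead - IPV6_HDR_LEN, pmtu);
}
```

Under these assumptions, second_pass(1280, 92) gives a 1372-byte
outer packet with 1332 bytes after the IPv6 header, split into
fragments of 1232 and 100 bytes, matching the "frag (0|1232)" and
"frag (1232|100)" lines in the capture.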

I think my patch is correct, because without it we have IPv6 dsts
going around the kernel with an associated MTU smaller than
IPV6_MIN_MTU.

-- 
Sabrina



* Re: [RFC PATCH] fix xfrm MTU regression
@ 2021-04-29 23:17 kernel test robot
  0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-29 23:17 UTC
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 31458 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210429170254.5grfgsz2hgy2qjhk@dwarf.suse.cz>
References: <20210429170254.5grfgsz2hgy2qjhk@dwarf.suse.cz>
TO: Jiri Bohac <jbohac@suse.cz>

Hi Jiri,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on linus/master]
[also build test WARNING on v5.12 next-20210429]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiri-Bohac/fix-xfrm-MTU-regression/20210430-010412
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d72cd4ad4174cfd2257c426ad51e4f53bcfde9c9
:::::: branch date: 6 hours ago
:::::: commit date: 6 hours ago
config: i386-randconfig-m021-20210429 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
net/ipv6/ip6_output.c:1501 __ip6_append_data() error: uninitialized symbol 'headersize'.

Old smatch warnings:
net/ipv6/ip6_output.c:292 ip6_xmit() error: we previously assumed 'np' could be null (see line 286)

vim +/headersize +1501 net/ipv6/ip6_output.c

366e41d9774d70 Vlad Yasevich            2015-01-31  1419  
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1420  static int __ip6_append_data(struct sock *sk,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1421  			     struct flowi6 *fl6,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1422  			     struct sk_buff_head *queue,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1423  			     struct inet_cork *cork,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1424  			     struct inet6_cork *v6_cork,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1425  			     struct page_frag *pfrag,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1426  			     int getfrag(void *from, char *to, int offset,
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1427  					 int len, int odd, struct sk_buff *skb),
366e41d9774d70 Vlad Yasevich            2015-01-31  1428  			     void *from, int length, int transhdrlen,
5fdaa88dfefa87 Willem de Bruijn         2018-07-06  1429  			     unsigned int flags, struct ipcm6_cookie *ipc6)
366e41d9774d70 Vlad Yasevich            2015-01-31  1430  {
366e41d9774d70 Vlad Yasevich            2015-01-31  1431  	struct sk_buff *skb, *skb_prev = NULL;
10b8a3de603df7 Paolo Abeni              2018-03-23  1432  	unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1433  	struct ubuf_info *uarg = NULL;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1434  	int exthdrlen = 0;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1435  	int dst_exthdrlen = 0;
366e41d9774d70 Vlad Yasevich            2015-01-31  1436  	int hh_len;
366e41d9774d70 Vlad Yasevich            2015-01-31  1437  	int copy;
366e41d9774d70 Vlad Yasevich            2015-01-31  1438  	int err;
366e41d9774d70 Vlad Yasevich            2015-01-31  1439  	int offset = 0;
366e41d9774d70 Vlad Yasevich            2015-01-31  1440  	u32 tskey = 0;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1441  	struct rt6_info *rt = (struct rt6_info *)cork->dst;
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1442  	struct ipv6_txoptions *opt = v6_cork->opt;
32dce968dd987a Vlad Yasevich            2015-01-31  1443  	int csummode = CHECKSUM_NONE;
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1444  	unsigned int maxnonfragsize, headersize;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1445  	unsigned int wmem_alloc_delta = 0;
100f6d8e09905c Willem de Bruijn         2019-05-30  1446  	bool paged, extra_uref = false;
366e41d9774d70 Vlad Yasevich            2015-01-31  1447  
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1448  	skb = skb_peek_tail(queue);
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1449  	if (!skb) {
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1450  		exthdrlen = opt ? opt->opt_flen : 0;
7efdba5bd9a2f3 Romain KUNTZ             2013-01-16  1451  		dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1452  	}
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1453  
15e36f5b8e982d Willem de Bruijn         2018-04-26  1454  	paged = !!cork->gso_size;
bec1f6f697362c Willem de Bruijn         2018-04-26  1455  	mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
e367c2d03dba4c lucien                   2014-03-17  1456  	orig_mtu = mtu;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1457  
678ca42d688534 Willem de Bruijn         2018-07-06  1458  	if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
678ca42d688534 Willem de Bruijn         2018-07-06  1459  	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
678ca42d688534 Willem de Bruijn         2018-07-06  1460  		tskey = sk->sk_tskey++;
678ca42d688534 Willem de Bruijn         2018-07-06  1461  
d8d1f30b95a635 Changli Gao              2010-06-10  1462  	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1463  
a1b051405bc162 Masahide NAKAMURA        2007-12-20  1464  	fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
b4ce92775c2e7f Herbert Xu               2007-11-13  1465  			(opt ? opt->opt_nflen : 0);
f556543e005a1e Jiri Bohac               2021-04-29  1466  
f556543e005a1e Jiri Bohac               2021-04-29  1467  	if (mtu < fragheaderlen ||
f556543e005a1e Jiri Bohac               2021-04-29  1468  	    ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
f556543e005a1e Jiri Bohac               2021-04-29  1469  		goto emsgsize;
f556543e005a1e Jiri Bohac               2021-04-29  1470  
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1471  	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1472  		     sizeof(struct frag_hdr);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1473  
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1474  	headersize = sizeof(struct ipv6hdr) +
3a1cebe7e05027 Hannes Frederic Sowa     2014-05-11  1475  		     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1476  		     (dst_allfrag(&rt->dst) ?
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1477  		      sizeof(struct frag_hdr) : 0) +
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1478  		     rt->rt6i_nfheader_len;
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1479  
10b8a3de603df7 Paolo Abeni              2018-03-23  1480  	/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
10b8a3de603df7 Paolo Abeni              2018-03-23  1481  	 * the first fragment
10b8a3de603df7 Paolo Abeni              2018-03-23  1482  	 */
10b8a3de603df7 Paolo Abeni              2018-03-23  1483  	if (headersize + transhdrlen > mtu)
10b8a3de603df7 Paolo Abeni              2018-03-23  1484  		goto emsgsize;
10b8a3de603df7 Paolo Abeni              2018-03-23  1485  
26879da58711aa Wei Wang                 2016-05-02  1486  	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1487  	    (sk->sk_protocol == IPPROTO_UDP ||
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1488  	     sk->sk_protocol == IPPROTO_RAW)) {
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1489  		ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1490  				sizeof(struct ipv6hdr));
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1491  		goto emsgsize;
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1492  	}
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1493  
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1494  	if (ip6_sk_ignore_df(sk))
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1495  		maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1496  	else
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1497  		maxnonfragsize = mtu;
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1498  
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1499  	if (cork->length + length > maxnonfragsize - headersize) {
4df98e76cde7c6 Hannes Frederic Sowa     2013-12-16  1500  emsgsize:
10b8a3de603df7 Paolo Abeni              2018-03-23 @1501  		pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
10b8a3de603df7 Paolo Abeni              2018-03-23  1502  		ipv6_local_error(sk, EMSGSIZE, fl6, pmtu);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1503  		return -EMSGSIZE;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1504  	}
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1505  
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1506  	/* CHECKSUM_PARTIAL only with no extension headers and when
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1507  	 * we are not going to fragment
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1508  	 */
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1509  	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1510  	    headersize == sizeof(struct ipv6hdr) &&
2b89ed65a6f201 Vlad Yasevich            2017-01-29  1511  	    length <= mtu - headersize &&
bec1f6f697362c Willem de Bruijn         2018-04-26  1512  	    (!(flags & MSG_MORE) || cork->gso_size) &&
c8cd0989bd151f Tom Herbert              2015-12-14  1513  	    rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
682b1a9d3f9686 Hannes Frederic Sowa     2015-10-27  1514  		csummode = CHECKSUM_PARTIAL;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1515  
b5947e5d1e710c Willem de Bruijn         2018-11-30  1516  	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
8c793822c5803e Jonathan Lemon           2021-01-06  1517  		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
b5947e5d1e710c Willem de Bruijn         2018-11-30  1518  		if (!uarg)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1519  			return -ENOBUFS;
522924b583082f Willem de Bruijn         2019-06-07  1520  		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
b5947e5d1e710c Willem de Bruijn         2018-11-30  1521  		if (rt->dst.dev->features & NETIF_F_SG &&
b5947e5d1e710c Willem de Bruijn         2018-11-30  1522  		    csummode == CHECKSUM_PARTIAL) {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1523  			paged = true;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1524  		} else {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1525  			uarg->zerocopy = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1526  			skb_zcopy_set(skb, uarg, &extra_uref);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1527  		}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1528  	}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1529  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1530  	/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1531  	 * Let's try using as much space as possible.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1532  	 * Use MTU if total length of the message fits into the MTU.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1533  	 * Otherwise, we need to reserve fragment header and
^1da177e4c3f41 Linus Torvalds           2005-04-16  1534  	 * fragment alignment (= 8-15 octects, in total).
^1da177e4c3f41 Linus Torvalds           2005-04-16  1535  	 *
634a63e73f0594 Randy Dunlap             2020-09-17  1536  	 * Note that we may need to "move" the data from the tail
^1da177e4c3f41 Linus Torvalds           2005-04-16  1537  	 * of the buffer to the new fragment when we split
^1da177e4c3f41 Linus Torvalds           2005-04-16  1538  	 * the message.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1539  	 *
^1da177e4c3f41 Linus Torvalds           2005-04-16  1540  	 * FIXME: It may be fragmented into multiple chunks
^1da177e4c3f41 Linus Torvalds           2005-04-16  1541  	 *        at once if non-fragmentable extension headers
^1da177e4c3f41 Linus Torvalds           2005-04-16  1542  	 *        are too large.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1543  	 * --yoshfuji
^1da177e4c3f41 Linus Torvalds           2005-04-16  1544  	 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1545  
2811ebac2521ce Hannes Frederic Sowa     2013-09-21  1546  	cork->length += length;
2811ebac2521ce Hannes Frederic Sowa     2013-09-21  1547  	if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1548  		goto alloc_new_skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1549  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1550  	while (length > 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1551  		/* Check if the remaining data fits into current packet. */
bdc712b4c2baf9 David S. Miller          2011-05-06  1552  		copy = (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1553  		if (copy < length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1554  			copy = maxfraglen - skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1555  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1556  		if (copy <= 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1557  			char *data;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1558  			unsigned int datalen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1559  			unsigned int fraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1560  			unsigned int fraggap;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1561  			unsigned int alloclen;
aba36930a35e7f Willem de Bruijn         2018-11-24  1562  			unsigned int pagedlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1563  alloc_new_skb:
^1da177e4c3f41 Linus Torvalds           2005-04-16  1564  			/* There's no room in the current skb */
0c1833797a5a6e Gao feng                 2012-05-26  1565  			if (skb)
0c1833797a5a6e Gao feng                 2012-05-26  1566  				fraggap = skb->len - maxfraglen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1567  			else
^1da177e4c3f41 Linus Torvalds           2005-04-16  1568  				fraggap = 0;
0c1833797a5a6e Gao feng                 2012-05-26  1569  			/* update mtu and maxfraglen if necessary */
63159f29be1df7 Ian Morris               2015-03-29  1570  			if (!skb || !skb_prev)
0c1833797a5a6e Gao feng                 2012-05-26  1571  				ip6_append_data_mtu(&mtu, &maxfraglen,
75a493e60ac4bb Hannes Frederic Sowa     2013-07-02  1572  						    fragheaderlen, skb, rt,
e367c2d03dba4c lucien                   2014-03-17  1573  						    orig_mtu);
0c1833797a5a6e Gao feng                 2012-05-26  1574  
0c1833797a5a6e Gao feng                 2012-05-26  1575  			skb_prev = skb;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1576  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1577  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1578  			 * If remaining data exceeds the mtu,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1579  			 * we know we need more fragment(s).
^1da177e4c3f41 Linus Torvalds           2005-04-16  1580  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1581  			datalen = length + fraggap;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1582  
0c1833797a5a6e Gao feng                 2012-05-26  1583  			if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen)
0c1833797a5a6e Gao feng                 2012-05-26  1584  				datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1585  			fraglen = datalen + fragheaderlen;
aba36930a35e7f Willem de Bruijn         2018-11-24  1586  			pagedlen = 0;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1587  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1588  			if ((flags & MSG_MORE) &&
d8d1f30b95a635 Changli Gao              2010-06-10  1589  			    !(rt->dst.dev->features&NETIF_F_SG))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1590  				alloclen = mtu;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1591  			else if (!paged)
15e36f5b8e982d Willem de Bruijn         2018-04-26  1592  				alloclen = fraglen;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1593  			else {
15e36f5b8e982d Willem de Bruijn         2018-04-26  1594  				alloclen = min_t(int, fraglen, MAX_HEADER);
15e36f5b8e982d Willem de Bruijn         2018-04-26  1595  				pagedlen = fraglen - alloclen;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1596  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1597  
299b0767642a65 Steffen Klassert         2011-10-11  1598  			alloclen += dst_exthdrlen;
299b0767642a65 Steffen Klassert         2011-10-11  1599  
0c1833797a5a6e Gao feng                 2012-05-26  1600  			if (datalen != length + fraggap) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1601  				/*
0c1833797a5a6e Gao feng                 2012-05-26  1602  				 * this is not the last fragment, the trailer
0c1833797a5a6e Gao feng                 2012-05-26  1603  				 * space is regarded as data space.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1604  				 */
0c1833797a5a6e Gao feng                 2012-05-26  1605  				datalen += rt->dst.trailer_len;
0c1833797a5a6e Gao feng                 2012-05-26  1606  			}
0c1833797a5a6e Gao feng                 2012-05-26  1607  
d8d1f30b95a635 Changli Gao              2010-06-10  1608  			alloclen += rt->dst.trailer_len;
0c1833797a5a6e Gao feng                 2012-05-26  1609  			fraglen = datalen + fragheaderlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1610  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1611  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1612  			 * We just reserve space for fragment header.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1613  			 * Note: this may be overallocation if the message
^1da177e4c3f41 Linus Torvalds           2005-04-16  1614  			 * (without MSG_MORE) fits into the MTU.
^1da177e4c3f41 Linus Torvalds           2005-04-16  1615  			 */
^1da177e4c3f41 Linus Torvalds           2005-04-16  1616  			alloclen += sizeof(struct frag_hdr);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1617  
15e36f5b8e982d Willem de Bruijn         2018-04-26  1618  			copy = datalen - transhdrlen - fraggap - pagedlen;
232cd35d0804cc Eric Dumazet             2017-05-19  1619  			if (copy < 0) {
232cd35d0804cc Eric Dumazet             2017-05-19  1620  				err = -EINVAL;
232cd35d0804cc Eric Dumazet             2017-05-19  1621  				goto error;
232cd35d0804cc Eric Dumazet             2017-05-19  1622  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1623  			if (transhdrlen) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1624  				skb = sock_alloc_send_skb(sk,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1625  						alloclen + hh_len,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1626  						(flags & MSG_DONTWAIT), &err);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1627  			} else {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1628  				skb = NULL;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1629  				if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
^1da177e4c3f41 Linus Torvalds           2005-04-16  1630  				    2 * sk->sk_sndbuf)
1f4c6eb2402968 Eric Dumazet             2018-03-31  1631  					skb = alloc_skb(alloclen + hh_len,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1632  							sk->sk_allocation);
63159f29be1df7 Ian Morris               2015-03-29  1633  				if (unlikely(!skb))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1634  					err = -ENOBUFS;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1635  			}
63159f29be1df7 Ian Morris               2015-03-29  1636  			if (!skb)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1637  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1638  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1639  			 *	Fill in the control structures
^1da177e4c3f41 Linus Torvalds           2005-04-16  1640  			 */
9c9c9ad5fae7e9 Hannes Frederic Sowa     2013-08-26  1641  			skb->protocol = htons(ETH_P_IPV6);
32dce968dd987a Vlad Yasevich            2015-01-31  1642  			skb->ip_summed = csummode;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1643  			skb->csum = 0;
1f85851e17b64c Gao feng                 2012-03-19  1644  			/* reserve for fragmentation and ipsec header */
1f85851e17b64c Gao feng                 2012-03-19  1645  			skb_reserve(skb, hh_len + sizeof(struct frag_hdr) +
1f85851e17b64c Gao feng                 2012-03-19  1646  				    dst_exthdrlen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1647  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1648  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1649  			 *	Find where to start putting bytes
^1da177e4c3f41 Linus Torvalds           2005-04-16  1650  			 */
15e36f5b8e982d Willem de Bruijn         2018-04-26  1651  			data = skb_put(skb, fraglen - pagedlen);
1f85851e17b64c Gao feng                 2012-03-19  1652  			skb_set_network_header(skb, exthdrlen);
1f85851e17b64c Gao feng                 2012-03-19  1653  			data += fragheaderlen;
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1654  			skb->transport_header = (skb->network_header +
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10  1655  						 fragheaderlen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1656  			if (fraggap) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1657  				skb->csum = skb_copy_and_csum_bits(
^1da177e4c3f41 Linus Torvalds           2005-04-16  1658  					skb_prev, maxfraglen,
8d5930dfb7edbf Al Viro                  2020-07-10  1659  					data + transhdrlen, fraggap);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1660  				skb_prev->csum = csum_sub(skb_prev->csum,
^1da177e4c3f41 Linus Torvalds           2005-04-16  1661  							  skb->csum);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1662  				data += fraggap;
e9fa4f7bd291c2 Herbert Xu               2006-08-13  1663  				pskb_trim_unique(skb_prev, maxfraglen);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1664  			}
232cd35d0804cc Eric Dumazet             2017-05-19  1665  			if (copy > 0 &&
232cd35d0804cc Eric Dumazet             2017-05-19  1666  			    getfrag(from, data + transhdrlen, offset,
232cd35d0804cc Eric Dumazet             2017-05-19  1667  				    copy, fraggap, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1668  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1669  				kfree_skb(skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1670  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1671  			}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1672  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1673  			offset += copy;
15e36f5b8e982d Willem de Bruijn         2018-04-26  1674  			length -= copy + transhdrlen;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1675  			transhdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1676  			exthdrlen = 0;
299b0767642a65 Steffen Klassert         2011-10-11  1677  			dst_exthdrlen = 0;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1678  
52900d22288e7d Willem de Bruijn         2018-11-30  1679  			/* Only the initial fragment is time stamped */
52900d22288e7d Willem de Bruijn         2018-11-30  1680  			skb_shinfo(skb)->tx_flags = cork->tx_flags;
52900d22288e7d Willem de Bruijn         2018-11-30  1681  			cork->tx_flags = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1682  			skb_shinfo(skb)->tskey = tskey;
52900d22288e7d Willem de Bruijn         2018-11-30  1683  			tskey = 0;
52900d22288e7d Willem de Bruijn         2018-11-30  1684  			skb_zcopy_set(skb, uarg, &extra_uref);
52900d22288e7d Willem de Bruijn         2018-11-30  1685  
0dec879f636f11 Julian Anastasov         2017-02-06  1686  			if ((flags & MSG_CONFIRM) && !skb_prev)
0dec879f636f11 Julian Anastasov         2017-02-06  1687  				skb_set_dst_pending_confirm(skb, 1);
0dec879f636f11 Julian Anastasov         2017-02-06  1688  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1689  			/*
^1da177e4c3f41 Linus Torvalds           2005-04-16  1690  			 * Put the packet on the pending queue
^1da177e4c3f41 Linus Torvalds           2005-04-16  1691  			 */
1f4c6eb2402968 Eric Dumazet             2018-03-31  1692  			if (!skb->destructor) {
1f4c6eb2402968 Eric Dumazet             2018-03-31  1693  				skb->destructor = sock_wfree;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1694  				skb->sk = sk;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1695  				wmem_alloc_delta += skb->truesize;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1696  			}
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1697  			__skb_queue_tail(queue, skb);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1698  			continue;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1699  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1700  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1701  		if (copy > length)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1702  			copy = length;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1703  
113f99c3358564 Willem de Bruijn         2018-05-17  1704  		if (!(rt->dst.dev->features&NETIF_F_SG) &&
113f99c3358564 Willem de Bruijn         2018-05-17  1705  		    skb_tailroom(skb) >= copy) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1706  			unsigned int off;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1707  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1708  			off = skb->len;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1709  			if (getfrag(from, skb_put(skb, copy),
^1da177e4c3f41 Linus Torvalds           2005-04-16  1710  						offset, copy, off, skb) < 0) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1711  				__skb_trim(skb, off);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1712  				err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1713  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1714  			}
b5947e5d1e710c Willem de Bruijn         2018-11-30  1715  		} else if (!uarg || !uarg->zerocopy) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1716  			int i = skb_shinfo(skb)->nr_frags;
5640f7685831e0 Eric Dumazet             2012-09-23  1717  
^1da177e4c3f41 Linus Torvalds           2005-04-16  1718  			err = -ENOMEM;
5640f7685831e0 Eric Dumazet             2012-09-23  1719  			if (!sk_page_frag_refill(sk, pfrag))
^1da177e4c3f41 Linus Torvalds           2005-04-16  1720  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1721  
5640f7685831e0 Eric Dumazet             2012-09-23  1722  			if (!skb_can_coalesce(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1723  					      pfrag->offset)) {
^1da177e4c3f41 Linus Torvalds           2005-04-16  1724  				err = -EMSGSIZE;
5640f7685831e0 Eric Dumazet             2012-09-23  1725  				if (i == MAX_SKB_FRAGS)
^1da177e4c3f41 Linus Torvalds           2005-04-16  1726  					goto error;
5640f7685831e0 Eric Dumazet             2012-09-23  1727  
5640f7685831e0 Eric Dumazet             2012-09-23  1728  				__skb_fill_page_desc(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet             2012-09-23  1729  						     pfrag->offset, 0);
5640f7685831e0 Eric Dumazet             2012-09-23  1730  				skb_shinfo(skb)->nr_frags = ++i;
5640f7685831e0 Eric Dumazet             2012-09-23  1731  				get_page(pfrag->page);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1732  			}
5640f7685831e0 Eric Dumazet             2012-09-23  1733  			copy = min_t(int, copy, pfrag->size - pfrag->offset);
9e903e085262ff Eric Dumazet             2011-10-18  1734  			if (getfrag(from,
5640f7685831e0 Eric Dumazet             2012-09-23  1735  				    page_address(pfrag->page) + pfrag->offset,
5640f7685831e0 Eric Dumazet             2012-09-23  1736  				    offset, copy, skb->len, skb) < 0)
5640f7685831e0 Eric Dumazet             2012-09-23  1737  				goto error_efault;
5640f7685831e0 Eric Dumazet             2012-09-23  1738  
5640f7685831e0 Eric Dumazet             2012-09-23  1739  			pfrag->offset += copy;
5640f7685831e0 Eric Dumazet             2012-09-23  1740  			skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1741  			skb->len += copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1742  			skb->data_len += copy;
f945fa7ad9c12a Herbert Xu               2008-01-22  1743  			skb->truesize += copy;
1f4c6eb2402968 Eric Dumazet             2018-03-31  1744  			wmem_alloc_delta += copy;
b5947e5d1e710c Willem de Bruijn         2018-11-30  1745  		} else {
b5947e5d1e710c Willem de Bruijn         2018-11-30  1746  			err = skb_zerocopy_iter_dgram(skb, from, copy);
b5947e5d1e710c Willem de Bruijn         2018-11-30  1747  			if (err < 0)
b5947e5d1e710c Willem de Bruijn         2018-11-30  1748  				goto error;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1749  		}
^1da177e4c3f41 Linus Torvalds           2005-04-16  1750  		offset += copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1751  		length -= copy;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1752  	}
5640f7685831e0 Eric Dumazet             2012-09-23  1753  
9e8445a56c253f Paolo Abeni              2018-04-04  1754  	if (wmem_alloc_delta)
1f4c6eb2402968 Eric Dumazet             2018-03-31  1755  		refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1756  	return 0;
5640f7685831e0 Eric Dumazet             2012-09-23  1757  
5640f7685831e0 Eric Dumazet             2012-09-23  1758  error_efault:
5640f7685831e0 Eric Dumazet             2012-09-23  1759  	err = -EFAULT;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1760  error:
8e0449172497a9 Jonathan Lemon           2021-01-06  1761  	net_zcopy_put_abort(uarg, extra_uref);
bdc712b4c2baf9 David S. Miller          2011-05-06  1762  	cork->length -= length;
3bd653c8455bc7 Denis V. Lunev           2008-10-08  1763  	IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
1f4c6eb2402968 Eric Dumazet             2018-03-31  1764  	refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds           2005-04-16  1765  	return err;
^1da177e4c3f41 Linus Torvalds           2005-04-16  1766  }
0bbe84a67b0b54 Vlad Yasevich            2015-01-31  1767  
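
For readers following the listing: the check added at lines 1467-1468 guards
the unsigned arithmetic that computes maxfraglen at lines 1471-1472. Below is
a minimal userspace sketch of that arithmetic, not kernel code. The names
FRAG_HDR_LEN, mtu_is_usable() and max_frag_len() are made up for illustration
and are not kernel symbols; FRAG_HDR_LEN stands in for
sizeof(struct frag_hdr), which is 8 bytes.

```c
#include <assert.h>

/* Illustrative stand-in for sizeof(struct frag_hdr) (8 bytes). */
#define FRAG_HDR_LEN 8u

/*
 * Mirrors the guard at lines 1467-1468 of the listing: returns 1 when
 * maxfraglen can be computed without unsigned wraparound, 0 otherwise.
 */
int mtu_is_usable(unsigned int mtu, unsigned int fragheaderlen)
{
	/* mtu - fragheaderlen would wrap around to a huge value */
	if (mtu < fragheaderlen)
		return 0;
	/* the final "- FRAG_HDR_LEN" would wrap around */
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen >= FRAG_HDR_LEN;
}

/*
 * Mirrors lines 1471-1472: the fragmentable part is rounded down to a
 * multiple of 8 octets, then room for the fragment header is subtracted.
 * Only meaningful when mtu_is_usable() holds, hence the assert.
 */
unsigned int max_frag_len(unsigned int mtu, unsigned int fragheaderlen)
{
	assert(mtu_is_usable(mtu, fragheaderlen));
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen - FRAG_HDR_LEN;
}
```

With fragheaderlen = 40 (a bare IPv6 header, no extension headers),
max_frag_len(1280, 40) gives 1272, while mtu_is_usable(39, 40) is 0 because
mtu - fragheaderlen would wrap.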

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 29663 bytes --]

Thread overview: 7+ messages
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
2021-04-29 20:25   ` Jiri Bohac
2021-05-01 10:23     ` Sabrina Dubroca
2021-04-29 20:37 ` kernel test robot
2021-04-30  5:36 ` [RFC PATCH v2] " Jiri Bohac
2021-04-29 23:17 [RFC PATCH] " kernel test robot