* [RFC PATCH] fix xfrm MTU regression
@ 2021-04-29 17:02 Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jiri Bohac @ 2021-04-29 17:02 UTC
To: Mike Maloney, Eric Dumazet, davem; +Cc: netdev, Steffen Klassert, Herbert Xu
Hi,
Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
A Packet Too Big ICMPv6 message received in response to an ESP
packet will prevent all further communication through the tunnel
if the reported MTU minus the ESP overhead is smaller than 1280.
E.g. in the case of tunnel-mode ESP with sha256/aes, the overhead
is 92 bytes. Receiving a PTB with an MTU of 1371 or less will result
in all further packets in the tunnel being dropped. A ping through the
tunnel fails with "ping: sendmsg: Invalid argument".
Apparently the MTU on the xfrm route is smaller than 1280 and
fails the check inside ip6_setup_cork() added by 749439bf.
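The arithmetic behind the 1371 threshold can be sketched in plain C. This is only an illustration: inner_mtu() and cork_check_fails() are hypothetical helpers, not kernel functions, and the second one merely mirrors the shape of the "mtu < IPV6_MIN_MTU" test that the patch below removes.

```c
#include <stdbool.h>

#define IPV6_MIN_MTU 1280u	/* minimum IPv6 link MTU */

/* Hypothetical helper: MTU left for the inner packet once the ESP
 * overhead is subtracted from the MTU reported in the PTB message.
 */
static unsigned int inner_mtu(unsigned int reported_mtu,
			      unsigned int esp_overhead)
{
	return reported_mtu > esp_overhead ? reported_mtu - esp_overhead : 0;
}

/* Hypothetical helper mirroring the shape of the check added by
 * 749439bf: any MTU below IPV6_MIN_MTU is rejected.
 */
static bool cork_check_fails(unsigned int mtu)
{
	return mtu < IPV6_MIN_MTU;
}
```

With the 92-byte overhead quoted above, a reported MTU of 1371 leaves 1279 bytes and trips the check, while 1372 leaves exactly 1280 and passes.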
We found this by debugging USGv6/ipv6ready failures. Failing
tests are: "Phase-2 Interoperability Test Scenario IPsec" /
5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
Below is my attempt to fix the situation by dropping the MTU
check and instead checking for the underflows described in the
749439bf commit message (without much understanding of the
details!). Does this make sense?:
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..8af6adb42c85 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
@@ -1465,6 +1463,11 @@ static int __ip6_append_data(struct sock *sk,
fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
(opt ? opt->opt_nflen : 0);
+
+ if (mtu < fragheaderlen ||
+ ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+ goto emsgsize;
+
maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
sizeof(struct frag_hdr);
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
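For readers unfamiliar with the underflow the second hunk guards against, here is a minimal userspace sketch. The helper names and FRAG_HDR_LEN are assumptions made for illustration (the kernel uses sizeof(struct frag_hdr), which is 8 bytes); the expressions copy the shape of the maxfraglen computation in the diff above.

```c
#include <stdbool.h>

#define FRAG_HDR_LEN 8u	/* stands in for sizeof(struct frag_hdr) */

/* The unguarded expression. With unsigned operands, an mtu smaller
 * than fragheaderlen makes (mtu - fragheaderlen) wrap around instead
 * of going negative, so the result is meaningless.
 */
static unsigned int maxfraglen_unchecked(unsigned int mtu,
					 unsigned int fragheaderlen)
{
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen - FRAG_HDR_LEN;
}

/* Guard in the shape proposed by the patch: returns true only when
 * neither subtraction in the expression above can wrap.
 */
static bool maxfraglen_ok(unsigned int mtu, unsigned int fragheaderlen)
{
	if (mtu < fragheaderlen)
		return false;
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen >= FRAG_HDR_LEN;
}
```

For a normal case such as mtu = 1280 and fragheaderlen = 40, the guard passes and the expression yields 1272. For mtu = 20 and fragheaderlen = 40 the guard rejects the input before the wrapped value can be used.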
* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
@ 2021-04-29 19:48 ` Sabrina Dubroca
2021-04-29 20:25 ` Jiri Bohac
2021-04-29 20:37 ` kernel test robot
2021-04-30 5:36 ` [RFC PATCH v2] " Jiri Bohac
2 siblings, 1 reply; 7+ messages in thread
From: Sabrina Dubroca @ 2021-04-29 19:48 UTC
To: Jiri Bohac
Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert, Herbert Xu
2021-04-29, 19:02:54 +0200, Jiri Bohac wrote:
> Hi,
>
> Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
> sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
>
> A Packet Too Big ICMPv6 message received in response to an ESP
> packet will prevent all further communication through the tunnel
> if the reported MTU minus the ESP overhead is smaller than 1280.
>
> E.g. in the case of tunnel-mode ESP with sha256/aes, the overhead
> is 92 bytes. Receiving a PTB with an MTU of 1371 or less will result
> in all further packets in the tunnel being dropped. A ping through the
> tunnel fails with "ping: sendmsg: Invalid argument".
>
> Apparently the MTU on the xfrm route is smaller than 1280 and
> fails the check inside ip6_setup_cork() added by 749439bf.
>
> We found this by debugging USGv6/ipv6ready failures. Failing
> tests are: "Phase-2 Interoperability Test Scenario IPsec" /
> 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6"), currently in Steffen's ipsec
tree:
https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
--
Sabrina
* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 19:48 ` Sabrina Dubroca
@ 2021-04-29 20:25 ` Jiri Bohac
2021-05-01 10:23 ` Sabrina Dubroca
0 siblings, 1 reply; 7+ messages in thread
From: Jiri Bohac @ 2021-04-29 20:25 UTC
To: Sabrina Dubroca
Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert, Herbert Xu
On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote:
> That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
> should return at least 1280 for ipv6"), currently in Steffen's ipsec
> tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
Thanks, that is interesting! The patch makes my large (-s 1400) pings inside
ESP pass through a 1280-MTU link on an intermediary router but in a suboptimal
double-fragmented way. tcpdump on the router shows:
22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232
22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100)
22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276
I.e. the ping is fragmented into two ESP packets and the first ESP packet is then fragmented again.
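The 1232 and 100 in the first tcpdump output follow from the 1280-byte link MTU. A small sketch of that arithmetic, assuming a 40-byte IPv6 header and an 8-byte fragment header; first_frag_payload() is a hypothetical helper, not a kernel function, and it assumes link_mtu is at least 48:

```c
#define IPV6_HDR_LEN 40u	/* fixed IPv6 header */
#define FRAG_HDR_LEN 8u		/* IPv6 fragment header */

/* Largest payload a non-final fragment can carry on a link with the
 * given MTU. Fragment offsets are expressed in 8-octet units, so the
 * payload is rounded down to a multiple of 8.
 */
static unsigned int first_frag_payload(unsigned int link_mtu)
{
	return (link_mtu - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;
}

/* Payload left for the second fragment when 'total' bytes are split
 * into two fragments on the same link.
 */
static unsigned int second_frag_payload(unsigned int total,
					unsigned int link_mtu)
{
	return total - first_frag_payload(link_mtu);
}
```

A 1332-byte fragmentable part on a 1280-MTU link is therefore split into 1232 + 100 bytes, matching "frag (0|1232)" and "frag (1232|100)".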
The same pings with my patch come through in two fragments:
22:13:22.072934 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x28), length 1236
22:13:22.073039 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x29), length 356
I can do more tests if needed.
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
@ 2021-04-29 20:37 ` kernel test robot
2021-04-30 5:36 ` [RFC PATCH v2] " Jiri Bohac
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-29 20:37 UTC
To: kbuild-all
Hi Jiri,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on linus/master]
[also build test WARNING on v5.12 next-20210429]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jiri-Bohac/fix-xfrm-MTU-regression/20210430-010412
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d72cd4ad4174cfd2257c426ad51e4f53bcfde9c9
config: x86_64-randconfig-a015-20210429 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9131a078901b00e68248a27a4f8c4b11bb1db1ae)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# https://github.com/0day-ci/linux/commit/f556543e005a1eb6567fc299e60f7d92dc508f88
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jiri-Bohac/fix-xfrm-MTU-regression/20210430-010412
git checkout f556543e005a1eb6567fc299e60f7d92dc508f88
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> net/ipv6/ip6_output.c:1467:6: warning: variable 'headersize' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (mtu < fragheaderlen ||
^~~~~~~~~~~~~~~~~~~~~~
net/ipv6/ip6_output.c:1501:27: note: uninitialized use occurs here
pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
^~~~~~~~~~
include/linux/minmax.h:118:48: note: expanded from macro 'max_t'
#define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
^
include/linux/minmax.h:44:14: note: expanded from macro '__careful_cmp'
__cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
^
include/linux/minmax.h:37:25: note: expanded from macro '__cmp_once'
typeof(x) unique_x = (x); \
^
net/ipv6/ip6_output.c:1467:2: note: remove the 'if' if its condition is always false
if (mtu < fragheaderlen ||
^~~~~~~~~~~~~~~~~~~~~~~~~~
>> net/ipv6/ip6_output.c:1467:6: warning: variable 'headersize' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
if (mtu < fragheaderlen ||
^~~~~~~~~~~~~~~~~~~
net/ipv6/ip6_output.c:1501:27: note: uninitialized use occurs here
pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
^~~~~~~~~~
include/linux/minmax.h:118:48: note: expanded from macro 'max_t'
#define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
^
include/linux/minmax.h:44:14: note: expanded from macro '__careful_cmp'
__cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
^
include/linux/minmax.h:37:25: note: expanded from macro '__cmp_once'
typeof(x) unique_x = (x); \
^
net/ipv6/ip6_output.c:1467:6: note: remove the '||' if its condition is always false
if (mtu < fragheaderlen ||
^~~~~~~~~~~~~~~~~~~~~~
net/ipv6/ip6_output.c:1444:41: note: initialize the variable 'headersize' to silence this warning
unsigned int maxnonfragsize, headersize;
^
= 0
2 warnings generated.
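The pattern behind both warnings is a goto that reaches a label before a variable used at that label has been assigned. Below is a reduced sketch of the safe ordering; report_size() is a hypothetical stand-in for the real function, and its names only loosely mirror the kernel code in the listing that follows.

```c
/* Hypothetical reduction of the control flow in __ip6_append_data().
 * Returns 0 on success and -1 on the error path, where *pmtu receives
 * the value computed at the label.
 */
static int report_size(unsigned int mtu, unsigned int fragheaderlen,
		       unsigned int *pmtu)
{
	unsigned int headersize;

	/* Assign headersize before any branch that can jump to the
	 * label. Placing the size check above this assignment is what
	 * triggers -Wsometimes-uninitialized.
	 */
	headersize = fragheaderlen;

	if (mtu < fragheaderlen)
		goto emsgsize;

	return 0;

emsgsize:
	/* headersize is initialized on every path reaching this point. */
	*pmtu = mtu > headersize ? mtu - headersize : 0;
	return -1;
}
```

Moving the check below the assignment, rather than initializing the variable to 0 as clang suggests, keeps the value reported on the error path meaningful.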
vim +1467 net/ipv6/ip6_output.c
1419
1420 static int __ip6_append_data(struct sock *sk,
1421 struct flowi6 *fl6,
1422 struct sk_buff_head *queue,
1423 struct inet_cork *cork,
1424 struct inet6_cork *v6_cork,
1425 struct page_frag *pfrag,
1426 int getfrag(void *from, char *to, int offset,
1427 int len, int odd, struct sk_buff *skb),
1428 void *from, int length, int transhdrlen,
1429 unsigned int flags, struct ipcm6_cookie *ipc6)
1430 {
1431 struct sk_buff *skb, *skb_prev = NULL;
1432 unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
1433 struct ubuf_info *uarg = NULL;
1434 int exthdrlen = 0;
1435 int dst_exthdrlen = 0;
1436 int hh_len;
1437 int copy;
1438 int err;
1439 int offset = 0;
1440 u32 tskey = 0;
1441 struct rt6_info *rt = (struct rt6_info *)cork->dst;
1442 struct ipv6_txoptions *opt = v6_cork->opt;
1443 int csummode = CHECKSUM_NONE;
1444 unsigned int maxnonfragsize, headersize;
1445 unsigned int wmem_alloc_delta = 0;
1446 bool paged, extra_uref = false;
1447
1448 skb = skb_peek_tail(queue);
1449 if (!skb) {
1450 exthdrlen = opt ? opt->opt_flen : 0;
1451 dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
1452 }
1453
1454 paged = !!cork->gso_size;
1455 mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
1456 orig_mtu = mtu;
1457
1458 if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
1459 sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
1460 tskey = sk->sk_tskey++;
1461
1462 hh_len = LL_RESERVED_SPACE(rt->dst.dev);
1463
1464 fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
1465 (opt ? opt->opt_nflen : 0);
1466
> 1467 if (mtu < fragheaderlen ||
1468 ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
1469 goto emsgsize;
1470
1471 maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
1472 sizeof(struct frag_hdr);
1473
1474 headersize = sizeof(struct ipv6hdr) +
1475 (opt ? opt->opt_flen + opt->opt_nflen : 0) +
1476 (dst_allfrag(&rt->dst) ?
1477 sizeof(struct frag_hdr) : 0) +
1478 rt->rt6i_nfheader_len;
1479
1480 /* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
1481 * the first fragment
1482 */
1483 if (headersize + transhdrlen > mtu)
1484 goto emsgsize;
1485
1486 if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
1487 (sk->sk_protocol == IPPROTO_UDP ||
1488 sk->sk_protocol == IPPROTO_RAW)) {
1489 ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
1490 sizeof(struct ipv6hdr));
1491 goto emsgsize;
1492 }
1493
1494 if (ip6_sk_ignore_df(sk))
1495 maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
1496 else
1497 maxnonfragsize = mtu;
1498
1499 if (cork->length + length > maxnonfragsize - headersize) {
1500 emsgsize:
1501 pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
1502 ipv6_local_error(sk, EMSGSIZE, fl6, pmtu);
1503 return -EMSGSIZE;
1504 }
1505
1506 /* CHECKSUM_PARTIAL only with no extension headers and when
1507 * we are not going to fragment
1508 */
1509 if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
1510 headersize == sizeof(struct ipv6hdr) &&
1511 length <= mtu - headersize &&
1512 (!(flags & MSG_MORE) || cork->gso_size) &&
1513 rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
1514 csummode = CHECKSUM_PARTIAL;
1515
1516 if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
1517 uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
1518 if (!uarg)
1519 return -ENOBUFS;
1520 extra_uref = !skb_zcopy(skb); /* only ref on new uarg */
1521 if (rt->dst.dev->features & NETIF_F_SG &&
1522 csummode == CHECKSUM_PARTIAL) {
1523 paged = true;
1524 } else {
1525 uarg->zerocopy = 0;
1526 skb_zcopy_set(skb, uarg, &extra_uref);
1527 }
1528 }
1529
1530 /*
1531 * Let's try using as much space as possible.
1532 * Use MTU if total length of the message fits into the MTU.
1533 * Otherwise, we need to reserve fragment header and
1534 * fragment alignment (= 8-15 octects, in total).
1535 *
1536 * Note that we may need to "move" the data from the tail
1537 * of the buffer to the new fragment when we split
1538 * the message.
1539 *
1540 * FIXME: It may be fragmented into multiple chunks
1541 * at once if non-fragmentable extension headers
1542 * are too large.
1543 * --yoshfuji
1544 */
1545
1546 cork->length += length;
1547 if (!skb)
1548 goto alloc_new_skb;
1549
1550 while (length > 0) {
1551 /* Check if the remaining data fits into current packet. */
1552 copy = (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - skb->len;
1553 if (copy < length)
1554 copy = maxfraglen - skb->len;
1555
1556 if (copy <= 0) {
1557 char *data;
1558 unsigned int datalen;
1559 unsigned int fraglen;
1560 unsigned int fraggap;
1561 unsigned int alloclen;
1562 unsigned int pagedlen;
1563 alloc_new_skb:
1564 /* There's no room in the current skb */
1565 if (skb)
1566 fraggap = skb->len - maxfraglen;
1567 else
1568 fraggap = 0;
1569 /* update mtu and maxfraglen if necessary */
1570 if (!skb || !skb_prev)
1571 ip6_append_data_mtu(&mtu, &maxfraglen,
1572 fragheaderlen, skb, rt,
1573 orig_mtu);
1574
1575 skb_prev = skb;
1576
1577 /*
1578 * If remaining data exceeds the mtu,
1579 * we know we need more fragment(s).
1580 */
1581 datalen = length + fraggap;
1582
1583 if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen)
1584 datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
1585 fraglen = datalen + fragheaderlen;
1586 pagedlen = 0;
1587
1588 if ((flags & MSG_MORE) &&
1589 !(rt->dst.dev->features&NETIF_F_SG))
1590 alloclen = mtu;
1591 else if (!paged)
1592 alloclen = fraglen;
1593 else {
1594 alloclen = min_t(int, fraglen, MAX_HEADER);
1595 pagedlen = fraglen - alloclen;
1596 }
1597
1598 alloclen += dst_exthdrlen;
1599
1600 if (datalen != length + fraggap) {
1601 /*
1602 * this is not the last fragment, the trailer
1603 * space is regarded as data space.
1604 */
1605 datalen += rt->dst.trailer_len;
1606 }
1607
1608 alloclen += rt->dst.trailer_len;
1609 fraglen = datalen + fragheaderlen;
1610
1611 /*
1612 * We just reserve space for fragment header.
1613 * Note: this may be overallocation if the message
1614 * (without MSG_MORE) fits into the MTU.
1615 */
1616 alloclen += sizeof(struct frag_hdr);
1617
1618 copy = datalen - transhdrlen - fraggap - pagedlen;
1619 if (copy < 0) {
1620 err = -EINVAL;
1621 goto error;
1622 }
1623 if (transhdrlen) {
1624 skb = sock_alloc_send_skb(sk,
1625 alloclen + hh_len,
1626 (flags & MSG_DONTWAIT), &err);
1627 } else {
1628 skb = NULL;
1629 if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
1630 2 * sk->sk_sndbuf)
1631 skb = alloc_skb(alloclen + hh_len,
1632 sk->sk_allocation);
1633 if (unlikely(!skb))
1634 err = -ENOBUFS;
1635 }
1636 if (!skb)
1637 goto error;
1638 /*
1639 * Fill in the control structures
1640 */
1641 skb->protocol = htons(ETH_P_IPV6);
1642 skb->ip_summed = csummode;
1643 skb->csum = 0;
1644 /* reserve for fragmentation and ipsec header */
1645 skb_reserve(skb, hh_len + sizeof(struct frag_hdr) +
1646 dst_exthdrlen);
1647
1648 /*
1649 * Find where to start putting bytes
1650 */
1651 data = skb_put(skb, fraglen - pagedlen);
1652 skb_set_network_header(skb, exthdrlen);
1653 data += fragheaderlen;
1654 skb->transport_header = (skb->network_header +
1655 fragheaderlen);
1656 if (fraggap) {
1657 skb->csum = skb_copy_and_csum_bits(
1658 skb_prev, maxfraglen,
1659 data + transhdrlen, fraggap);
1660 skb_prev->csum = csum_sub(skb_prev->csum,
1661 skb->csum);
1662 data += fraggap;
1663 pskb_trim_unique(skb_prev, maxfraglen);
1664 }
1665 if (copy > 0 &&
1666 getfrag(from, data + transhdrlen, offset,
1667 copy, fraggap, skb) < 0) {
1668 err = -EFAULT;
1669 kfree_skb(skb);
1670 goto error;
1671 }
1672
1673 offset += copy;
1674 length -= copy + transhdrlen;
1675 transhdrlen = 0;
1676 exthdrlen = 0;
1677 dst_exthdrlen = 0;
1678
1679 /* Only the initial fragment is time stamped */
1680 skb_shinfo(skb)->tx_flags = cork->tx_flags;
1681 cork->tx_flags = 0;
1682 skb_shinfo(skb)->tskey = tskey;
1683 tskey = 0;
1684 skb_zcopy_set(skb, uarg, &extra_uref);
1685
1686 if ((flags & MSG_CONFIRM) && !skb_prev)
1687 skb_set_dst_pending_confirm(skb, 1);
1688
1689 /*
1690 * Put the packet on the pending queue
1691 */
1692 if (!skb->destructor) {
1693 skb->destructor = sock_wfree;
1694 skb->sk = sk;
1695 wmem_alloc_delta += skb->truesize;
1696 }
1697 __skb_queue_tail(queue, skb);
1698 continue;
1699 }
1700
1701 if (copy > length)
1702 copy = length;
1703
1704 if (!(rt->dst.dev->features&NETIF_F_SG) &&
1705 skb_tailroom(skb) >= copy) {
1706 unsigned int off;
1707
1708 off = skb->len;
1709 if (getfrag(from, skb_put(skb, copy),
1710 offset, copy, off, skb) < 0) {
1711 __skb_trim(skb, off);
1712 err = -EFAULT;
1713 goto error;
1714 }
1715 } else if (!uarg || !uarg->zerocopy) {
1716 int i = skb_shinfo(skb)->nr_frags;
1717
1718 err = -ENOMEM;
1719 if (!sk_page_frag_refill(sk, pfrag))
1720 goto error;
1721
1722 if (!skb_can_coalesce(skb, i, pfrag->page,
1723 pfrag->offset)) {
1724 err = -EMSGSIZE;
1725 if (i == MAX_SKB_FRAGS)
1726 goto error;
1727
1728 __skb_fill_page_desc(skb, i, pfrag->page,
1729 pfrag->offset, 0);
1730 skb_shinfo(skb)->nr_frags = ++i;
1731 get_page(pfrag->page);
1732 }
1733 copy = min_t(int, copy, pfrag->size - pfrag->offset);
1734 if (getfrag(from,
1735 page_address(pfrag->page) + pfrag->offset,
1736 offset, copy, skb->len, skb) < 0)
1737 goto error_efault;
1738
1739 pfrag->offset += copy;
1740 skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
1741 skb->len += copy;
1742 skb->data_len += copy;
1743 skb->truesize += copy;
1744 wmem_alloc_delta += copy;
1745 } else {
1746 err = skb_zerocopy_iter_dgram(skb, from, copy);
1747 if (err < 0)
1748 goto error;
1749 }
1750 offset += copy;
1751 length -= copy;
1752 }
1753
1754 if (wmem_alloc_delta)
1755 refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
1756 return 0;
1757
1758 error_efault:
1759 err = -EFAULT;
1760 error:
1761 net_zcopy_put_abort(uarg, extra_uref);
1762 cork->length -= length;
1763 IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
1764 refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
1765 return err;
1766 }
1767
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33314 bytes --]
* Re: [RFC PATCH v2] fix xfrm MTU regression
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
2021-04-29 20:37 ` kernel test robot
@ 2021-04-30 5:36 ` Jiri Bohac
2 siblings, 0 replies; 7+ messages in thread
From: Jiri Bohac @ 2021-04-30 5:36 UTC
To: Mike Maloney, Eric Dumazet, davem; +Cc: netdev, Steffen Klassert, Herbert Xu
On Thu, Apr 29, 2021 at 07:02:55PM +0200, Jiri Bohac wrote:
> Below is my attempt to fix the situation by dropping the MTU
> check and instead checking for the underflows described in the
> 749439bf commit message (without much understanding of the
> details!). Does this make sense?:
The first version left headersize uninitialized in the error
path; v2 below fixes this.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..171eb4ec1e67 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
@@ -1465,8 +1463,6 @@ static int __ip6_append_data(struct sock *sk,
fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
(opt ? opt->opt_nflen : 0);
- maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
- sizeof(struct frag_hdr);
headersize = sizeof(struct ipv6hdr) +
(opt ? opt->opt_flen + opt->opt_nflen : 0) +
@@ -1474,6 +1470,13 @@ static int __ip6_append_data(struct sock *sk,
sizeof(struct frag_hdr) : 0) +
rt->rt6i_nfheader_len;
+ if (mtu < fragheaderlen ||
+ ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+ goto emsgsize;
+
+ maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
+ sizeof(struct frag_hdr);
+
/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
* the first fragment
*/
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 20:25 ` Jiri Bohac
@ 2021-05-01 10:23 ` Sabrina Dubroca
0 siblings, 0 replies; 7+ messages in thread
From: Sabrina Dubroca @ 2021-05-01 10:23 UTC
To: Jiri Bohac
Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert, Herbert Xu
2021-04-29, 22:25:29 +0200, Jiri Bohac wrote:
> On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote:
> > That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
> > should return at least 1280 for ipv6"), currently in Steffen's ipsec
> > tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
>
> Thanks, that is interesting! The patch makes my large (-s 1400) pings inside
> ESP pass through a 1280-MTU link on an intermediary router but in a suboptimal
> double-fragmented way. tcpdump on the router shows:
>
> 22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232
> 22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100)
> 22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276
>
> I.e. the ping is fragmented into two ESP packets and the first ESP packet is then fragmented again.
It's a bit ugly, but I don't think we can do any better. We're going
through the stack twice in tunnel mode. The first pass (before xfrm)
we fragment according to the PMTU (adjusted to IPV6_MIN_MTU, because
MTUs lower than that are illegal in IPv6). The second time (after
xfrm), the first ESP packet is too big so we fragment it. This
behavior is consistent with a vti device running over a network with
MTU=1280 (which doesn't seem to work without my patch).
In transport mode, we're only going through the stack once, so we
don't see this double fragmentation.
I think my patch is correct, because without it we have IPv6 dsts
going around the kernel with an associated MTU smaller than
IPV6_MIN_MTU.
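The invariant argued for here, that an IPv6 dst should never carry an MTU below IPV6_MIN_MTU, amounts to a lower clamp. The sketch below only illustrates that idea; clamp_ipv6_mtu() is a hypothetical helper and is not the code of xfrm_state_mtu() or of commit b515d2637276.

```c
#define IPV6_MIN_MTU 1280u	/* minimum IPv6 link MTU */

/* Hypothetical helper: never report an IPv6 MTU below the minimum.
 * Packets that end up larger than the real path MTU after xfrm
 * processing are then fragmented on the second pass through the stack.
 */
static unsigned int clamp_ipv6_mtu(unsigned int mtu)
{
	return mtu < IPV6_MIN_MTU ? IPV6_MIN_MTU : mtu;
}
```

With such a clamp, the 1279-byte value from the original report (1371 minus 92 bytes of ESP overhead) would be raised to 1280.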
--
Sabrina
* Re: [RFC PATCH] fix xfrm MTU regression
@ 2021-04-29 23:17 kernel test robot
0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-29 23:17 UTC
To: kbuild
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210429170254.5grfgsz2hgy2qjhk@dwarf.suse.cz>
References: <20210429170254.5grfgsz2hgy2qjhk@dwarf.suse.cz>
TO: Jiri Bohac <jbohac@suse.cz>
Hi Jiri,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on linus/master]
[also build test WARNING on v5.12 next-20210429]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jiri-Bohac/fix-xfrm-MTU-regression/20210430-010412
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d72cd4ad4174cfd2257c426ad51e4f53bcfde9c9
:::::: branch date: 6 hours ago
:::::: commit date: 6 hours ago
config: i386-randconfig-m021-20210429 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
net/ipv6/ip6_output.c:1501 __ip6_append_data() error: uninitialized symbol 'headersize'.
Old smatch warnings:
net/ipv6/ip6_output.c:292 ip6_xmit() error: we previously assumed 'np' could be null (see line 286)
vim +/headersize +1501 net/ipv6/ip6_output.c
366e41d9774d70 Vlad Yasevich 2015-01-31 1419
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1420 static int __ip6_append_data(struct sock *sk,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1421 struct flowi6 *fl6,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1422 struct sk_buff_head *queue,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1423 struct inet_cork *cork,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1424 struct inet6_cork *v6_cork,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1425 struct page_frag *pfrag,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1426 int getfrag(void *from, char *to, int offset,
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1427 int len, int odd, struct sk_buff *skb),
366e41d9774d70 Vlad Yasevich 2015-01-31 1428 void *from, int length, int transhdrlen,
5fdaa88dfefa87 Willem de Bruijn 2018-07-06 1429 unsigned int flags, struct ipcm6_cookie *ipc6)
366e41d9774d70 Vlad Yasevich 2015-01-31 1430 {
366e41d9774d70 Vlad Yasevich 2015-01-31 1431 struct sk_buff *skb, *skb_prev = NULL;
10b8a3de603df7 Paolo Abeni 2018-03-23 1432 unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
b5947e5d1e710c Willem de Bruijn 2018-11-30 1433 struct ubuf_info *uarg = NULL;
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1434 int exthdrlen = 0;
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1435 int dst_exthdrlen = 0;
366e41d9774d70 Vlad Yasevich 2015-01-31 1436 int hh_len;
366e41d9774d70 Vlad Yasevich 2015-01-31 1437 int copy;
366e41d9774d70 Vlad Yasevich 2015-01-31 1438 int err;
366e41d9774d70 Vlad Yasevich 2015-01-31 1439 int offset = 0;
366e41d9774d70 Vlad Yasevich 2015-01-31 1440 u32 tskey = 0;
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1441 struct rt6_info *rt = (struct rt6_info *)cork->dst;
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1442 struct ipv6_txoptions *opt = v6_cork->opt;
32dce968dd987a Vlad Yasevich 2015-01-31 1443 int csummode = CHECKSUM_NONE;
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1444 unsigned int maxnonfragsize, headersize;
1f4c6eb2402968 Eric Dumazet 2018-03-31 1445 unsigned int wmem_alloc_delta = 0;
100f6d8e09905c Willem de Bruijn 2019-05-30 1446 bool paged, extra_uref = false;
366e41d9774d70 Vlad Yasevich 2015-01-31 1447
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1448 skb = skb_peek_tail(queue);
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1449 if (!skb) {
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1450 exthdrlen = opt ? opt->opt_flen : 0;
7efdba5bd9a2f3 Romain KUNTZ 2013-01-16 1451 dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1452 }
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1453
15e36f5b8e982d Willem de Bruijn 2018-04-26 1454 paged = !!cork->gso_size;
bec1f6f697362c Willem de Bruijn 2018-04-26 1455 mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
e367c2d03dba4c lucien 2014-03-17 1456 orig_mtu = mtu;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1457
678ca42d688534 Willem de Bruijn 2018-07-06 1458 if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
678ca42d688534 Willem de Bruijn 2018-07-06 1459 sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
678ca42d688534 Willem de Bruijn 2018-07-06 1460 tskey = sk->sk_tskey++;
678ca42d688534 Willem de Bruijn 2018-07-06 1461
d8d1f30b95a635 Changli Gao 2010-06-10 1462 hh_len = LL_RESERVED_SPACE(rt->dst.dev);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1463
a1b051405bc162 Masahide NAKAMURA 2007-12-20 1464 fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
b4ce92775c2e7f Herbert Xu 2007-11-13 1465 (opt ? opt->opt_nflen : 0);
f556543e005a1e Jiri Bohac 2021-04-29 1466
f556543e005a1e Jiri Bohac 2021-04-29 1467 if (mtu < fragheaderlen ||
f556543e005a1e Jiri Bohac 2021-04-29 1468 ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
f556543e005a1e Jiri Bohac 2021-04-29 1469 goto emsgsize;
f556543e005a1e Jiri Bohac 2021-04-29 1470
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1471 maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1472 sizeof(struct frag_hdr);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1473
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1474 headersize = sizeof(struct ipv6hdr) +
3a1cebe7e05027 Hannes Frederic Sowa 2014-05-11 1475 (opt ? opt->opt_flen + opt->opt_nflen : 0) +
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1476 (dst_allfrag(&rt->dst) ?
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1477 sizeof(struct frag_hdr) : 0) +
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1478 rt->rt6i_nfheader_len;
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1479
10b8a3de603df7 Paolo Abeni 2018-03-23 1480 /* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
10b8a3de603df7 Paolo Abeni 2018-03-23 1481 * the first fragment
10b8a3de603df7 Paolo Abeni 2018-03-23 1482 */
10b8a3de603df7 Paolo Abeni 2018-03-23 1483 if (headersize + transhdrlen > mtu)
10b8a3de603df7 Paolo Abeni 2018-03-23 1484 goto emsgsize;
10b8a3de603df7 Paolo Abeni 2018-03-23 1485
26879da58711aa Wei Wang 2016-05-02 1486 if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1487 (sk->sk_protocol == IPPROTO_UDP ||
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1488 sk->sk_protocol == IPPROTO_RAW)) {
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1489 ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1490 sizeof(struct ipv6hdr));
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1491 goto emsgsize;
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1492 }
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1493
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1494 if (ip6_sk_ignore_df(sk))
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1495 maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1496 else
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1497 maxnonfragsize = mtu;
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1498
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1499 if (cork->length + length > maxnonfragsize - headersize) {
4df98e76cde7c6 Hannes Frederic Sowa 2013-12-16 1500 emsgsize:
10b8a3de603df7 Paolo Abeni 2018-03-23 @1501 pmtu = max_t(int, mtu - headersize + sizeof(struct ipv6hdr), 0);
10b8a3de603df7 Paolo Abeni 2018-03-23 1502 ipv6_local_error(sk, EMSGSIZE, fl6, pmtu);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1503 return -EMSGSIZE;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1504 }
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1505
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1506 /* CHECKSUM_PARTIAL only with no extension headers and when
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1507 * we are not going to fragment
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1508 */
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1509 if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1510 headersize == sizeof(struct ipv6hdr) &&
2b89ed65a6f201 Vlad Yasevich 2017-01-29 1511 length <= mtu - headersize &&
bec1f6f697362c Willem de Bruijn 2018-04-26 1512 (!(flags & MSG_MORE) || cork->gso_size) &&
c8cd0989bd151f Tom Herbert 2015-12-14 1513 rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
682b1a9d3f9686 Hannes Frederic Sowa 2015-10-27 1514 csummode = CHECKSUM_PARTIAL;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1515
b5947e5d1e710c Willem de Bruijn 2018-11-30 1516 if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
8c793822c5803e Jonathan Lemon 2021-01-06 1517 uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
b5947e5d1e710c Willem de Bruijn 2018-11-30 1518 if (!uarg)
b5947e5d1e710c Willem de Bruijn 2018-11-30 1519 return -ENOBUFS;
522924b583082f Willem de Bruijn 2019-06-07 1520 extra_uref = !skb_zcopy(skb); /* only ref on new uarg */
b5947e5d1e710c Willem de Bruijn 2018-11-30 1521 if (rt->dst.dev->features & NETIF_F_SG &&
b5947e5d1e710c Willem de Bruijn 2018-11-30 1522 csummode == CHECKSUM_PARTIAL) {
b5947e5d1e710c Willem de Bruijn 2018-11-30 1523 paged = true;
b5947e5d1e710c Willem de Bruijn 2018-11-30 1524 } else {
b5947e5d1e710c Willem de Bruijn 2018-11-30 1525 uarg->zerocopy = 0;
52900d22288e7d Willem de Bruijn 2018-11-30 1526 skb_zcopy_set(skb, uarg, &extra_uref);
b5947e5d1e710c Willem de Bruijn 2018-11-30 1527 }
b5947e5d1e710c Willem de Bruijn 2018-11-30 1528 }
b5947e5d1e710c Willem de Bruijn 2018-11-30 1529
^1da177e4c3f41 Linus Torvalds 2005-04-16 1530 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1531 * Let's try using as much space as possible.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1532 * Use MTU if total length of the message fits into the MTU.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1533 * Otherwise, we need to reserve fragment header and
^1da177e4c3f41 Linus Torvalds 2005-04-16 1534 * fragment alignment (= 8-15 octets, in total).
^1da177e4c3f41 Linus Torvalds 2005-04-16 1535 *
634a63e73f0594 Randy Dunlap 2020-09-17 1536 * Note that we may need to "move" the data from the tail
^1da177e4c3f41 Linus Torvalds 2005-04-16 1537 * of the buffer to the new fragment when we split
^1da177e4c3f41 Linus Torvalds 2005-04-16 1538 * the message.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1539 *
^1da177e4c3f41 Linus Torvalds 2005-04-16 1540 * FIXME: It may be fragmented into multiple chunks
^1da177e4c3f41 Linus Torvalds 2005-04-16 1541 * at once if non-fragmentable extension headers
^1da177e4c3f41 Linus Torvalds 2005-04-16 1542 * are too large.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1543 * --yoshfuji
^1da177e4c3f41 Linus Torvalds 2005-04-16 1544 */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1545
2811ebac2521ce Hannes Frederic Sowa 2013-09-21 1546 cork->length += length;
2811ebac2521ce Hannes Frederic Sowa 2013-09-21 1547 if (!skb)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1548 goto alloc_new_skb;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1549
^1da177e4c3f41 Linus Torvalds 2005-04-16 1550 while (length > 0) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1551 /* Check if the remaining data fits into current packet. */
bdc712b4c2baf9 David S. Miller 2011-05-06 1552 copy = (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - skb->len;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1553 if (copy < length)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1554 copy = maxfraglen - skb->len;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1555
^1da177e4c3f41 Linus Torvalds 2005-04-16 1556 if (copy <= 0) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1557 char *data;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1558 unsigned int datalen;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1559 unsigned int fraglen;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1560 unsigned int fraggap;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1561 unsigned int alloclen;
aba36930a35e7f Willem de Bruijn 2018-11-24 1562 unsigned int pagedlen;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1563 alloc_new_skb:
^1da177e4c3f41 Linus Torvalds 2005-04-16 1564 /* There's no room in the current skb */
0c1833797a5a6e Gao feng 2012-05-26 1565 if (skb)
0c1833797a5a6e Gao feng 2012-05-26 1566 fraggap = skb->len - maxfraglen;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1567 else
^1da177e4c3f41 Linus Torvalds 2005-04-16 1568 fraggap = 0;
0c1833797a5a6e Gao feng 2012-05-26 1569 /* update mtu and maxfraglen if necessary */
63159f29be1df7 Ian Morris 2015-03-29 1570 if (!skb || !skb_prev)
0c1833797a5a6e Gao feng 2012-05-26 1571 ip6_append_data_mtu(&mtu, &maxfraglen,
75a493e60ac4bb Hannes Frederic Sowa 2013-07-02 1572 fragheaderlen, skb, rt,
e367c2d03dba4c lucien 2014-03-17 1573 orig_mtu);
0c1833797a5a6e Gao feng 2012-05-26 1574
0c1833797a5a6e Gao feng 2012-05-26 1575 skb_prev = skb;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1576
^1da177e4c3f41 Linus Torvalds 2005-04-16 1577 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1578 * If remaining data exceeds the mtu,
^1da177e4c3f41 Linus Torvalds 2005-04-16 1579 * we know we need more fragment(s).
^1da177e4c3f41 Linus Torvalds 2005-04-16 1580 */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1581 datalen = length + fraggap;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1582
0c1833797a5a6e Gao feng 2012-05-26 1583 if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen)
0c1833797a5a6e Gao feng 2012-05-26 1584 datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
15e36f5b8e982d Willem de Bruijn 2018-04-26 1585 fraglen = datalen + fragheaderlen;
aba36930a35e7f Willem de Bruijn 2018-11-24 1586 pagedlen = 0;
15e36f5b8e982d Willem de Bruijn 2018-04-26 1587
^1da177e4c3f41 Linus Torvalds 2005-04-16 1588 if ((flags & MSG_MORE) &&
d8d1f30b95a635 Changli Gao 2010-06-10 1589 !(rt->dst.dev->features&NETIF_F_SG))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1590 alloclen = mtu;
15e36f5b8e982d Willem de Bruijn 2018-04-26 1591 else if (!paged)
15e36f5b8e982d Willem de Bruijn 2018-04-26 1592 alloclen = fraglen;
15e36f5b8e982d Willem de Bruijn 2018-04-26 1593 else {
15e36f5b8e982d Willem de Bruijn 2018-04-26 1594 alloclen = min_t(int, fraglen, MAX_HEADER);
15e36f5b8e982d Willem de Bruijn 2018-04-26 1595 pagedlen = fraglen - alloclen;
15e36f5b8e982d Willem de Bruijn 2018-04-26 1596 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1597
299b0767642a65 Steffen Klassert 2011-10-11 1598 alloclen += dst_exthdrlen;
299b0767642a65 Steffen Klassert 2011-10-11 1599
0c1833797a5a6e Gao feng 2012-05-26 1600 if (datalen != length + fraggap) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1601 /*
0c1833797a5a6e Gao feng 2012-05-26 1602 * this is not the last fragment, the trailer
0c1833797a5a6e Gao feng 2012-05-26 1603 * space is regarded as data space.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1604 */
0c1833797a5a6e Gao feng 2012-05-26 1605 datalen += rt->dst.trailer_len;
0c1833797a5a6e Gao feng 2012-05-26 1606 }
0c1833797a5a6e Gao feng 2012-05-26 1607
d8d1f30b95a635 Changli Gao 2010-06-10 1608 alloclen += rt->dst.trailer_len;
0c1833797a5a6e Gao feng 2012-05-26 1609 fraglen = datalen + fragheaderlen;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1610
^1da177e4c3f41 Linus Torvalds 2005-04-16 1611 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1612 * We just reserve space for fragment header.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1613 * Note: this may be overallocation if the message
^1da177e4c3f41 Linus Torvalds 2005-04-16 1614 * (without MSG_MORE) fits into the MTU.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1615 */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1616 alloclen += sizeof(struct frag_hdr);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1617
15e36f5b8e982d Willem de Bruijn 2018-04-26 1618 copy = datalen - transhdrlen - fraggap - pagedlen;
232cd35d0804cc Eric Dumazet 2017-05-19 1619 if (copy < 0) {
232cd35d0804cc Eric Dumazet 2017-05-19 1620 err = -EINVAL;
232cd35d0804cc Eric Dumazet 2017-05-19 1621 goto error;
232cd35d0804cc Eric Dumazet 2017-05-19 1622 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1623 if (transhdrlen) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1624 skb = sock_alloc_send_skb(sk,
^1da177e4c3f41 Linus Torvalds 2005-04-16 1625 alloclen + hh_len,
^1da177e4c3f41 Linus Torvalds 2005-04-16 1626 (flags & MSG_DONTWAIT), &err);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1627 } else {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1628 skb = NULL;
1f4c6eb2402968 Eric Dumazet 2018-03-31 1629 if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
^1da177e4c3f41 Linus Torvalds 2005-04-16 1630 2 * sk->sk_sndbuf)
1f4c6eb2402968 Eric Dumazet 2018-03-31 1631 skb = alloc_skb(alloclen + hh_len,
^1da177e4c3f41 Linus Torvalds 2005-04-16 1632 sk->sk_allocation);
63159f29be1df7 Ian Morris 2015-03-29 1633 if (unlikely(!skb))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1634 err = -ENOBUFS;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1635 }
63159f29be1df7 Ian Morris 2015-03-29 1636 if (!skb)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1637 goto error;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1638 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1639 * Fill in the control structures
^1da177e4c3f41 Linus Torvalds 2005-04-16 1640 */
9c9c9ad5fae7e9 Hannes Frederic Sowa 2013-08-26 1641 skb->protocol = htons(ETH_P_IPV6);
32dce968dd987a Vlad Yasevich 2015-01-31 1642 skb->ip_summed = csummode;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1643 skb->csum = 0;
1f85851e17b64c Gao feng 2012-03-19 1644 /* reserve for fragmentation and ipsec header */
1f85851e17b64c Gao feng 2012-03-19 1645 skb_reserve(skb, hh_len + sizeof(struct frag_hdr) +
1f85851e17b64c Gao feng 2012-03-19 1646 dst_exthdrlen);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1647
^1da177e4c3f41 Linus Torvalds 2005-04-16 1648 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1649 * Find where to start putting bytes
^1da177e4c3f41 Linus Torvalds 2005-04-16 1650 */
15e36f5b8e982d Willem de Bruijn 2018-04-26 1651 data = skb_put(skb, fraglen - pagedlen);
1f85851e17b64c Gao feng 2012-03-19 1652 skb_set_network_header(skb, exthdrlen);
1f85851e17b64c Gao feng 2012-03-19 1653 data += fragheaderlen;
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10 1654 skb->transport_header = (skb->network_header +
b0e380b1d8a8e0 Arnaldo Carvalho de Melo 2007-04-10 1655 fragheaderlen);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1656 if (fraggap) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1657 skb->csum = skb_copy_and_csum_bits(
^1da177e4c3f41 Linus Torvalds 2005-04-16 1658 skb_prev, maxfraglen,
8d5930dfb7edbf Al Viro 2020-07-10 1659 data + transhdrlen, fraggap);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1660 skb_prev->csum = csum_sub(skb_prev->csum,
^1da177e4c3f41 Linus Torvalds 2005-04-16 1661 skb->csum);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1662 data += fraggap;
e9fa4f7bd291c2 Herbert Xu 2006-08-13 1663 pskb_trim_unique(skb_prev, maxfraglen);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1664 }
232cd35d0804cc Eric Dumazet 2017-05-19 1665 if (copy > 0 &&
232cd35d0804cc Eric Dumazet 2017-05-19 1666 getfrag(from, data + transhdrlen, offset,
232cd35d0804cc Eric Dumazet 2017-05-19 1667 copy, fraggap, skb) < 0) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1668 err = -EFAULT;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1669 kfree_skb(skb);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1670 goto error;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1671 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1672
^1da177e4c3f41 Linus Torvalds 2005-04-16 1673 offset += copy;
15e36f5b8e982d Willem de Bruijn 2018-04-26 1674 length -= copy + transhdrlen;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1675 transhdrlen = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1676 exthdrlen = 0;
299b0767642a65 Steffen Klassert 2011-10-11 1677 dst_exthdrlen = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1678
52900d22288e7d Willem de Bruijn 2018-11-30 1679 /* Only the initial fragment is time stamped */
52900d22288e7d Willem de Bruijn 2018-11-30 1680 skb_shinfo(skb)->tx_flags = cork->tx_flags;
52900d22288e7d Willem de Bruijn 2018-11-30 1681 cork->tx_flags = 0;
52900d22288e7d Willem de Bruijn 2018-11-30 1682 skb_shinfo(skb)->tskey = tskey;
52900d22288e7d Willem de Bruijn 2018-11-30 1683 tskey = 0;
52900d22288e7d Willem de Bruijn 2018-11-30 1684 skb_zcopy_set(skb, uarg, &extra_uref);
52900d22288e7d Willem de Bruijn 2018-11-30 1685
0dec879f636f11 Julian Anastasov 2017-02-06 1686 if ((flags & MSG_CONFIRM) && !skb_prev)
0dec879f636f11 Julian Anastasov 2017-02-06 1687 skb_set_dst_pending_confirm(skb, 1);
0dec879f636f11 Julian Anastasov 2017-02-06 1688
^1da177e4c3f41 Linus Torvalds 2005-04-16 1689 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1690 * Put the packet on the pending queue
^1da177e4c3f41 Linus Torvalds 2005-04-16 1691 */
1f4c6eb2402968 Eric Dumazet 2018-03-31 1692 if (!skb->destructor) {
1f4c6eb2402968 Eric Dumazet 2018-03-31 1693 skb->destructor = sock_wfree;
1f4c6eb2402968 Eric Dumazet 2018-03-31 1694 skb->sk = sk;
1f4c6eb2402968 Eric Dumazet 2018-03-31 1695 wmem_alloc_delta += skb->truesize;
1f4c6eb2402968 Eric Dumazet 2018-03-31 1696 }
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1697 __skb_queue_tail(queue, skb);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1698 continue;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1699 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1700
^1da177e4c3f41 Linus Torvalds 2005-04-16 1701 if (copy > length)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1702 copy = length;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1703
113f99c3358564 Willem de Bruijn 2018-05-17 1704 if (!(rt->dst.dev->features&NETIF_F_SG) &&
113f99c3358564 Willem de Bruijn 2018-05-17 1705 skb_tailroom(skb) >= copy) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1706 unsigned int off;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1707
^1da177e4c3f41 Linus Torvalds 2005-04-16 1708 off = skb->len;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1709 if (getfrag(from, skb_put(skb, copy),
^1da177e4c3f41 Linus Torvalds 2005-04-16 1710 offset, copy, off, skb) < 0) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1711 __skb_trim(skb, off);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1712 err = -EFAULT;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1713 goto error;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1714 }
b5947e5d1e710c Willem de Bruijn 2018-11-30 1715 } else if (!uarg || !uarg->zerocopy) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1716 int i = skb_shinfo(skb)->nr_frags;
5640f7685831e0 Eric Dumazet 2012-09-23 1717
^1da177e4c3f41 Linus Torvalds 2005-04-16 1718 err = -ENOMEM;
5640f7685831e0 Eric Dumazet 2012-09-23 1719 if (!sk_page_frag_refill(sk, pfrag))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1720 goto error;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1721
5640f7685831e0 Eric Dumazet 2012-09-23 1722 if (!skb_can_coalesce(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet 2012-09-23 1723 pfrag->offset)) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1724 err = -EMSGSIZE;
5640f7685831e0 Eric Dumazet 2012-09-23 1725 if (i == MAX_SKB_FRAGS)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1726 goto error;
5640f7685831e0 Eric Dumazet 2012-09-23 1727
5640f7685831e0 Eric Dumazet 2012-09-23 1728 __skb_fill_page_desc(skb, i, pfrag->page,
5640f7685831e0 Eric Dumazet 2012-09-23 1729 pfrag->offset, 0);
5640f7685831e0 Eric Dumazet 2012-09-23 1730 skb_shinfo(skb)->nr_frags = ++i;
5640f7685831e0 Eric Dumazet 2012-09-23 1731 get_page(pfrag->page);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1732 }
5640f7685831e0 Eric Dumazet 2012-09-23 1733 copy = min_t(int, copy, pfrag->size - pfrag->offset);
9e903e085262ff Eric Dumazet 2011-10-18 1734 if (getfrag(from,
5640f7685831e0 Eric Dumazet 2012-09-23 1735 page_address(pfrag->page) + pfrag->offset,
5640f7685831e0 Eric Dumazet 2012-09-23 1736 offset, copy, skb->len, skb) < 0)
5640f7685831e0 Eric Dumazet 2012-09-23 1737 goto error_efault;
5640f7685831e0 Eric Dumazet 2012-09-23 1738
5640f7685831e0 Eric Dumazet 2012-09-23 1739 pfrag->offset += copy;
5640f7685831e0 Eric Dumazet 2012-09-23 1740 skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1741 skb->len += copy;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1742 skb->data_len += copy;
f945fa7ad9c12a Herbert Xu 2008-01-22 1743 skb->truesize += copy;
1f4c6eb2402968 Eric Dumazet 2018-03-31 1744 wmem_alloc_delta += copy;
b5947e5d1e710c Willem de Bruijn 2018-11-30 1745 } else {
b5947e5d1e710c Willem de Bruijn 2018-11-30 1746 err = skb_zerocopy_iter_dgram(skb, from, copy);
b5947e5d1e710c Willem de Bruijn 2018-11-30 1747 if (err < 0)
b5947e5d1e710c Willem de Bruijn 2018-11-30 1748 goto error;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1749 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1750 offset += copy;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1751 length -= copy;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1752 }
5640f7685831e0 Eric Dumazet 2012-09-23 1753
9e8445a56c253f Paolo Abeni 2018-04-04 1754 if (wmem_alloc_delta)
1f4c6eb2402968 Eric Dumazet 2018-03-31 1755 refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1756 return 0;
5640f7685831e0 Eric Dumazet 2012-09-23 1757
5640f7685831e0 Eric Dumazet 2012-09-23 1758 error_efault:
5640f7685831e0 Eric Dumazet 2012-09-23 1759 err = -EFAULT;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1760 error:
8e0449172497a9 Jonathan Lemon 2021-01-06 1761 net_zcopy_put_abort(uarg, extra_uref);
bdc712b4c2baf9 David S. Miller 2011-05-06 1762 cork->length -= length;
3bd653c8455bc7 Denis V. Lunev 2008-10-08 1763 IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
1f4c6eb2402968 Eric Dumazet 2018-03-31 1764 refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1765 return err;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1766 }
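For anyone reading the line marked @1501 above, here is a minimal userspace sketch of what the max_t(int, ..., 0) clamp does when headersize exceeds the MTU. It assumes mtu, headersize and pmtu are unsigned int, as I believe they are in __ip6_append_data(). The max_t() below is a simplified stand-in for the kernel macro, and emsgsize_pmtu() is a hypothetical helper name, not a kernel function.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's max_t(): cast both
 * operands to 'type' before comparing. Unlike the kernel macro, this
 * version may evaluate its arguments more than once. */
#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* sizeof(struct ipv6hdr) is 40 bytes; kept as size_t to mirror sizeof. */
#define IPV6_HDR_LEN ((size_t)40)

/* Hypothetical helper mirroring the expression at line 1501.
 * Assumption: mtu and headersize are unsigned int, so the subtraction
 * wraps when headersize > mtu. The cast to int (as gcc and clang
 * implement it) turns the wrapped value into a negative number, which
 * the comparison against 0 then clamps. */
static unsigned int emsgsize_pmtu(unsigned int mtu, unsigned int headersize)
{
	return max_t(int, mtu - headersize + IPV6_HDR_LEN, 0);
}
```

With these definitions, emsgsize_pmtu(1280, 40) gives 1280 and emsgsize_pmtu(1279, 48) gives 1271, while emsgsize_pmtu(100, 200) gives 0 instead of a huge wrapped value being reported to the socket.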
0bbe84a67b0b54 Vlad Yasevich 2015-01-31 1767
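The "8-15 octets" figure in the comment at lines 1530-1544 can be checked with a short sketch. It assumes maxfraglen is computed as ((mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr), which is how I read the code just before the quoted region. The helper names max_frag_len() and frag_reserve() are made up for illustration.

```c
/* sizeof(struct frag_hdr): the IPv6 fragment header is 8 bytes. */
#define FRAG_HDR_LEN 8u

/* Assumed form of the maxfraglen computation: round the fragmentable
 * part down to a multiple of 8 (fragment offsets count 8-octet units),
 * then leave room for the fragment header itself. */
static unsigned int max_frag_len(unsigned int mtu, unsigned int fragheaderlen)
{
	return ((mtu - fragheaderlen) & ~7u) + fragheaderlen - FRAG_HDR_LEN;
}

/* Bytes of the MTU given up once fragmentation is needed:
 * 8 for the fragment header plus 0-7 lost to alignment. */
static unsigned int frag_reserve(unsigned int mtu, unsigned int fragheaderlen)
{
	return mtu - max_frag_len(mtu, fragheaderlen);
}
```

For a plain 40-byte header, frag_reserve(1500, 40) is 12, frag_reserve(1280, 40) is 8 and frag_reserve(1287, 40) is 15, which matches the 8-15 range the comment gives.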
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 29663 bytes --]
Thread overview: 7+ messages
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
2021-04-29 20:25 ` Jiri Bohac
2021-05-01 10:23 ` Sabrina Dubroca
2021-04-29 20:37 ` kernel test robot
2021-04-30 5:36 ` [RFC PATCH v2] " Jiri Bohac
2021-04-29 23:17 [RFC PATCH] " kernel test robot