From: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, shuah@kernel.org,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
posk@google.com
Subject: Re: [PATCH] selftests: net: ip_defrag: increase netdev_max_backlog
Date: Fri, 6 Dec 2019 11:50:10 -0300
Message-ID: <20191206145010.GE5083@calabresa>
In-Reply-To: <d2dddb34-f126-81f8-cbf7-04635f04795a@gmail.com>

On Fri, Dec 06, 2019 at 05:41:01AM -0800, Eric Dumazet wrote:
>
>
> On 12/6/19 4:17 AM, Thadeu Lima de Souza Cascardo wrote:
> > On Wed, Dec 04, 2019 at 12:03:57PM -0800, Eric Dumazet wrote:
> >>
> >>
> >> On 12/4/19 11:53 AM, Thadeu Lima de Souza Cascardo wrote:
> >>> When using fragments with size 8 and payload larger than 8000, the backlog
> >>> might fill up and packets will be dropped, causing the test to fail. This
> >>> happens often enough when conntrack is on during the IPv6 test.
> >>>
> > >>> As the largest payload in the test is 10000, using a backlog of 1250 allows
> > >>> the test to run repeatedly without failure. At least 1000 runs completed
> > >>> with no failures, whereas fewer than 50 runs were usually enough to show a
> > >>> failure.
> >>>
> > >>> As netdev_max_backlog is not a per-netns setting, this resets the backlog to
> > >>> 1000 on exit to avoid disturbing subsequent tests.
> >>>
> >>
> >> Hmmm... I would prefer not changing a global setting like that.
> >> This is going to be flaky since we often run tests in parallel (using different netns)
> >>
> >> What about adding a small delay after each sent packet ?
> >>
> >> diff --git a/tools/testing/selftests/net/ip_defrag.c b/tools/testing/selftests/net/ip_defrag.c
> >> index c0c9ecb891e1d78585e0db95fd8783be31bc563a..24d0723d2e7e9b94c3e365ee2ee30e9445deafa8 100644
> >> --- a/tools/testing/selftests/net/ip_defrag.c
> >> +++ b/tools/testing/selftests/net/ip_defrag.c
> >> @@ -198,6 +198,7 @@ static void send_fragment(int fd_raw, struct sockaddr *addr, socklen_t alen,
> >>  		error(1, 0, "send_fragment: %d vs %d", res, frag_len);
> >>  
> >>  	frag_counter++;
> >> +	usleep(1000);
> >>  }
> >>  
> >>  static void send_udp_frags(int fd_raw, struct sockaddr *addr,
> >>
> >
> > That won't work because the issue only shows up when using conntrack, as the
> > packet will be reassembled on output and then fragmented again. When this
> > happens, the fragmentation code transmits the fragments in a tight loop, which
> > floods the backlog.
>
> Interesting !
>
> So it looks like the test is correct, and exposed a long standing problem in this code.
>
> We should not adjust the test to some kernel-of-the-day-constraints, and instead fix the kernel bug ;)
>
> Where is this tight loop exactly ?
>
> If this is feeding/bursting ~1000 skbs via netif_rx() in a BH context, maybe we need to call a variant
> that allows immediate processing instead of (ab)using the softnet backlog.
>
> Thanks !

This is the loopback interface, so its xmit calls netif_rx. I suppose we would
have the same problem with veth, for example.

So net/ipv6/ip6_output.c:ip6_fragment has this:

	for (;;) {
		/* Prepare header of the next frame,
		 * before previous one went down. */
		if (iter.frag)
			ip6_fraglist_prepare(skb, &iter);
		skb->tstamp = tstamp;
		err = output(net, sk, skb);
		if (!err)
			IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
				      IPSTATS_MIB_FRAGCREATES);
		if (err || !iter.frag)
			break;
		skb = ip6_fraglist_next(&iter);
	}

Here, output is ip6_finish_output2, which calls neigh_output, which in turn ends
up calling dev_queue_xmit.

In this case, ip6_fragment is probably being called via rawv6_send_hdrinc ->
dst_output -> ip6_output -> ip6_finish_output -> __ip6_finish_output ->
ip6_fragment.

In rawv6_send_hdrinc, dst_output is called after the netfilter
NF_INET_LOCAL_OUT hook. That hook gathers the fragments and only lets the last,
reassembled skb through, which causes ip6_fragment to enter that loop.

So, basically, the easiest way to reproduce this is to run this test over
loopback with netfilter doing the reassembly for conntrack. I see some BH locks
here and there, but I think this is just filling up the backlog too fast for the
softirq to get a chance to run.

I will see if I can reproduce this using routed veths.

Cascardo.