Andrii Nakryiko reports that sockmap_listen test suite is frequently
failing due to accept() calls erroring out with EAGAIN:

./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
connect_accept_thread:FAIL:733

This is because we are needlessly putting the listening TCP sockets in
non-blocking mode.

Fix it by using the default blocking mode in all tests in this suite.

Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 52aa468bdccd..90271ec90388 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -754,7 +754,7 @@ static void test_syn_recv_insert_delete(int family, int sotype, int mapfd)
 	int err, s;
 	u64 value;
 
-	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -896,7 +896,7 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 
 	zero_verdict_count(verd_mapfd);
 
-	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -1028,7 +1028,7 @@ static void redir_to_listening(int family, int sotype, int sock_mapfd,
 
 	zero_verdict_count(verd_mapfd);
 
-	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
-- 
2.24.1
On Thu, Mar 12, 2020 at 10:11 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Andrii Nakryiko reports that sockmap_listen test suite is frequently
> failing due to accept() calls erroring out with EAGAIN:
>
> ./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
> connect_accept_thread:FAIL:733
>
> This is because we are needlessly putting the listening TCP sockets in
> non-blocking mode.
>
> Fix it by using the default blocking mode in all tests in this suite.
>
> Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
> Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Thanks for looking into this. Can you please verify that test
successfully fails (not hangs) when, say, network is down (do `ip link
set lo down` before running test?). The reason I'm asking is that I
just fixed a problem in tcp_rtt selftest, in which accept() would
block forever, even if listening socket was closed.

>  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> index 52aa468bdccd..90271ec90388 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> @@ -754,7 +754,7 @@ static void test_syn_recv_insert_delete(int family, int sotype, int mapfd)
>  	int err, s;
>  	u64 value;
>
> -	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
> +	s = socket_loopback(family, sotype);
>  	if (s < 0)
>  		return;
>
> @@ -896,7 +896,7 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
>
>  	zero_verdict_count(verd_mapfd);
>
> -	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
> +	s = socket_loopback(family, sotype);
>  	if (s < 0)
>  		return;
>
> @@ -1028,7 +1028,7 @@ static void redir_to_listening(int family, int sotype, int sock_mapfd,
>
>  	zero_verdict_count(verd_mapfd);
>
> -	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
> +	s = socket_loopback(family, sotype);
>  	if (s < 0)
>  		return;
>
> --
> 2.24.1
>
On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> On Thu, Mar 12, 2020 at 10:11 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Andrii Nakryiko reports that sockmap_listen test suite is frequently
>> failing due to accept() calls erroring out with EAGAIN:
>>
>> ./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
>> connect_accept_thread:FAIL:733
>>
>> This is because we are needlessly putting the listening TCP sockets in
>> non-blocking mode.
>>
>> Fix it by using the default blocking mode in all tests in this suite.
>>
>> Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
>> Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>
> Thanks for looking into this. Can you please verify that test
> successfully fails (not hangs) when, say, network is down (do `ip link
> set lo down` before running test?). The reason I'm asking is that I
> just fixed a problem in tcp_rtt selftest, in which accept() would
> block forever, even if listening socket was closed.
Right, good point. We don't want tests hanging. Let me rework it.
[...]
On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> Thanks for looking into this. Can you please verify that test
> successfully fails (not hangs) when, say, network is down (do `ip link
> set lo down` before running test?). The reason I'm asking is that I
> just fixed a problem in tcp_rtt selftest, in which accept() would
> block forever, even if listening socket was closed.
While on the topic of writing network tests with test_progs.
There are a couple of pain points because all tests run as one process:
1) resource cleanup on failure
Tests can't simply exit(), abort(), or error() on failure. Instead
they need to clean up all resources, like opened file descriptors and
memory allocations, and propagate the error up to the main test
function so it can return to the test runner.
2) terminating in a timely fashion
We don't have the option of simply setting alarm() to terminate the
test after a reasonable timeout, so we have to worry about I/O syscalls
in blocking mode getting stuck.
Careful error and timeout handling makes test code more complicated than
it really needs to be, IMHO, which makes writing as well as maintaining
the tests harder.
What if we extended test_progs runner to support process-per-test
execution model? Perhaps as an opt-in for selected tests.
Is that in line with the plans/vision for BPF selftests?
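
To make the process-per-test idea concrete, here is a rough sketch of what such a runner step could look like. It is an illustration only: run_test_forked(), test_fn, and the three sample tests are hypothetical names, and nothing here reflects how test_progs is actually structured. The child arms alarm() and the parent judges the test by its wait status, so a test may exit() on failure or get stuck in a blocking syscall without taking the runner down with it.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*test_fn)(void);	/* hypothetical test signature */

/* Run one test in its own process. Returns 0 if the test passed, 1 if it
 * failed (non-zero exit status or killed by a signal, including SIGALRM on
 * timeout), and -1 if the runner itself could not fork or wait.
 */
int run_test_forked(test_fn fn, unsigned int timeout_sec)
{
	int status;
	pid_t pid;

	fflush(NULL);		/* don't duplicate buffered output in the child */
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the default SIGALRM action terminates the process,
		 * even while it sits in a blocking syscall.
		 */
		signal(SIGALRM, SIG_DFL);
		alarm(timeout_sec);
		fn();
		exit(EXIT_SUCCESS);
	}
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

/* Sample tests, for illustration only. */
void passing_test(void) { }
void failing_test(void) { exit(EXIT_FAILURE); }	/* no manual cleanup needed */
void hanging_test(void) { for (;;) pause(); }	/* stands in for a stuck accept() */
```

In this sketch the kernel reclaims file descriptors and memory when the child exits, which is what removes the need for careful cleanup on every error path.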
On Fri, Mar 13, 2020 at 9:42 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> > Thanks for looking into this. Can you please verify that test
> > successfully fails (not hangs) when, say, network is down (do `ip link
> > set lo down` before running test?). The reason I'm asking is that I
> > just fixed a problem in tcp_rtt selftest, in which accept() would
> > block forever, even if listening socket was closed.
>
> While on the topic writing network tests with test_progs.
>
> There are a couple pain points because all tests run as one process:
>
> 1) resource cleanup on failure
>
> Tests can't simply exit(), abort(), or error() on failure. Instead
> they need to clean up all resources, like opened file descriptors and
> memory allocations, and propagate the error up to the main test
> function so it can return to the test runner.
>
> 2) terminating in timely fashion
>
> We don't have an option of simply setting alarm() to terminate after
> a reasonable timeout without worrying about I/O syscalls in blocking
> mode being stuck.

I agree, those APIs suck, unfortunately.

>
> Careful error and timeout handling makes test code more complicated than
> it really needs to be, IMHO. Making writing as well as maintaining them
> harder.

Well, I think it's actually a good thing. Tests are as important as
features, if not more, so it pays to invest in having reliable tests.

>
> What if we extended test_progs runner to support process-per-test
> execution model? Perhaps as an opt-in for selected tests.
>
> Is that in line with the plans/vision for BPF selftests?

It would be nice indeed, though I'd still maintain that tests
shouldn't be sloppy. But having that would allow parallelizing tests,
which would be awesome. So yeah, it would be good to have, IMO.
On Fri, Mar 13, 2020 at 05:42:36PM +0100, Jakub Sitnicki wrote:
>
> What if we extended test_progs runner to support process-per-test
> execution model? Perhaps as an opt-in for selected tests.
I would love that.
Especially if fork-per-test lets the majority of the tests execute in
parallel. Running test_progs is the biggest time sink for me when I apply
patches. Running them in parallel will help me apply patches faster, so I can
dedicate more time to reviews :)
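
As a rough illustration of how fork-per-test could lead to parallel execution, the sketch below starts every test in its own child first and only then reaps them. All names (run_tests_parallel(), test_fn, quick_pass(), quick_fail()) are hypothetical, and the sketch assumes the tests do not contend for shared global state, which may not hold for every BPF selftest.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*test_fn)(void);	/* hypothetical test signature */

/* Start each of the n tests in its own child process, then wait for all of
 * them. Returns the number of failed tests; a test that could not be
 * started or reaped is counted as failed.
 */
int run_tests_parallel(const test_fn *tests, int n)
{
	int i, status, started = 0, failed;

	fflush(NULL);		/* don't duplicate buffered output in children */
	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			tests[i]();
			exit(EXIT_SUCCESS);
		}
		started++;
	}
	failed = n - started;

	/* Reap children in whatever order they finish. This assumes the
	 * caller has no other child processes.
	 */
	for (i = 0; i < started; ) {
		if (wait(&status) < 0) {
			if (errno == EINTR)
				continue;
			failed += started - i;
			break;
		}
		i++;
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}

/* Sample tests, for illustration only. */
void quick_pass(void) { }
void quick_fail(void) { exit(EXIT_FAILURE); }
```

A real runner would probably also cap the number of concurrent children and keep each child's output separate; both are left out here to keep the sketch short.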