bpf.vger.kernel.org archive mirror
* [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN
@ 2020-03-12 17:11 Jakub Sitnicki
  2020-03-12 17:57 ` Andrii Nakryiko
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Sitnicki @ 2020-03-12 17:11 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

Andrii Nakryiko reports that sockmap_listen test suite is frequently
failing due to accept() calls erroring out with EAGAIN:

  ./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
  connect_accept_thread:FAIL:733

This is because we are needlessly putting the listening TCP sockets in
non-blocking mode.

Fix it by using the default blocking mode in all tests in this suite.

Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 52aa468bdccd..90271ec90388 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -754,7 +754,7 @@ static void test_syn_recv_insert_delete(int family, int sotype, int mapfd)
 	int err, s;
 	u64 value;
 
-	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -896,7 +896,7 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 
 	zero_verdict_count(verd_mapfd);
 
-	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
@@ -1028,7 +1028,7 @@ static void redir_to_listening(int family, int sotype, int sock_mapfd,
 
 	zero_verdict_count(verd_mapfd);
 
-	s = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	s = socket_loopback(family, sotype);
 	if (s < 0)
 		return;
 
-- 
2.24.1
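
For readers who want to see the failure mode in isolation, here is a minimal
sketch, not taken from the selftest. It assumes Linux (for SOCK_NONBLOCK), and
the helper names listen_loopback() and accept_errno_when_idle() are
hypothetical. It shows that accept() on a non-blocking listening socket
returns EAGAIN whenever no connection is queued yet, which is the error
reported above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not the selftest's socket_loopback()): create a TCP
 * socket listening on 127.0.0.1 with a kernel-chosen port. extra_flags is
 * OR-ed into the socket type, e.g. SOCK_NONBLOCK. Returns the fd or -1. */
static int listen_loopback(int extra_flags)
{
	struct sockaddr_in addr;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM | extra_flags, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Call accept() once on a non-blocking listener that has no pending
 * connection. Returns the errno set by accept(), 0 if a connection was
 * unexpectedly accepted, or -1 if the listener could not be set up. */
static int accept_errno_when_idle(void)
{
	int fd, conn, err;

	fd = listen_loopback(SOCK_NONBLOCK);
	if (fd < 0)
		return -1;

	conn = accept(fd, NULL, NULL);
	err = conn < 0 ? errno : 0;
	if (conn >= 0)
		close(conn);
	close(fd);
	return err;
}
```

Passing 0 instead of SOCK_NONBLOCK to listen_loopback() gives the default
blocking behaviour that the patch switches to, at the cost that accept() then
waits indefinitely if no peer ever connects.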



* Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN
  2020-03-12 17:11 [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN Jakub Sitnicki
@ 2020-03-12 17:57 ` Andrii Nakryiko
  2020-03-12 19:19   ` Jakub Sitnicki
  2020-03-13 16:42   ` Jakub Sitnicki
  0 siblings, 2 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2020-03-12 17:57 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Thu, Mar 12, 2020 at 10:11 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Andrii Nakryiko reports that sockmap_listen test suite is frequently
> failing due to accept() calls erroring out with EAGAIN:
>
>   ./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
>   connect_accept_thread:FAIL:733
>
> This is because we are needlessly putting the listening TCP sockets in
> non-blocking mode.
>
> Fix it by using the default blocking mode in all tests in this suite.
>
> Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
> Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Thanks for looking into this. Can you please verify that test
successfully fails (not hangs) when, say, network is down (do `ip link
set lo down` before running test?). The reason I'm asking is that I
just fixed a problem in tcp_rtt selftest, in which accept() would
block forever, even if listening socket was closed.


[...]


* Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN
  2020-03-12 17:57 ` Andrii Nakryiko
@ 2020-03-12 19:19   ` Jakub Sitnicki
  2020-03-13 16:42   ` Jakub Sitnicki
  1 sibling, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2020-03-12 19:19 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> On Thu, Mar 12, 2020 at 10:11 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Andrii Nakryiko reports that sockmap_listen test suite is frequently
>> failing due to accept() calls erroring out with EAGAIN:
>>
>>   ./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
>>   connect_accept_thread:FAIL:733
>>
>> This is because we are needlessly putting the listening TCP sockets in
>> non-blocking mode.
>>
>> Fix it by using the default blocking mode in all tests in this suite.
>>
>> Fixes: 44d28be2b8d4 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
>> Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>
> Thanks for looking into this. Can you please verify that test
> successfully fails (not hangs) when, say, network is down (do `ip link
> set lo down` before running test?). The reason I'm asking is that I
> just fixed a problem in tcp_rtt selftest, in which accept() would
> block forever, even if listening socket was closed.

Right, good point. We don't want tests hanging. Let me rework it.

[...]


* Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN
  2020-03-12 17:57 ` Andrii Nakryiko
  2020-03-12 19:19   ` Jakub Sitnicki
@ 2020-03-13 16:42   ` Jakub Sitnicki
  2020-03-13 18:30     ` Andrii Nakryiko
  2020-03-14  2:48     ` Alexei Starovoitov
  1 sibling, 2 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2020-03-13 16:42 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, Networking, kernel-team

On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> Thanks for looking into this. Can you please verify that test
> successfully fails (not hangs) when, say, network is down (do `ip link
> set lo down` before running test?). The reason I'm asking is that I
> just fixed a problem in tcp_rtt selftest, in which accept() would
> block forever, even if listening socket was closed.

While we're on the topic of writing network tests with test_progs,
there are a couple of pain points that come from all tests running in
one process:

1) resource cleanup on failure

   Tests can't simply exit(), abort(), or error() on failure. Instead
   they need to clean up all resources, like opened file descriptors and
   memory allocations, and propagate the error up to the main test
   function so it can return to the test runner.

2) terminating in timely fashion

   We don't have the option of simply setting alarm() to terminate after
   a reasonable timeout without worrying about I/O syscalls in blocking
   mode being stuck.

Careful error and timeout handling makes test code more complicated than
it really needs to be, IMHO, which makes the tests harder to write as
well as to maintain.

What if we extended test_progs runner to support process-per-test
execution model? Perhaps as an opt-in for selected tests.

Is that in line with the plans/vision for BPF selftests?
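
To make the proposal concrete, here is a rough sketch of what a
process-per-test model could look like. It is not how test_progs works; the
names test_fn, run_test_in_child() and the three sample tests are all
hypothetical. Each test runs in a forked child with alarm() armed, so a test
stuck in a blocking syscall gets killed by SIGALRM, and the kernel reclaims
file descriptors and memory when the child exits.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*test_fn)(void); /* hypothetical: a test returns 0 on success */

/* Hypothetical sample tests used to exercise the runner. */
static int test_pass(void) { return 0; }
static int test_fail(void) { return -1; }
static int test_hang(void) { for (;;) pause(); } /* never returns */

/* Run fn in a child process with a timeout.
 * Returns 0 if the test passed, 1 if it failed or crashed,
 * 2 if it was killed by the alarm, -1 on a runner error. */
static int run_test_in_child(test_fn fn, unsigned int timeout_sec)
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: the default SIGALRM action terminates the process,
		 * even while it is blocked in a syscall. */
		signal(SIGALRM, SIG_DFL);
		alarm(timeout_sec);
		/* _exit() avoids flushing stdio buffers copied from the
		 * parent a second time. */
		_exit(fn() == 0 ? 0 : 1);
	}

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	if (WIFSIGNALED(status))
		return WTERMSIG(status) == SIGALRM ? 2 : 1;
	return WEXITSTATUS(status) == 0 ? 0 : 1;
}
```

With this shape, a test body could call exit() or abort() on failure without
leaking anything into the tests that run after it, since every test starts
from a fresh copy of the runner process.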


* Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN
  2020-03-13 16:42   ` Jakub Sitnicki
@ 2020-03-13 18:30     ` Andrii Nakryiko
  2020-03-14  2:48     ` Alexei Starovoitov
  1 sibling, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2020-03-13 18:30 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Networking, kernel-team

On Fri, Mar 13, 2020 at 9:42 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, Mar 12, 2020 at 06:57 PM CET, Andrii Nakryiko wrote:
> > Thanks for looking into this. Can you please verify that test
> > successfully fails (not hangs) when, say, network is down (do `ip link
> > set lo down` before running test?). The reason I'm asking is that I
> > just fixed a problem in tcp_rtt selftest, in which accept() would
> > block forever, even if listening socket was closed.
>
> While we're on the topic of writing network tests with test_progs,
> there are a couple of pain points that come from all tests running in
> one process:
>
> 1) resource cleanup on failure
>
>    Tests can't simply exit(), abort(), or error() on failure. Instead
>    they need to clean up all resources, like opened file descriptors and
>    memory allocations, and propagate the error up to the main test
>    function so it can return to the test runner.
>
> 2) terminating in timely fashion
>
>    We don't have the option of simply setting alarm() to terminate after
>    a reasonable timeout without worrying about I/O syscalls in blocking
>    mode being stuck.

I agree, those APIs suck, unfortunately.

>
> Careful error and timeout handling makes test code more complicated than
> it really needs to be, IMHO, which makes the tests harder to write as
> well as to maintain.

Well, I think it's actually a good thing. Tests are as important as
features, if not more, so it pays to invest in having reliable tests.

>
> What if we extended test_progs runner to support process-per-test
> execution model? Perhaps as an opt-in for selected tests.
>
> Is that in line with the plans/vision for BPF selftests?

It would be nice indeed, though I'd still maintain that tests
shouldn't be sloppy. But having that would allow parallelizing tests,
which would be awesome. So yeah, it would be good to have, IMO.


* Re: [PATCH bpf-next] selftests/bpf: Fix spurious failures in accept due to EAGAIN
  2020-03-13 16:42   ` Jakub Sitnicki
  2020-03-13 18:30     ` Andrii Nakryiko
@ 2020-03-14  2:48     ` Alexei Starovoitov
  1 sibling, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2020-03-14  2:48 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, bpf,
	Networking, kernel-team

On Fri, Mar 13, 2020 at 05:42:36PM +0100, Jakub Sitnicki wrote:
> 
> What if we extended test_progs runner to support process-per-test
> execution model? Perhaps as an opt-in for selected tests.

I would love that.
Especially if fork-per-test lets the majority of the tests execute in
parallel. Running test_progs is the biggest time sink for me when I apply
patches. Running them in parallel will help me apply patches faster, so I can
dedicate more time to reviews :)

