All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
@ 2021-05-19 20:41 Cong Wang
  2021-05-19 21:55 ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2021-05-19 20:41 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, Jiang Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We use non-blocking sockets for testing sockmap redirections,
and got some random EAGAIN errors from UDP tests.

There is no guarantee the packet would be immediately available
to receive as soon as it is sent out, even on the local host.
For UDP, this is especially true because it does not lock the
sock during BH (unlike the TCP path). This is probably why we
only saw this error in UDP cases.

No matter how hard we try to make the queue empty check accurate,
it is always possible for recvmsg() to beat ->sk_data_ready().
Therefore, we should just retry in case of EAGAIN.

Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 648d9ae898d2..b1ed182c4720 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
+again:
 	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0)
+	if (n < 0) {
+		if (errno == EAGAIN)
+			goto again;
 		FAIL_ERRNO("%s: read", log_prefix);
+	}
 	if (n == 0)
 		FAIL("%s: incomplete read", log_prefix);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-19 20:41 [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
@ 2021-05-19 21:55 ` John Fastabend
  2021-05-19 23:36   ` Cong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2021-05-19 21:55 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: bpf, Cong Wang, Jiang Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> We use non-blocking sockets for testing sockmap redirections,
> and got some random EAGAIN errors from UDP tests.
> 
> There is no guarantee the packet would be immediately available
> to receive as soon as it is sent out, even on the local host.
> For UDP, this is especially true because it does not lock the
> sock during BH (unlike the TCP path). This is probably why we
> only saw this error in UDP cases.
> 
> No matter how hard we try to make the queue empty check accurate,
> it is always possible for recvmsg() to beat ->sk_data_ready().
> Therefore, we should just retry in case of EAGAIN.
> 
> Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> index 648d9ae898d2..b1ed182c4720 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
>  	if (pass != 1)
>  		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
>  
> +again:
>  	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> -	if (n < 0)
> +	if (n < 0) {
> +		if (errno == EAGAIN)
> +			goto again;
>  		FAIL_ERRNO("%s: read", log_prefix);

Needs a counter and abort logic we don't want to loop forever in the
case the packet is lost.

> +	}
>  	if (n == 0)
>  		FAIL("%s: incomplete read", log_prefix);
>  
> -- 
> 2.25.1
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-19 21:55 ` John Fastabend
@ 2021-05-19 23:36   ` Cong Wang
  2021-05-20 17:47     ` John Fastabend
  2021-05-20 20:14     ` Andrii Nakryiko
  0 siblings, 2 replies; 11+ messages in thread
From: Cong Wang @ 2021-05-19 23:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, bpf, Cong Wang, Jiang Wang,
	Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > We use non-blocking sockets for testing sockmap redirections,
> > and got some random EAGAIN errors from UDP tests.
> >
> > There is no guarantee the packet would be immediately available
> > to receive as soon as it is sent out, even on the local host.
> > For UDP, this is especially true because it does not lock the
> > sock during BH (unlike the TCP path). This is probably why we
> > only saw this error in UDP cases.
> >
> > No matter how hard we try to make the queue empty check accurate,
> > it is always possible for recvmsg() to beat ->sk_data_ready().
> > Therefore, we should just retry in case of EAGAIN.
> >
> > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > index 648d9ae898d2..b1ed182c4720 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> >       if (pass != 1)
> >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> >
> > +again:
> >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > -     if (n < 0)
> > +     if (n < 0) {
> > +             if (errno == EAGAIN)
> > +                     goto again;
> >               FAIL_ERRNO("%s: read", log_prefix);
>
> Needs a counter and abort logic we don't want to loop forever in the
> case the packet is lost.

It should not be lost because selftests must be self-contained,
if the selftests could not even predict whether its own packet is
lost or not, we would have a much bigger trouble than just this
infinite loop.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-19 23:36   ` Cong Wang
@ 2021-05-20 17:47     ` John Fastabend
  2021-05-20 20:00       ` Cong Wang
  2021-05-20 20:14     ` Andrii Nakryiko
  1 sibling, 1 reply; 11+ messages in thread
From: John Fastabend @ 2021-05-20 17:47 UTC (permalink / raw)
  To: Cong Wang, John Fastabend
  Cc: Linux Kernel Network Developers, bpf, Cong Wang, Jiang Wang,
	Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

Cong Wang wrote:
> On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > We use non-blocking sockets for testing sockmap redirections,
> > > and got some random EAGAIN errors from UDP tests.
> > >
> > > There is no guarantee the packet would be immediately available
> > > to receive as soon as it is sent out, even on the local host.
> > > For UDP, this is especially true because it does not lock the
> > > sock during BH (unlike the TCP path). This is probably why we
> > > only saw this error in UDP cases.
> > >
> > > No matter how hard we try to make the queue empty check accurate,
> > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > Therefore, we should just retry in case of EAGAIN.
> > >
> > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> > >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > index 648d9ae898d2..b1ed182c4720 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > >       if (pass != 1)
> > >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > >
> > > +again:
> > >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > -     if (n < 0)
> > > +     if (n < 0) {
> > > +             if (errno == EAGAIN)
> > > +                     goto again;
> > >               FAIL_ERRNO("%s: read", log_prefix);
> >
> > Needs a counter and abort logic we don't want to loop forever in the
> > case the packet is lost.
> 
> It should not be lost because selftests must be self-contained,
> if the selftests could not even predict whether its own packet is
> lost or not, we would have a much bigger trouble than just this
> infinite loop.
> 
> Thanks.

Add the retry counter its maybe 4 lines of code. When I run this in a container
and my memcg kicks in for some unknown reason I don't want it to loop
forever I want it to fail gracefully. Plus it just looks bad to encode
a non-terminating loop, in my opinion, so I want the counter.

.John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-20 17:47     ` John Fastabend
@ 2021-05-20 20:00       ` Cong Wang
  2021-05-22  0:12         ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2021-05-20 20:00 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, bpf, Cong Wang, Jiang Wang,
	Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Thu, May 20, 2021 at 10:47 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > >
> > > Cong Wang wrote:
> > > > From: Cong Wang <cong.wang@bytedance.com>
> > > >
> > > > We use non-blocking sockets for testing sockmap redirections,
> > > > and got some random EAGAIN errors from UDP tests.
> > > >
> > > > There is no guarantee the packet would be immediately available
> > > > to receive as soon as it is sent out, even on the local host.
> > > > For UDP, this is especially true because it does not lock the
> > > > sock during BH (unlike the TCP path). This is probably why we
> > > > only saw this error in UDP cases.
> > > >
> > > > No matter how hard we try to make the queue empty check accurate,
> > > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > > Therefore, we should just retry in case of EAGAIN.
> > > >
> > > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > > ---
> > > >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > index 648d9ae898d2..b1ed182c4720 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > > >       if (pass != 1)
> > > >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > > >
> > > > +again:
> > > >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > > -     if (n < 0)
> > > > +     if (n < 0) {
> > > > +             if (errno == EAGAIN)
> > > > +                     goto again;
> > > >               FAIL_ERRNO("%s: read", log_prefix);
> > >
> > > Needs a counter and abort logic we don't want to loop forever in the
> > > case the packet is lost.
> >
> > It should not be lost because selftests must be self-contained,
> > if the selftests could not even predict whether its own packet is
> > lost or not, we would have a much bigger trouble than just this
> > infinite loop.
> >
> > Thanks.
>
> Add the retry counter its maybe 4 lines of code. When I run this in a container

Sure, then the next question is how many times are you going to retry?
Let's say, 10. Then the next question would be is 10 sufficient? Clearly
not, because if the packet can be dropped (let's say by firewall), it can
be delayed too (let's say by netem).

Really, are we going to handle all of such cases? Or if we simply
assume the environment is self-contained and not hostile, none of
the above would happen hence nothing needs to be checked.

> and my memcg kicks in for some unknown reason I don't want it to loop
> forever I want it to fail gracefully. Plus it just looks bad to encode
> a non-terminating loop, in my opinion, so I want the counter.

There could be hundreds of reasons to drop a packet, or delay a
packet. How many of them do you want to consider and how many
of them do you not consider? Please draw a boundary.

The boundary I drew is very clear: we just assume the selftests
environment is not hostile. In fact, I believe selftests should setup
such an environment before running any test, for example, by creating
a container. And I believe this would make everyone happy.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-19 23:36   ` Cong Wang
  2021-05-20 17:47     ` John Fastabend
@ 2021-05-20 20:14     ` Andrii Nakryiko
  2021-05-20 20:44       ` Cong Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2021-05-20 20:14 UTC (permalink / raw)
  To: Cong Wang
  Cc: John Fastabend, Linux Kernel Network Developers, bpf, Cong Wang,
	Jiang Wang, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Wed, May 19, 2021 at 4:36 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > We use non-blocking sockets for testing sockmap redirections,
> > > and got some random EAGAIN errors from UDP tests.
> > >
> > > There is no guarantee the packet would be immediately available
> > > to receive as soon as it is sent out, even on the local host.
> > > For UDP, this is especially true because it does not lock the
> > > sock during BH (unlike the TCP path). This is probably why we
> > > only saw this error in UDP cases.
> > >
> > > No matter how hard we try to make the queue empty check accurate,
> > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > Therefore, we should just retry in case of EAGAIN.
> > >
> > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> > >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > index 648d9ae898d2..b1ed182c4720 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > >       if (pass != 1)
> > >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > >
> > > +again:
> > >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > -     if (n < 0)
> > > +     if (n < 0) {
> > > +             if (errno == EAGAIN)
> > > +                     goto again;
> > >               FAIL_ERRNO("%s: read", log_prefix);
> >
> > Needs a counter and abort logic we don't want to loop forever in the
> > case the packet is lost.
>
> It should not be lost because selftests must be self-contained,
> if the selftests could not even predict whether its own packet is
> lost or not, we would have a much bigger trouble than just this
> infinite loop.

Bugs do happen though, so if you can detect some error condition
instead of having an infinite loop, then do it.

>
> Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-20 20:14     ` Andrii Nakryiko
@ 2021-05-20 20:44       ` Cong Wang
  2021-05-21  5:12         ` Andrii Nakryiko
  2021-06-03  8:10         ` Jakub Sitnicki
  0 siblings, 2 replies; 11+ messages in thread
From: Cong Wang @ 2021-05-20 20:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Linux Kernel Network Developers, bpf, Cong Wang,
	Jiang Wang, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Thu, May 20, 2021 at 1:14 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> Bugs do happen though, so if you can detect some error condition
> instead of having an infinite loop, then do it.

You both are underestimating the problem. There are two different things
to consider here:

1) Kernel bugs: This is known unknown, we certainly do not know
how many bugs we have, otherwise they would have been fixed
already. So we can not predict the consequence of the bug either,
assuming a bug could only cause packet drop is underestimated.

2) Configurations: For instance, firewall rules. If the selftests are run
in a weird firewall setup which drops all UDP packets, there is nothing
we can do in the test itself. If we have to detect this, then we would
have to detect netem cases too where packets can be held indefinitely
or reordered arbitrarily. The possibilities here are too many to detect,
hence I argue the selftests should setup its own non-hostile environment,
which has nothing to do with any specific program.

This is why I ask you to draw a boundary: what we can assume and
what we can't. My boundary is obviously clear: we just assume the
environment is non-hostile and we can't predict any kernel bugs,
nor their consequences.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-20 20:44       ` Cong Wang
@ 2021-05-21  5:12         ` Andrii Nakryiko
  2021-05-21 21:49           ` Cong Wang
  2021-06-03  8:10         ` Jakub Sitnicki
  1 sibling, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2021-05-21  5:12 UTC (permalink / raw)
  To: Cong Wang
  Cc: John Fastabend, Linux Kernel Network Developers, bpf, Cong Wang,
	Jiang Wang, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Thu, May 20, 2021 at 1:44 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, May 20, 2021 at 1:14 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > Bugs do happen though, so if you can detect some error condition
> > instead of having an infinite loop, then do it.
>
> You both are underestimating the problem. There are two different things
> to consider here:
>
> 1) Kernel bugs: This is known unknown, we certainly do not know
> how many bugs we have, otherwise they would have been fixed
> already. So we can not predict the consequence of the bug either,
> assuming a bug could only cause packet drop is underestimated.
>
> 2) Configurations: For instance, firewall rules. If the selftests are run
> in a weird firewall setup which drops all UDP packets, there is nothing
> we can do in the test itself. If we have to detect this, then we would
> have to detect netem cases too where packets can be held indefinitely
> or reordered arbitrarily. The possibilities here are too many to detect,
> hence I argue the selftests should setup its own non-hostile environment,
> which has nothing to do with any specific program.
>
> This is why I ask you to draw a boundary: what we can assume and
> what we can't. My boundary is obviously clear: we just assume the
> environment is non-hostile and we can't predict any kernel bugs,
> nor their consequences.

It doesn't matter. Instead of having an endless loop, please add a
counter and limit the number of iterations to some reasonable number.
That's all, no one is asking you to do something impossible, just
prevent looping forever in some unforeseen situations and just fail
the test instead.

>
> Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-21  5:12         ` Andrii Nakryiko
@ 2021-05-21 21:49           ` Cong Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-05-21 21:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Linux Kernel Network Developers, bpf, Cong Wang,
	Jiang Wang, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Thu, May 20, 2021 at 10:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> It doesn't matter. Instead of having an endless loop, please add a
> counter and limit the number of iterations to some reasonable number.
> That's all, no one is asking you to do something impossible, just
> prevent looping forever in some unforeseen situations and just fail
> the test instead.

Well, in a non-hostile environment, the packet will not be dropped
as it is sent via a loopback, hence the loop is not endless.

"my memcg kicks in for some unknown reason" is pretty much
a hostile environment, for which we should not consider. The counter
argument is simply not to run selftests in such a hostile environment.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-20 20:00       ` Cong Wang
@ 2021-05-22  0:12         ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2021-05-22  0:12 UTC (permalink / raw)
  To: Cong Wang
  Cc: John Fastabend, Linux Kernel Network Developers, bpf, Cong Wang,
	Jiang Wang, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer

On Fri, May 21, 2021 at 12:31 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, May 20, 2021 at 10:47 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > >
> > > > Cong Wang wrote:
> > > > > From: Cong Wang <cong.wang@bytedance.com>
> > > > >
> > > > > We use non-blocking sockets for testing sockmap redirections,
> > > > > and got some random EAGAIN errors from UDP tests.
> > > > >
> > > > > There is no guarantee the packet would be immediately available
> > > > > to receive as soon as it is sent out, even on the local host.
> > > > > For UDP, this is especially true because it does not lock the
> > > > > sock during BH (unlike the TCP path). This is probably why we
> > > > > only saw this error in UDP cases.
> > > > >
> > > > > No matter how hard we try to make the queue empty check accurate,
> > > > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > > > Therefore, we should just retry in case of EAGAIN.
> > > > >
> > > > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > > > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > > > ---
> > > > >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > index 648d9ae898d2..b1ed182c4720 100644
> > > > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > > > >       if (pass != 1)
> > > > >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > > > >
> > > > > +again:
> > > > >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > > > -     if (n < 0)
> > > > > +     if (n < 0) {
> > > > > +             if (errno == EAGAIN)
> > > > > +                     goto again;
> > > > >               FAIL_ERRNO("%s: read", log_prefix);
> > > >
> > > > Needs a counter and abort logic we don't want to loop forever in the
> > > > case the packet is lost.
> > >
> > > It should not be lost because selftests must be self-contained,
> > > if the selftests could not even predict whether its own packet is
> > > lost or not, we would have a much bigger trouble than just this
> > > infinite loop.
> > >
> > > Thanks.
> >
> > Add the retry counter its maybe 4 lines of code. When I run this in a container
>
> Sure, then the next question is how many times are you going to retry?
> Let's say, 10. Then the next question would be is 10 sufficient? Clearly
> not, because if the packet can be dropped (let's say by firewall), it can
> be delayed too (let's say by netem).

10 is fine. If something unexpected happens (whatever hostility of the
environment), getting stuck is much-much worse than erroring out. So
please add the counter and be done with it.

>
> Really, are we going to handle all of such cases? Or if we simply
> assume the environment is self-contained and not hostile, none of
> the above would happen hence nothing needs to be checked.

We have many different environments in which selftests are running. We
shouldn't have an infinite loop in any of them, even if some selftests
can't succeed in some of them. Not in all environments it is possible
to do Ctrl-C (e.g., CIs).

>
> > and my memcg kicks in for some unknown reason I don't want it to loop
> > forever I want it to fail gracefully. Plus it just looks bad to encode
> > a non-terminating loop, in my opinion, so I want the counter.
>
> There could be hundreds of reasons to drop a packet, or delay a
> packet. How many of them do you want to consider and how many
> of them do you not consider? Please draw a boundary.
>
> The boundary I drew is very clear: we just assume the selftests

See above, the minimum bar is that selftest shouldn't get stuck, no matter what.

> environment is not hostile. In fact, I believe selftests should setup
> such an environment before running any test, for example, by creating
> a container. And I believe this would make everyone happy.
>
> Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-05-20 20:44       ` Cong Wang
  2021-05-21  5:12         ` Andrii Nakryiko
@ 2021-06-03  8:10         ` Jakub Sitnicki
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Sitnicki @ 2021-06-03  8:10 UTC (permalink / raw)
  To: Cong Wang
  Cc: Andrii Nakryiko, John Fastabend, Linux Kernel Network Developers,
	bpf, Cong Wang, Jiang Wang, Daniel Borkmann, Lorenz Bauer

On Thu, May 20, 2021 at 10:44 PM CEST, Cong Wang wrote:
> On Thu, May 20, 2021 at 1:14 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> Bugs do happen though, so if you can detect some error condition
>> instead of having an infinite loop, then do it.
>
> You both are underestimating the problem. There are two different things
> to consider here:
>
> 1) Kernel bugs: This is known unknown, we certainly do not know
> how many bugs we have, otherwise they would have been fixed
> already. So we can not predict the consequence of the bug either,
> assuming a bug could only cause packet drop is underestimated.
>
> 2) Configurations: For instance, firewall rules. If the selftests are run
> in a weird firewall setup which drops all UDP packets, there is nothing
> we can do in the test itself. If we have to detect this, then we would
> have to detect netem cases too where packets can be held indefinitely
> or reordered arbitrarily. The possibilities here are too many to detect,
> hence I argue the selftests should setup its own non-hostile environment,
> which has nothing to do with any specific program.
>
> This is why I ask you to draw a boundary: what we can assume and
> what we can't. My boundary is obviously clear: we just assume the
> environment is non-hostile and we can't predict any kernel bugs,
> nor their consequences.
>
> Thanks.

(Sorry for the delay in reviews. I've been out.)

In my mind uAPI tests should not be tailored to the underlying
implementation (non-blocking read after write over loopback succeeds for
TCP), or the environment they run in (packets don't get dropped due to
OOM, signals don't interrupt syscalls).

If it's a non-blocking socket, then EAGAIN can happen. That's the
contract between the kernel and the user-space.

There is already a helper in this test case for polling and reading with
a timeout (see recv_timeout()). IMO we should be using it in all tests
that use non-blocking I/O.

If it's not being used already, that is most likely my fault.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-06-03  8:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-19 20:41 [Patch bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
2021-05-19 21:55 ` John Fastabend
2021-05-19 23:36   ` Cong Wang
2021-05-20 17:47     ` John Fastabend
2021-05-20 20:00       ` Cong Wang
2021-05-22  0:12         ` Andrii Nakryiko
2021-05-20 20:14     ` Andrii Nakryiko
2021-05-20 20:44       ` Cong Wang
2021-05-21  5:12         ` Andrii Nakryiko
2021-05-21 21:49           ` Cong Wang
2021-06-03  8:10         ` Jakub Sitnicki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.