bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf] selftests/bpf: Poll for receive in cg_storage_multi test
@ 2023-04-03 21:58 YiFei Zhu
  2023-04-03 22:16 ` Stanislav Fomichev
  2023-04-03 23:44 ` Martin KaFai Lau
  0 siblings, 2 replies; 3+ messages in thread
From: YiFei Zhu @ 2023-04-03 21:58 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Martin KaFai Lau, Andrii Nakryiko

In some cases the loopback latency might be large enough, causing
the assertion on invocations to be run before ingress prog getting
executed. The assertion would fail and the test would flake.

This can be reliably reproduced by arbitrarily increaing the loopback
latency (thanks to [1]):
  tc qdisc add dev lo root handle 1: htb default 12
  tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
  tc qdisc add dev lo parent 1:12 netem delay 100ms

Fix this by polling on the receive end and waiting for up to a
second, instead of instantly returning to the assert.

[1] https://gist.github.com/kstevens715/4598301

Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 .../testing/selftests/bpf/prog_tests/cg_storage_multi.c  | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
index 621c57222191..3b0094a2a353 100644
--- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -7,6 +7,7 @@
 #include <test_progs.h>
 #include <cgroup_helpers.h>
 #include <network_helpers.h>
+#include <poll.h>
 
 #include "progs/cg_storage_multi.h"
 
@@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key)
 
 static bool connect_send(const char *cgroup_path)
 {
-	bool res = true;
 	int server_fd = -1, client_fd = -1;
+	struct pollfd pollfd;
+	bool res = true;
 
 	if (join_cgroup(cgroup_path))
 		goto out_clean;
@@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path)
 	if (send(client_fd, "message", strlen("message"), 0) < 0)
 		goto out_clean;
 
+	pollfd.fd = server_fd;
+	pollfd.events = POLLIN;
+	if (poll(&pollfd, 1, 1000) != 1)
+		goto out_clean;
+
 	res = false;
 
 out_clean:
-- 
2.40.0.348.gf938b09366-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf] selftests/bpf: Poll for receive in cg_storage_multi test
  2023-04-03 21:58 [PATCH bpf] selftests/bpf: Poll for receive in cg_storage_multi test YiFei Zhu
@ 2023-04-03 22:16 ` Stanislav Fomichev
  2023-04-03 23:44 ` Martin KaFai Lau
  1 sibling, 0 replies; 3+ messages in thread
From: Stanislav Fomichev @ 2023-04-03 22:16 UTC (permalink / raw)
  To: YiFei Zhu
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Andrii Nakryiko

On Mon, Apr 3, 2023 at 2:59 PM YiFei Zhu <zhuyifei@google.com> wrote:
>
> In some cases the loopback latency might be large enough, causing
> the assertion on invocations to be run before ingress prog getting
> executed. The assertion would fail and the test would flake.
>
> This can be reliably reproduced by arbitrarily increaing the loopback
> latency (thanks to [1]):
>   tc qdisc add dev lo root handle 1: htb default 12
>   tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
>   tc qdisc add dev lo parent 1:12 netem delay 100ms
>
> Fix this by polling on the receive end and waiting for up to a
> second, instead of instantly returning to the assert.
>
> [1] https://gist.github.com/kstevens715/4598301
>
> Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
> Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
> Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>

Thank you!

Acked-by: Stanislav Fomichev <sdf@google.com>

> ---
>  .../testing/selftests/bpf/prog_tests/cg_storage_multi.c  | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> index 621c57222191..3b0094a2a353 100644
> --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> @@ -7,6 +7,7 @@
>  #include <test_progs.h>
>  #include <cgroup_helpers.h>
>  #include <network_helpers.h>
> +#include <poll.h>
>
>  #include "progs/cg_storage_multi.h"
>
> @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key)
>
>  static bool connect_send(const char *cgroup_path)
>  {
> -       bool res = true;
>         int server_fd = -1, client_fd = -1;
> +       struct pollfd pollfd;
> +       bool res = true;
>
>         if (join_cgroup(cgroup_path))
>                 goto out_clean;
> @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path)
>         if (send(client_fd, "message", strlen("message"), 0) < 0)
>                 goto out_clean;
>
> +       pollfd.fd = server_fd;
> +       pollfd.events = POLLIN;
> +       if (poll(&pollfd, 1, 1000) != 1)
> +               goto out_clean;
> +
>         res = false;
>
>  out_clean:
> --
> 2.40.0.348.gf938b09366-goog
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf] selftests/bpf: Poll for receive in cg_storage_multi test
  2023-04-03 21:58 [PATCH bpf] selftests/bpf: Poll for receive in cg_storage_multi test YiFei Zhu
  2023-04-03 22:16 ` Stanislav Fomichev
@ 2023-04-03 23:44 ` Martin KaFai Lau
  1 sibling, 0 replies; 3+ messages in thread
From: Martin KaFai Lau @ 2023-04-03 23:44 UTC (permalink / raw)
  To: YiFei Zhu
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Andrii Nakryiko, bpf

On 4/3/23 2:58 PM, YiFei Zhu wrote:
> In some cases the loopback latency might be large enough, causing
> the assertion on invocations to be run before ingress prog getting
> executed. The assertion would fail and the test would flake.
> 
> This can be reliably reproduced by arbitrarily increaing the loopback
> latency (thanks to [1]):
>    tc qdisc add dev lo root handle 1: htb default 12
>    tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
>    tc qdisc add dev lo parent 1:12 netem delay 100ms
> 
> Fix this by polling on the receive end and waiting for up to a
> second, instead of instantly returning to the assert.
> 
> [1] https://gist.github.com/kstevens715/4598301
> 
> Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
> Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
> Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> ---
>   .../testing/selftests/bpf/prog_tests/cg_storage_multi.c  | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> index 621c57222191..3b0094a2a353 100644
> --- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> +++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
> @@ -7,6 +7,7 @@
>   #include <test_progs.h>
>   #include <cgroup_helpers.h>
>   #include <network_helpers.h>
> +#include <poll.h>
>   
>   #include "progs/cg_storage_multi.h"
>   
> @@ -56,8 +57,9 @@ static bool assert_storage_noexist(struct bpf_map *map, const void *key)
>   
>   static bool connect_send(const char *cgroup_path)
>   {
> -	bool res = true;
>   	int server_fd = -1, client_fd = -1;
> +	struct pollfd pollfd;
> +	bool res = true;
>   
>   	if (join_cgroup(cgroup_path))
>   		goto out_clean;
> @@ -73,6 +75,11 @@ static bool connect_send(const char *cgroup_path)
>   	if (send(client_fd, "message", strlen("message"), 0) < 0)
>   		goto out_clean;
>   
> +	pollfd.fd = server_fd;
> +	pollfd.events = POLLIN;
> +	if (poll(&pollfd, 1, 1000) != 1)
> +		goto out_clean;

Thanks for the fix. The slowness explanation makes sense.

A nit. All start_server() has a 3s SO_RCVTIMEO by default. How about a read() 
here instead of a poll(). Easier to change the default read timeout for all 
tests if needed.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-04-03 23:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-03 21:58 [PATCH bpf] selftests/bpf: Poll for receive in cg_storage_multi test YiFei Zhu
2023-04-03 22:16 ` Stanislav Fomichev
2023-04-03 23:44 ` Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).