bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
@ 2021-08-16 17:52 Yucong Sun
  2021-08-16 23:28 ` Song Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Yucong Sun @ 2021-08-16 17:52 UTC (permalink / raw)
  To: andrii; +Cc: sunyucong, bpf, Yucong Sun

Using a fixed delay of 1ms is proven flaky in slow CPU environment, eg.  github
action CI system. This patch adds exponential backoff with a cap of 50ms, to
reduce the flakyness of the test.

Signed-off-by: Yucong Sun <fallentree@fb.com>
---
 tools/testing/selftests/bpf/test_maps.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 14cea869235b..ed92d56c19cf 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1400,11 +1400,16 @@ static void test_map_stress(void)
 static int map_update_retriable(int map_fd, const void *key, const void *value,
 				int flags, int attempts)
 {
+	int delay = 1;
+
 	while (bpf_map_update_elem(map_fd, key, value, flags)) {
 		if (!attempts || (errno != EAGAIN && errno != EBUSY))
 			return -errno;
 
-		usleep(1);
+		if (delay < 50)
+			delay *= 2;
+
+		usleep(delay);
 		attempts--;
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
  2021-08-16 17:52 [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps Yucong Sun
@ 2021-08-16 23:28 ` Song Liu
  2021-08-16 23:45   ` sunyucong
  0 siblings, 1 reply; 6+ messages in thread
From: Song Liu @ 2021-08-16 23:28 UTC (permalink / raw)
  To: Yucong Sun; +Cc: Andrii Nakryiko, sunyucong, bpf

On Mon, Aug 16, 2021 at 10:54 AM Yucong Sun <fallentree@fb.com> wrote:
>
> Using a fixed delay of 1ms is proven flaky in slow CPU environment, eg.  github
> action CI system. This patch adds exponential backoff with a cap of 50ms, to
> reduce the flakyness of the test.

Do we have data showing how flaky the test is before and after this change?

>
> Signed-off-by: Yucong Sun <fallentree@fb.com>
> ---
>  tools/testing/selftests/bpf/test_maps.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index 14cea869235b..ed92d56c19cf 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -1400,11 +1400,16 @@ static void test_map_stress(void)
>  static int map_update_retriable(int map_fd, const void *key, const void *value,
>                                 int flags, int attempts)
>  {
> +       int delay = 1;
> +
>         while (bpf_map_update_elem(map_fd, key, value, flags)) {
>                 if (!attempts || (errno != EAGAIN && errno != EBUSY))
>                         return -errno;
>
> -               usleep(1);
> +               if (delay < 50)
> +                       delay *= 2;
> +
> +               usleep(delay);

It is a little weird that the delay times in microseconds are 2, 4, 8,
16, 32, 64, 64, ...
Maybe just use rand()?

Thanks,
Song

>                 attempts--;
>         }
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
  2021-08-16 23:28 ` Song Liu
@ 2021-08-16 23:45   ` sunyucong
  2021-08-17  0:02     ` Song Liu
  2021-08-17  2:19     ` Andrii Nakryiko
  0 siblings, 2 replies; 6+ messages in thread
From: sunyucong @ 2021-08-16 23:45 UTC (permalink / raw)
  To: Song Liu; +Cc: Yucong Sun, Andrii Nakryiko, bpf

On Mon, Aug 16, 2021 at 4:28 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, Aug 16, 2021 at 10:54 AM Yucong Sun <fallentree@fb.com> wrote:
> >
> > Using a fixed delay of 1ms is proven flaky in slow CPU environment, eg.  github
> > action CI system. This patch adds exponential backoff with a cap of 50ms, to
> > reduce the flakiness of the test.
>
> Do we have data showing how flaky the test is before and after this change?

Before the change, on 2 CPU KVM on my laptop the test is perfectly
fine, on Github action (2 emulated CPU) , it appeared to fail roughly
1 in 10 runs or even more frequently.
After the change, it appears pretty robust both on my laptop and on
github action, I ran the github action a couple times and it succeeded
every time.

>
> >
> > Signed-off-by: Yucong Sun <fallentree@fb.com>
> > ---
> >  tools/testing/selftests/bpf/test_maps.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > index 14cea869235b..ed92d56c19cf 100644
> > --- a/tools/testing/selftests/bpf/test_maps.c
> > +++ b/tools/testing/selftests/bpf/test_maps.c
> > @@ -1400,11 +1400,16 @@ static void test_map_stress(void)
> >  static int map_update_retriable(int map_fd, const void *key, const void *value,
> >                                 int flags, int attempts)
> >  {
> > +       int delay = 1;
> > +
> >         while (bpf_map_update_elem(map_fd, key, value, flags)) {
> >                 if (!attempts || (errno != EAGAIN && errno != EBUSY))
> >                         return -errno;
> >
> > -               usleep(1);
> > +               if (delay < 50)
> > +                       delay *= 2;
> > +
> > +               usleep(delay);
>
> It is a little weird that the delay times in microseconds are 2, 4, 8,
> 16, 32, 64, 64, ...
> Maybe just use rand()?

map_update_retriable is called by test_map_update() , which is being
parallel executed in 1024 threads, so the lock contention is
intentional, I think if we introduce randomness in the delay it kind
of defeats the purpose of the test.
My original proposal is to just increase the attempts to 10X , Andrii
proposed to use an exponential back-off, which is what I ended up
implementing.

>
> Thanks,
> Song
>
> >                 attempts--;
> >         }
> >
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
  2021-08-16 23:45   ` sunyucong
@ 2021-08-17  0:02     ` Song Liu
  2021-08-17  2:19     ` Andrii Nakryiko
  1 sibling, 0 replies; 6+ messages in thread
From: Song Liu @ 2021-08-17  0:02 UTC (permalink / raw)
  To: sunyucong; +Cc: Yucong Sun, Andrii Nakryiko, bpf

On Mon, Aug 16, 2021 at 4:45 PM sunyucong@gmail.com <sunyucong@gmail.com> wrote:
>
> On Mon, Aug 16, 2021 at 4:28 PM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Aug 16, 2021 at 10:54 AM Yucong Sun <fallentree@fb.com> wrote:
> > >
> > > Using a fixed delay of 1ms is proven flaky in slow CPU environment, eg.  github
> > > action CI system. This patch adds exponential backoff with a cap of 50ms, to
> > > reduce the flakiness of the test.
> >
> > Do we have data showing how flaky the test is before and after this change?
>
> Before the change, on 2 CPU KVM on my laptop the test is perfectly
> fine, on Github action (2 emulated CPU) , it appeared to fail roughly
> 1 in 10 runs or even more frequently.
> After the change, it appears pretty robust both on my laptop and on
> github action, I ran the github action a couple times and it succeeded
> every time.

Thanks for the data!

We should include this in the commit log. Maybe the maintainer could just
amend it when applying the patch.

>
> >
> > >
> > > Signed-off-by: Yucong Sun <fallentree@fb.com>
> > > ---
> > >  tools/testing/selftests/bpf/test_maps.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > > index 14cea869235b..ed92d56c19cf 100644
> > > --- a/tools/testing/selftests/bpf/test_maps.c
> > > +++ b/tools/testing/selftests/bpf/test_maps.c
> > > @@ -1400,11 +1400,16 @@ static void test_map_stress(void)
> > >  static int map_update_retriable(int map_fd, const void *key, const void *value,
> > >                                 int flags, int attempts)
> > >  {
> > > +       int delay = 1;
> > > +
> > >         while (bpf_map_update_elem(map_fd, key, value, flags)) {
> > >                 if (!attempts || (errno != EAGAIN && errno != EBUSY))
> > >                         return -errno;
> > >
> > > -               usleep(1);
> > > +               if (delay < 50)
> > > +                       delay *= 2;
> > > +
> > > +               usleep(delay);
> >
> > It is a little weird that the delay times in microseconds are 2, 4, 8,
> > 16, 32, 64, 64, ...
> > Maybe just use rand()?
>
> map_update_retriable is called by test_map_update() , which is being
> parallel executed in 1024 threads, so the lock contention is
> intentional, I think if we introduce randomness in the delay it kind
> of defeats the purpose of the test.
> My original proposal is to just increase the attempts to 10X , Andrii
> proposed to use an exponential back-off, which is what I ended up
> implementing.

If that is what we agreed on, it works.

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
  2021-08-16 23:45   ` sunyucong
  2021-08-17  0:02     ` Song Liu
@ 2021-08-17  2:19     ` Andrii Nakryiko
  2021-08-17  4:15       ` Andrii Nakryiko
  1 sibling, 1 reply; 6+ messages in thread
From: Andrii Nakryiko @ 2021-08-17  2:19 UTC (permalink / raw)
  To: sunyucong; +Cc: Song Liu, Yucong Sun, Andrii Nakryiko, bpf

On Mon, Aug 16, 2021 at 4:45 PM sunyucong@gmail.com <sunyucong@gmail.com> wrote:
>
> On Mon, Aug 16, 2021 at 4:28 PM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Aug 16, 2021 at 10:54 AM Yucong Sun <fallentree@fb.com> wrote:
> > >
> > > Using a fixed delay of 1ms is proven flaky in slow CPU environment, eg.  github
> > > action CI system. This patch adds exponential backoff with a cap of 50ms, to
> > > reduce the flakiness of the test.
> >
> > Do we have data showing how flaky the test is before and after this change?
>
> Before the change, on 2 CPU KVM on my laptop the test is perfectly
> fine, on Github action (2 emulated CPU) , it appeared to fail roughly
> 1 in 10 runs or even more frequently.
> After the change, it appears pretty robust both on my laptop and on
> github action, I ran the github action a couple times and it succeeded
> every time.
>
> >
> > >
> > > Signed-off-by: Yucong Sun <fallentree@fb.com>
> > > ---
> > >  tools/testing/selftests/bpf/test_maps.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > > index 14cea869235b..ed92d56c19cf 100644
> > > --- a/tools/testing/selftests/bpf/test_maps.c
> > > +++ b/tools/testing/selftests/bpf/test_maps.c
> > > @@ -1400,11 +1400,16 @@ static void test_map_stress(void)
> > >  static int map_update_retriable(int map_fd, const void *key, const void *value,
> > >                                 int flags, int attempts)
> > >  {
> > > +       int delay = 1;
> > > +
> > >         while (bpf_map_update_elem(map_fd, key, value, flags)) {
> > >                 if (!attempts || (errno != EAGAIN && errno != EBUSY))
> > >                         return -errno;
> > >
> > > -               usleep(1);
> > > +               if (delay < 50)
> > > +                       delay *= 2;
> > > +
> > > +               usleep(delay);
> >
> > It is a little weird that the delay times in microseconds are 2, 4, 8,
> > 16, 32, 64, 64, ...
> > Maybe just use rand()?
>
> map_update_retriable is called by test_map_update() , which is being
> parallel executed in 1024 threads, so the lock contention is
> intentional, I think if we introduce randomness in the delay it kind
> of defeats the purpose of the test.

Not really, the purpose of the test is to test a lot of concurrent
updates with some collisions. It doesn't matter if there is one
collision or 20. We will still get an initial collision (we only
usleep() after the initial attempt fails with EAGAIN or EBUSY) even
with randomized initial sleep *after* the first collision, but after
that we should make sure that test eventually completes, so for that
random initial delay is a good thing.

I reworked the code a bit, added initial random delay between [0ms,
5ms), and then actually capped delay under 50ms (it can't go to
>50ms). Note also that your code is actually working with microseconds
(usleep() takes microseconds), while commit was actually talking about
milliseconds.

Applied to bpf-next, thanks.

> My original proposal is to just increase the attempts to 10X , Andrii
> proposed to use an exponential back-off, which is what I ended up
> implementing.
>
> >
> > Thanks,
> > Song
> >
> > >                 attempts--;
> > >         }
> > >
> > > --
> > > 2.30.2
> > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
  2021-08-17  2:19     ` Andrii Nakryiko
@ 2021-08-17  4:15       ` Andrii Nakryiko
  0 siblings, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2021-08-17  4:15 UTC (permalink / raw)
  To: sunyucong; +Cc: Song Liu, Yucong Sun, Andrii Nakryiko, bpf

On Mon, Aug 16, 2021 at 7:19 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Aug 16, 2021 at 4:45 PM sunyucong@gmail.com <sunyucong@gmail.com> wrote:
> >
> > On Mon, Aug 16, 2021 at 4:28 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Mon, Aug 16, 2021 at 10:54 AM Yucong Sun <fallentree@fb.com> wrote:
> > > >
> > > > Using a fixed delay of 1ms is proven flaky in slow CPU environment, eg.  github
> > > > action CI system. This patch adds exponential backoff with a cap of 50ms, to
> > > > reduce the flakiness of the test.
> > >
> > > Do we have data showing how flaky the test is before and after this change?
> >
> > Before the change, on 2 CPU KVM on my laptop the test is perfectly
> > fine, on Github action (2 emulated CPU) , it appeared to fail roughly
> > 1 in 10 runs or even more frequently.
> > After the change, it appears pretty robust both on my laptop and on
> > github action, I ran the github action a couple times and it succeeded
> > every time.
> >
> > >
> > > >
> > > > Signed-off-by: Yucong Sun <fallentree@fb.com>
> > > > ---
> > > >  tools/testing/selftests/bpf/test_maps.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > > > index 14cea869235b..ed92d56c19cf 100644
> > > > --- a/tools/testing/selftests/bpf/test_maps.c
> > > > +++ b/tools/testing/selftests/bpf/test_maps.c
> > > > @@ -1400,11 +1400,16 @@ static void test_map_stress(void)
> > > >  static int map_update_retriable(int map_fd, const void *key, const void *value,
> > > >                                 int flags, int attempts)
> > > >  {
> > > > +       int delay = 1;
> > > > +
> > > >         while (bpf_map_update_elem(map_fd, key, value, flags)) {
> > > >                 if (!attempts || (errno != EAGAIN && errno != EBUSY))
> > > >                         return -errno;
> > > >
> > > > -               usleep(1);
> > > > +               if (delay < 50)
> > > > +                       delay *= 2;
> > > > +
> > > > +               usleep(delay);
> > >
> > > It is a little weird that the delay times in microseconds are 2, 4, 8,
> > > 16, 32, 64, 64, ...
> > > Maybe just use rand()?
> >
> > map_update_retriable is called by test_map_update() , which is being
> > parallel executed in 1024 threads, so the lock contention is
> > intentional, I think if we introduce randomness in the delay it kind
> > of defeats the purpose of the test.
>
> Not really, the purpose of the test is to test a lot of concurrent
> updates with some collisions. It doesn't matter if there is one
> collision or 20. We will still get an initial collision (we only
> usleep() after the initial attempt fails with EAGAIN or EBUSY) even
> with randomized initial sleep *after* the first collision, but after
> that we should make sure that test eventually completes, so for that
> random initial delay is a good thing.
>
> I reworked the code a bit, added initial random delay between [0ms,
> 5ms), and then actually capped delay under 50ms (it can't go to
> >50ms). Note also that your code is actually working with microseconds
> (usleep() takes microseconds), while commit was actually talking about
> milliseconds.
>
> Applied to bpf-next, thanks.

Fun, two out of three test runs now fail with (see [0], for an example):

  error -16 16
  test_maps: test_maps.c:1374: test_update_delete: Assertion `err == 0' failed.
  test_maps: test_maps.c:1312: __run_parallel: Assertion `status == 0' failed.
  ./libbpf/travis-ci/vmtest/run_selftests.sh: line 19:  1226 Aborted
              ./test_maps


Seems like we need the same trick for bpf_map_delete_elem() calls?..


  [0] https://github.com/kernel-patches/bpf/pull/1632/checks?check_run_id=3345916773

>
> > My original proposal is to just increase the attempts to 10X , Andrii
> > proposed to use an exponential back-off, which is what I ended up
> > implementing.
> >
> > >
> > > Thanks,
> > > Song
> > >
> > > >                 attempts--;
> > > >         }
> > > >
> > > > --
> > > > 2.30.2
> > > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-17  4:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-16 17:52 [PATCH v1 bpf] selftests/bpf: Add exponential backoff to map_update_retriable in test_maps Yucong Sun
2021-08-16 23:28 ` Song Liu
2021-08-16 23:45   ` sunyucong
2021-08-17  0:02     ` Song Liu
2021-08-17  2:19     ` Andrii Nakryiko
2021-08-17  4:15       ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).