* Selftest failures related to kern_sync_rcu()
@ 2021-04-08 19:34 Toke Høiland-Jørgensen
  2021-04-13  3:38 ` Andrii Nakryiko
  0 siblings, 1 reply; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-08 19:34 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf

Hi Andrii

I'm getting some selftest failures that all seem to have something to do
with kern_sync_rcu() not being enough to trigger the kernel events that
the selftest expects:

$ ./test_progs | grep FAIL
test_lookup_update:FAIL:map1_leak inner_map1 leaked!
#15/1 lookup_update:FAIL
#15 btf_map_in_map:FAIL
test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
#123/2 exit_creds:FAIL
#123 task_local_storage:FAIL
test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
#123/2 exit_creds:FAIL
#123 task_local_storage:FAIL

They are all fixed by adding a sleep(1) after the call(s) to
kern_sync_rcu(), so I'm guessing it's some kind of
timing/synchronisation problem. Is there a particular kernel config
that's needed for the membarrier syscall trick to work? I've tried with
various settings of PREEMPT and that doesn't really seem to make any
difference...

-Toke



* Re: Selftest failures related to kern_sync_rcu()
  2021-04-08 19:34 Selftest failures related to kern_sync_rcu() Toke Høiland-Jørgensen
@ 2021-04-13  3:38 ` Andrii Nakryiko
  2021-04-13  8:50   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 16+ messages in thread
From: Andrii Nakryiko @ 2021-04-13  3:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bpf

On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Hi Andrii
>
> I'm getting some selftest failures that all seem to have something to do
> with kern_sync_rcu() not being enough to trigger the kernel events that
> the selftest expects:
>
> $ ./test_progs | grep FAIL
> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
> #15/1 lookup_update:FAIL
> #15 btf_map_in_map:FAIL
> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> #123/2 exit_creds:FAIL
> #123 task_local_storage:FAIL
> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> #123/2 exit_creds:FAIL
> #123 task_local_storage:FAIL
>
> They are all fixed by adding a sleep(1) after the call(s) to
> kern_sync_rcu(), so I'm guessing it's some kind of
> timing/synchronisation problem. Is there a particular kernel config
> that's needed for the membarrier syscall trick to work? I've tried with
> various settings of PREEMPT and that doesn't really seem to make any
> difference...
>

If you check kern_sync_rcu(), it relies on membarrier() syscall
(passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
Now, looking at kernel sources:
  - CONFIG_MEMBARRIER should be enabled for that syscall;
  - it has some extra conditions:

           case MEMBARRIER_CMD_GLOBAL:
                /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
                if (tick_nohz_full_enabled())
                        return -EINVAL;
                if (num_online_cpus() > 1)
                        synchronize_rcu();
                return 0;

Could it be that one of those conditions is not satisfied?
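
For reference, the userspace side of this can be sketched in a few lines of C. This is a minimal illustration, not the selftest's actual code: the helper name sync_rcu_via_membarrier() is hypothetical, and the raw syscall() form is used on the assumption that the C library provides no membarrier() wrapper.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (name is an assumption, not from the selftests).
 * Issues membarrier(MEMBARRIER_CMD_GLOBAL) through the raw syscall.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int sync_rcu_via_membarrier(void)
{
	/* MEMBARRIER_CMD_GLOBAL is the newer name for MEMBARRIER_CMD_SHARED.
	 * A return of 0 only says the syscall succeeded; per the kernel
	 * snippet above it does not say synchronize_rcu() actually ran.
	 */
	return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0);
}
```

Note that a zero return is indistinguishable between the "more than one CPU online" path and the path that skips synchronize_rcu().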


> -Toke
>


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-13  3:38 ` Andrii Nakryiko
@ 2021-04-13  8:50   ` Toke Høiland-Jørgensen
  2021-04-13 21:43     ` Andrii Nakryiko
  0 siblings, 1 reply; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-13  8:50 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Hi Andrii
>>
>> I'm getting some selftest failures that all seem to have something to do
>> with kern_sync_rcu() not being enough to trigger the kernel events that
>> the selftest expects:
>>
>> $ ./test_progs | grep FAIL
>> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
>> #15/1 lookup_update:FAIL
>> #15 btf_map_in_map:FAIL
>> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
>> #123/2 exit_creds:FAIL
>> #123 task_local_storage:FAIL
>> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
>> #123/2 exit_creds:FAIL
>> #123 task_local_storage:FAIL
>>
>> They are all fixed by adding a sleep(1) after the call(s) to
>> kern_sync_rcu(), so I'm guessing it's some kind of
>> timing/synchronisation problem. Is there a particular kernel config
>> that's needed for the membarrier syscall trick to work? I've tried with
>> various settings of PREEMPT and that doesn't really seem to make any
>> difference...
>>
>
> If you check kern_sync_rcu(), it relies on membarrier() syscall
> (passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
> Now, looking at kernel sources:
>   - CONFIG_MEMBARRIER should be enabled for that syscall;
>   - it has some extra conditions:
>
>            case MEMBARRIER_CMD_GLOBAL:
>                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
>                 if (tick_nohz_full_enabled())
>                         return -EINVAL;
>                 if (num_online_cpus() > 1)
>                         synchronize_rcu();
>                 return 0;
>
> Could it be that one of those conditions is not satisfied?

Aha, bingo! Found the membarrier syscall stuff, but for some reason
didn't think to actually read the code of it; and I was running this in
a VM with a single CPU, adding another fixed this. Thanks! :)

Do you think we could detect this in the tests? I suppose the
tick_nohz_full_enabled() check should already result in a visible
failure since that makes the syscall fail; but the CPU thing is silent,
so it would be nice with a hint. Could kern_sync_rcu() check the CPU
count and print a warning or fail if it is 1? Or maybe just straight up
fall back to sleep()'ing?
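
Roughly what that could look like, as a sketch only: the function name sync_rcu_or_sleep() is made up for illustration, the one-second sleep is just the workaround mentioned at the top of this thread, and sleeping gives no ordering guarantee of its own.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical variant of a sync helper (name is an assumption).
 * Returns 0 on success, -1 if the membarrier() syscall fails.
 */
static int sync_rcu_or_sleep(void)
{
	/* Number of CPUs currently online; -1 on error is treated
	 * the same as a single CPU.
	 */
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (ncpus <= 1) {
		/* With one online CPU the kernel returns 0 without calling
		 * synchronize_rcu(), so say so and fall back to a delay.
		 */
		fprintf(stderr,
			"warning: %ld online CPU(s), sleeping instead of membarrier()\n",
			ncpus);
		sleep(1);
		return 0;
	}

	return syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0) ? -1 : 0;
}
```

Failing or skipping the test instead of sleeping would be the stricter choice; the warning at least makes the silent case visible.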

-Toke



* Re: Selftest failures related to kern_sync_rcu()
  2021-04-13  8:50   ` Toke Høiland-Jørgensen
@ 2021-04-13 21:43     ` Andrii Nakryiko
  2021-04-14 15:54       ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Andrii Nakryiko @ 2021-04-13 21:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: bpf

On Tue, Apr 13, 2021 at 1:50 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Hi Andrii
> >>
> >> I'm getting some selftest failures that all seem to have something to do
> >> with kern_sync_rcu() not being enough to trigger the kernel events that
> >> the selftest expects:
> >>
> >> $ ./test_progs | grep FAIL
> >> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
> >> #15/1 lookup_update:FAIL
> >> #15 btf_map_in_map:FAIL
> >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> >> #123/2 exit_creds:FAIL
> >> #123 task_local_storage:FAIL
> >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> >> #123/2 exit_creds:FAIL
> >> #123 task_local_storage:FAIL
> >>
> >> They are all fixed by adding a sleep(1) after the call(s) to
> >> kern_sync_rcu(), so I'm guessing it's some kind of
> >> timing/synchronisation problem. Is there a particular kernel config
> >> that's needed for the membarrier syscall trick to work? I've tried with
> >> various settings of PREEMPT and that doesn't really seem to make any
> >> difference...
> >>
> >
> > If you check kern_sync_rcu(), it relies on membarrier() syscall
> > (passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
> > Now, looking at kernel sources:
> >   - CONFIG_MEMBARRIER should be enabled for that syscall;
> >   - it has some extra conditions:
> >
> >            case MEMBARRIER_CMD_GLOBAL:
> >                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
> >                 if (tick_nohz_full_enabled())
> >                         return -EINVAL;
> >                 if (num_online_cpus() > 1)
> >                         synchronize_rcu();
> >                 return 0;
> >
> > Could it be that one of those conditions is not satisfied?
>
> Aha, bingo! Found the membarrier syscall stuff, but for some reason
> didn't think to actually read the code of it; and I was running this in
> a VM with a single CPU, adding another fixed this. Thanks! :)
>
> Do you think we could detect this in the tests? I suppose the
> tick_nohz_full_enabled() check should already result in a visible
> failure since that makes the syscall fail; but the CPU thing is silent,
> so it would be nice with a hint. Could kern_sync_rcu() check the CPU
> count and print a warning or fail if it is 1? Or maybe just straight up
> fall back to sleep()'ing?

If membarrier() is unreliable, I guess we can just go back to the
previous way of triggering synchronize_rcu() (create and update
map-in-map element)? See 635599bace25 ("selftests/bpf: Sync RCU before
unloading bpf_testmod") that removed that in favor of membarrier()
syscall.

>
> -Toke
>


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-13 21:43     ` Andrii Nakryiko
@ 2021-04-14 15:54       ` Alexei Starovoitov
  2021-04-14 17:52         ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2021-04-14 15:54 UTC (permalink / raw)
  To: Andrii Nakryiko, Paul E. McKenney; +Cc: Toke Høiland-Jørgensen, bpf

On Tue, Apr 13, 2021 at 11:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Apr 13, 2021 at 1:50 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >
> > > On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >>
> > >> Hi Andrii
> > >>
> > >> I'm getting some selftest failures that all seem to have something to do
> > >> with kern_sync_rcu() not being enough to trigger the kernel events that
> > >> the selftest expects:
> > >>
> > >> $ ./test_progs | grep FAIL
> > >> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
> > >> #15/1 lookup_update:FAIL
> > >> #15 btf_map_in_map:FAIL
> > >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> > >> #123/2 exit_creds:FAIL
> > >> #123 task_local_storage:FAIL
> > >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0
> > >> #123/2 exit_creds:FAIL
> > >> #123 task_local_storage:FAIL
> > >>
> > >> They are all fixed by adding a sleep(1) after the call(s) to
> > >> kern_sync_rcu(), so I'm guessing it's some kind of
> > >> timing/synchronisation problem. Is there a particular kernel config
> > >> that's needed for the membarrier syscall trick to work? I've tried with
> > >> various settings of PREEMPT and that doesn't really seem to make any
> > >> difference...
> > >>
> > >
> > > If you check kern_sync_rcu(), it relies on membarrier() syscall
> > > (passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
> > > Now, looking at kernel sources:
> > >   - CONFIG_MEMBARRIER should be enabled for that syscall;
> > >   - it has some extra conditions:
> > >
> > >            case MEMBARRIER_CMD_GLOBAL:
> > >                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
> > >                 if (tick_nohz_full_enabled())
> > >                         return -EINVAL;
> > >                 if (num_online_cpus() > 1)
> > >                         synchronize_rcu();
> > >                 return 0;
> > >
> > > Could it be that one of those conditions is not satisfied?
> >
> > Aha, bingo! Found the membarrier syscall stuff, but for some reason
> > didn't think to actually read the code of it; and I was running this in
> > a VM with a single CPU, adding another fixed this. Thanks! :)
> >
> > Do you think we could detect this in the tests? I suppose the
> > tick_nohz_full_enabled() check should already result in a visible
> > failure since that makes the syscall fail; but the CPU thing is silent,
> > so it would be nice with a hint. Could kern_sync_rcu() check the CPU
> > count and print a warning or fail if it is 1? Or maybe just straight up
> > fall back to sleep()'ing?
>
> If membarrier() is unreliable, I guess we can just go back to the
> previous way of triggering synchronize_rcu() (create and update
> map-in-map element)? See 635599bace25 ("selftests/bpf: Sync RCU before
> unloading bpf_testmod") that removed that in favor of membarrier()
> syscall.

maybe create+free socket_local_storage map ? Few syscalls less.
I guess map_in_map is fine too.

Paul,
What do you suggest to trigger synchronize_rcu() from user space?


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 15:54       ` Alexei Starovoitov
@ 2021-04-14 17:52         ` Paul E. McKenney
  2021-04-14 17:59           ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-04-14 17:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Andrii Nakryiko, Toke Høiland-Jørgensen, bpf

On Wed, Apr 14, 2021 at 08:54:03AM -0700, Alexei Starovoitov wrote:
> On Tue, Apr 13, 2021 at 11:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 1:50 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > >
> > > > On Thu, Apr 8, 2021 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > >>
> > > >> Hi Andrii
> > > >>
> > > >> I'm getting some selftest failures that all seem to have something to do
> > > >> with kern_sync_rcu() not being enough to trigger the kernel events that
> > > >> the selftest expects:
> > > >>
> > > >> $ ./test_progs | grep FAIL
> > > >> test_lookup_update:FAIL:map1_leak inner_map1 leaked!
> > > >> #15/1 lookup_update:FAIL
> > > >> #15 btf_map_in_map:FAIL
> > > >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0

You lost me on this one.  If actual equals expected, why the failure?
Or is this a case where the test needs to capture the value so as to
compare and print the same thing?

> > > >> #123/2 exit_creds:FAIL
> > > >> #123 task_local_storage:FAIL
> > > >> test_exit_creds:FAIL:null_ptr_count unexpected null_ptr_count: actual 0 == expected 0

Same for this one.

> > > >> #123/2 exit_creds:FAIL
> > > >> #123 task_local_storage:FAIL
> > > >>
> > > >> They are all fixed by adding a sleep(1) after the call(s) to
> > > >> kern_sync_rcu(), so I'm guessing it's some kind of
> > > >> timing/synchronisation problem. Is there a particular kernel config
> > > >> that's needed for the membarrier syscall trick to work? I've tried with
> > > >> various settings of PREEMPT and that doesn't really seem to make any
> > > >> difference...
> > > >>
> > > >
> > > > If you check kern_sync_rcu(), it relies on membarrier() syscall
> > > > (passing cmd = MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL).
> > > > Now, looking at kernel sources:
> > > >   - CONFIG_MEMBARRIER should be enabled for that syscall;
> > > >   - it has some extra conditions:
> > > >
> > > >            case MEMBARRIER_CMD_GLOBAL:
> > > >                 /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
> > > >                 if (tick_nohz_full_enabled())
> > > >                         return -EINVAL;

This one has effect only in kernels built with CONFIG_NO_HZ_FULL=y.

The reason for this check is that RCU sees nohz_full userspace execution
the same as it sees idle, so a synchronize_rcu() is not (repeat, not)
guaranteed to provide order across any of the nohz_full userspace threads.
This lack of guarantee applies to all CONFIG_NO_HZ_FULL=y kernel builds
and to all userspace threads running on nohz_full CPUs.

So if you build your kernel with CONFIG_NO_HZ_FULL=y, and boot with
nohz_full=2-7, then the membarrier() system call has no way of providing
ordering to userspace threads running on CPUs 2, 3, 4, 5, 6, and 7.

Hence the -EINVAL.
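
A test could probe for this case up front rather than trip over the -EINVAL. The sketch below uses MEMBARRIER_CMD_QUERY, which returns a bitmask of supported commands. It assumes that the kernel leaves MEMBARRIER_CMD_GLOBAL out of that mask when nohz_full is enabled, which is how the membarrier code reads as far as I can tell. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe (name is an assumption).
 * Returns 1 if MEMBARRIER_CMD_GLOBAL is advertised by the kernel,
 * 0 if it is not, and -1 if the query itself fails (for example
 * ENOSYS on a kernel built without CONFIG_MEMBARRIER).
 */
static int membarrier_global_supported(void)
{
	long mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);

	if (mask < 0)
		return -1;

	/* The query result is a bitmask of supported commands. */
	return (mask & MEMBARRIER_CMD_GLOBAL) ? 1 : 0;
}
```

This only covers the nohz_full and CONFIG_MEMBARRIER conditions; it says nothing about the single-CPU case, where the command is supported but synchronize_rcu() is skipped.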

In theory, we could use SRCU or similar, but there is much about the
interaction of membarrier() and the entry/exit code that I have long
since forgotten, so in practice, who knows?

Also in practice, in a CONFIG_PREEMPT=y kernel, synchronize_rcu() will
impose some delay, and that delay might well be sufficient to trick the
tests into passing, despite the fact that there is no guarantee.

> > > >                 if (num_online_cpus() > 1)
> > > >                         synchronize_rcu();

In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
synchronize_rcu() will be a no-op anyway due to there only being the
one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
and in tests where preemption could result in the observed failures?

Could you please send your .config file, or at least the relevant portions
of it?

> > > >                 return 0;
> > > >
> > > > Could it be that one of those conditions is not satisfied?
> > >
> > > Aha, bingo! Found the membarrier syscall stuff, but for some reason
> > > didn't think to actually read the code of it; and I was running this in
> > > a VM with a single CPU, adding another fixed this. Thanks! :)
> > >
> > > Do you think we could detect this in the tests? I suppose the
> > > tick_nohz_full_enabled() check should already result in a visible
> > > failure since that makes the syscall fail; but the CPU thing is silent,
> > > so it would be nice with a hint. Could kern_sync_rcu() check the CPU
> > > count and print a warning or fail if it is 1? Or maybe just straight up
> > > fall back to sleep()'ing?

Given that you have but one CPU, things are pretty well ordered.
Of course, userspace code can be preempted even in CONFIG_PREEMPT_NONE=y
kernels, but in that case synchronize_rcu() won't add any delays.

At this point, I am a bit confused about what is going on here.

> > If membarrier() is unreliable, I guess we can just go back to the
> > previous way of triggering synchronize_rcu() (create and update
> > map-in-map element)? See 635599bace25 ("selftests/bpf: Sync RCU before
> > unloading bpf_testmod") that removed that in favor of membarrier()
> > syscall.
> 
> maybe create+free socket_local_storage map ? Few syscalls less.
> I guess map_in_map is fine too.
> 
> Paul,
> What do you suggest to trigger synchronize_rcu() from user space?

My first suggestion is to make sure that we understand the problem.
Maybe it is only me who is confused, but in that case please unconfuse me.

							Thanx, Paul


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 17:52         ` Paul E. McKenney
@ 2021-04-14 17:59           ` Alexei Starovoitov
  2021-04-14 18:19             ` Paul E. McKenney
  2021-04-14 18:27             ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2021-04-14 17:59 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andrii Nakryiko, Toke Høiland-Jørgensen, bpf

On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> > > > >                 if (num_online_cpus() > 1)
> > > > >                         synchronize_rcu();
>
> In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
> synchronize_rcu() will be a no-op anyway due to there only being the
> one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
> and in tests where preemption could result in the observed failures?
>
> Could you please send your .config file, or at least the relevant portions
> of it?

That's my understanding as well. I assumed Toke has preempt=y.
Otherwise the whole thing needs to be root caused properly.


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 17:59           ` Alexei Starovoitov
@ 2021-04-14 18:19             ` Paul E. McKenney
  2021-04-14 18:39               ` Toke Høiland-Jørgensen
  2021-04-14 18:27             ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-04-14 18:19 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Andrii Nakryiko, Toke Høiland-Jørgensen, bpf

On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > > > > >                 if (num_online_cpus() > 1)
> > > > > >                         synchronize_rcu();
> >
> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
> > synchronize_rcu() will be a no-op anyway due to there only being the
> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
> > and in tests where preemption could result in the observed failures?
> >
> > Could you please send your .config file, or at least the relevant portions
> > of it?
> 
> That's my understanding as well. I assumed Toke has preempt=y.
> Otherwise the whole thing needs to be root caused properly.

Given that there is only a single CPU, I am still confused about what
the tests are expecting the membarrier() system call to do for them.

							Thanx, Paul


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 17:59           ` Alexei Starovoitov
  2021-04-14 18:19             ` Paul E. McKenney
@ 2021-04-14 18:27             ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-14 18:27 UTC (permalink / raw)
  To: Alexei Starovoitov, Paul E. McKenney; +Cc: Andrii Nakryiko, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>>
>> > > > >                 if (num_online_cpus() > 1)
>> > > > >                         synchronize_rcu();
>>
>> In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
>> synchronize_rcu() will be a no-op anyway due to there only being the
>> one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
>> and in tests where preemption could result in the observed failures?
>>
>> Could you please send your .config file, or at least the relevant portions
>> of it?
>
> That's my understanding as well. I assumed Toke has preempt=y.
> Otherwise the whole thing needs to be root caused properly.

Running with a single CPU fails, with multiple CPUs succeeds.
Happens without PREEMPT as well:

$ egrep 'HZ|PREEMPT|RCU' .config
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_RCU_NOCB_CPU is not set
# CONFIG_TASKS_TRACE_RCU_READ_MB is not set
# end of RCU Subsystem
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_MACHZ_WDT is not set
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_RCU_STRICT_GRACE_PERIOD is not set
# end of RCU Debugging
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Anything else you need from .config?

-Toke



* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 18:19             ` Paul E. McKenney
@ 2021-04-14 18:39               ` Toke Høiland-Jørgensen
  2021-04-14 18:41                 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-14 18:39 UTC (permalink / raw)
  To: paulmck, Alexei Starovoitov; +Cc: Andrii Nakryiko, bpf

"Paul E. McKenney" <paulmck@kernel.org> writes:

> On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
>> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>> >
>> > > > > >                 if (num_online_cpus() > 1)
>> > > > > >                         synchronize_rcu();
>> >
>> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
>> > synchronize_rcu() will be a no-op anyway due to there only being the
>> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
>> > and in tests where preemption could result in the observed failures?
>> >
>> > Could you please send your .config file, or at least the relevant portions
>> > of it?
>> 
>> That's my understanding as well. I assumed Toke has preempt=y.
>> Otherwise the whole thing needs to be root caused properly.
>
> Given that there is only a single CPU, I am still confused about what
> the tests are expecting the membarrier() system call to do for them.

It's basically a proxy for waiting until the objects are freed on the
kernel side, as far as I understand...

-Toke



* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 18:39               ` Toke Høiland-Jørgensen
@ 2021-04-14 18:41                 ` Paul E. McKenney
  2021-04-14 19:18                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-04-14 18:41 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Alexei Starovoitov, Andrii Nakryiko, bpf

On Wed, Apr 14, 2021 at 08:39:04PM +0200, Toke Høiland-Jørgensen wrote:
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> 
> > On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
> >> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >> >
> >> > > > > >                 if (num_online_cpus() > 1)
> >> > > > > >                         synchronize_rcu();
> >> >
> >> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
> >> > synchronize_rcu() will be a no-op anyway due to there only being the
> >> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
> >> > and in tests where preemption could result in the observed failures?
> >> >
> >> > Could you please send your .config file, or at least the relevant portions
> >> > of it?
> >> 
> >> That's my understanding as well. I assumed Toke has preempt=y.
> >> Otherwise the whole thing needs to be root caused properly.
> >
> > Given that there is only a single CPU, I am still confused about what
> > the tests are expecting the membarrier() system call to do for them.
> 
> It's basically a proxy for waiting until the objects are freed on the
> kernel side, as far as I understand...

There are in-kernel objects that are freed via call_rcu(), and the idea
is to wait until these objects really are freed?  Or am I still missing
out on what is going on?

							Thanx, Paul


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 18:41                 ` Paul E. McKenney
@ 2021-04-14 19:18                   ` Toke Høiland-Jørgensen
  2021-04-14 21:25                     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-14 19:18 UTC (permalink / raw)
  To: paulmck; +Cc: Alexei Starovoitov, Andrii Nakryiko, bpf

"Paul E. McKenney" <paulmck@kernel.org> writes:

> On Wed, Apr 14, 2021 at 08:39:04PM +0200, Toke Høiland-Jørgensen wrote:
>> "Paul E. McKenney" <paulmck@kernel.org> writes:
>> 
>> > On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
>> >> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>> >> >
>> >> > > > > >                 if (num_online_cpus() > 1)
>> >> > > > > >                         synchronize_rcu();
>> >> >
>> >> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
>> >> > synchronize_rcu() will be a no-op anyway due to there only being the
>> >> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
>> >> > and in tests where preemption could result in the observed failures?
>> >> >
>> >> > Could you please send your .config file, or at least the relevant portions
>> >> > of it?
>> >> 
>> >> That's my understanding as well. I assumed Toke has preempt=y.
>> >> Otherwise the whole thing needs to be root caused properly.
>> >
>> > Given that there is only a single CPU, I am still confused about what
>> > the tests are expecting the membarrier() system call to do for them.
>> 
>> It's basically a proxy for waiting until the objects are freed on the
>> kernel side, as far as I understand...
>
> There are in-kernel objects that are freed via call_rcu(), and the idea
> is to wait until these objects really are freed?  Or am I still missing
> out on what is going on?

Something like that? Although I'm not actually sure these are using
call_rcu()? One of them needs __put_task_struct() to run, and the other
waits for map freeing, with this comment:


	/* we need to either wait for or force synchronize_rcu(), before
	 * checking for "still exists" condition, otherwise map could still be
	 * resolvable by ID, causing false positives.
	 *
	 * Older kernels (5.8 and earlier) freed map only after two
	 * synchronize_rcu()s, so trigger two, to be entirely sure.
	 */
	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");


-Toke



* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 19:18                   ` Toke Høiland-Jørgensen
@ 2021-04-14 21:25                     ` Paul E. McKenney
  2021-04-14 22:13                       ` Andrii Nakryiko
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-04-14 21:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Alexei Starovoitov, Andrii Nakryiko, bpf

On Wed, Apr 14, 2021 at 09:18:09PM +0200, Toke Høiland-Jørgensen wrote:
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> 
> > On Wed, Apr 14, 2021 at 08:39:04PM +0200, Toke Høiland-Jørgensen wrote:
> >> "Paul E. McKenney" <paulmck@kernel.org> writes:
> >> 
> >> > On Wed, Apr 14, 2021 at 10:59:23AM -0700, Alexei Starovoitov wrote:
> >> >> On Wed, Apr 14, 2021 at 10:52 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >> >> >
> >> >> > > > > >                 if (num_online_cpus() > 1)
> >> >> > > > > >                         synchronize_rcu();
> >> >> >
> >> >> > In CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y kernels, this
> >> >> > synchronize_rcu() will be a no-op anyway due to there only being the
> >> >> > one CPU.  Or are these failures all happening in CONFIG_PREEMPT=y kernels,
> >> >> > and in tests where preemption could result in the observed failures?
> >> >> >
> >> >> > Could you please send your .config file, or at least the relevant portions
> >> >> > of it?
> >> >> 
> >> >> That's my understanding as well. I assumed Toke has preempt=y.
> >> >> Otherwise the whole thing needs to be root caused properly.
> >> >
> >> > Given that there is only a single CPU, I am still confused about what
> >> > the tests are expecting the membarrier() system call to do for them.
> >> 
> >> It's basically a proxy for waiting until the objects are freed on the
> >> kernel side, as far as I understand...
> >
> > There are in-kernel objects that are freed via call_rcu(), and the idea
> > is to wait until these objects really are freed?  Or am I still missing
> > out on what is going on?
> 
> Something like that? Although I'm not actually sure these are using
> call_rcu()? One of them needs __put_task_struct() to run, and the other
> waits for map freeing, with this comment:
> 
> 
> 	/* we need to either wait for or force synchronize_rcu(), before
> 	 * checking for "still exists" condition, otherwise map could still be
> 	 * resolvable by ID, causing false positives.
> 	 *
> 	 * Older kernels (5.8 and earlier) freed map only after two
> 	 * synchronize_rcu()s, so trigger two, to be entirely sure.
> 	 */
> 	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
> 	CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");

OK, so the issue is that the membarrier() system call is designed to force
ordering only within a user process, and you need it in the kernel.

Give or take my being puzzled as to why the membarrier() system call
doesn't do it for you on a CONFIG_PREEMPT_NONE=y system, this brings
us back to the question Alexei asked me in the first place, what is the
best way to invoke an in-kernel synchronize_rcu() from userspace?

You guys gave some reasonable examples.  Here are a few others:

o	Bring a CPU online, then force it offline, or vice versa.
	But in this case, sys_membarrier() would do what you need
	given more than one CPU.

o	Use the membarrier() system call, but require that the tests
	run on systems with at least two CPUs.

o	Create a kernel module whose init function does a
	synchronize_rcu() and then returns failure.  This will
	avoid the overhead of removing that kernel module.

o	Create a sysfs or debugfs interface that does a
	synchronize_rcu().
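
A minimal userspace sketch of the second option above, assuming the
test simply refuses to rely on membarrier() when fewer than two CPUs
are online.  The helper name sync_rcu_via_membarrier() and the choice
of -ENOTSUP are hypothetical, not something the selftests define.  It
only covers the grace-period wait that sys_membarrier() performs with
more than one CPU, nothing beyond that:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/*
 * Hypothetical helper (not part of the selftests).
 * Returns 0 on success, -ENOTSUP when fewer than two CPUs are online
 * (the case where sys_membarrier() skips synchronize_rcu()), or
 * -errno if the system call itself fails.
 */
static int sync_rcu_via_membarrier(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (ncpus < 2)
		return -ENOTSUP;

	if (syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0) < 0)
		return -errno;

	return 0;
}
```

A caller could treat -ENOTSUP as "skip this test" rather than as a
failure, which keeps single-CPU runs from reporting false leaks.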

But I am still concerned that you are needing more than synchronize_rcu()
can do.  Otherwise, the membarrier() system call would work just fine
on a single CPU on your CONFIG_PREEMPT_VOLUNTARY=y kernel.

							Thanx, Paul


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 21:25                     ` Paul E. McKenney
@ 2021-04-14 22:13                       ` Andrii Nakryiko
  2021-04-14 22:27                         ` Paul E. McKenney
  2021-04-14 22:47                         ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 16+ messages in thread
From: Andrii Nakryiko @ 2021-04-14 22:13 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, bpf

On Wed, Apr 14, 2021 at 2:25 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> But I am still concerned that you are needing more than synchronize_rcu()
> can do.  Otherwise, the membarrier() system call would work just fine
> on a single CPU on your CONFIG_PREEMPT_VOLUNTARY=y kernel.

Selftests know internals of kernel implementation and wait for some
objects to be freed with call_rcu(). So I think at this point the best
way is just to go back to map-in-map or socket local storage.
Map-in-map will probably work on older kernels, so I'd stick with that
(plus all the code is there in the referenced commit). The performance
and number of syscalls performed doesn't matter, really.

>
>                                                         Thanx, Paul


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 22:13                       ` Andrii Nakryiko
@ 2021-04-14 22:27                         ` Paul E. McKenney
  2021-04-14 22:47                         ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2021-04-14 22:27 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, bpf

On Wed, Apr 14, 2021 at 03:13:38PM -0700, Andrii Nakryiko wrote:
> Selftests know internals of kernel implementation and wait for some
> objects to be freed with call_rcu(). So I think at this point the best
> way is just to go back to map-in-map or socket local storage.
> Map-in-map will probably work on older kernels, so I'd stick with that
> (plus all the code is there in the referenced commit). The performance
> and number of syscalls performed doesn't matter, really.

Ah!  If they need to wait for objects to be freed with call_rcu(), then
they need to make the kernel execute an rcu_barrier().  One way to make
this happen is to unmount an ext4 filesystem.  This would explain why
the membarrier() system call wasn't doing the job on single-CPU systems
even in kernels built with CONFIG_PREEMPT_VOLUNTARY=y.

But if you have a more direct way to wait the required period of time,
so much the better!
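
One possible shape for such a more direct wait, sketched under
assumptions: poll the condition the test actually cares about, with a
timeout, instead of a fixed sleep(1).  Both wait_for_cond() and the
stand-in predicate countdown_done() are hypothetical names; a real test
would substitute its own check, such as whether the map is still
resolvable by ID:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: call cond(ctx) roughly once per millisecond
 * until it returns true or timeout_ms has elapsed.  Returns true if
 * the condition became true, false on timeout.
 */
static bool wait_for_cond(bool (*cond)(void *ctx), void *ctx, int timeout_ms)
{
	struct timespec start, now;
	struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
	long long elapsed_ns;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (cond(ctx))
			return true;

		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed_ns = (long long)(now.tv_sec - start.tv_sec) * 1000000000LL +
			     (now.tv_nsec - start.tv_nsec);
		if (elapsed_ns / 1000000 >= timeout_ms)
			return false;

		nanosleep(&pause, NULL);
	}
}

/* Stand-in predicate for illustration: true once *ctx has counted down to 0. */
static bool countdown_done(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}
```

This returns as soon as the object is gone and still bounds the wait,
so a genuine leak shows up as a timeout instead of a hang.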

							Thanx, Paul


* Re: Selftest failures related to kern_sync_rcu()
  2021-04-14 22:13                       ` Andrii Nakryiko
  2021-04-14 22:27                         ` Paul E. McKenney
@ 2021-04-14 22:47                         ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-14 22:47 UTC (permalink / raw)
  To: Andrii Nakryiko, Paul E . McKenney; +Cc: Alexei Starovoitov, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> Selftests know internals of kernel implementation and wait for some
> objects to be freed with call_rcu(). So I think at this point the best
> way is just to go back to map-in-map or socket local storage.
> Map-in-map will probably work on older kernels, so I'd stick with that
> (plus all the code is there in the referenced commit). The performance
> and number of syscalls performed doesn't matter, really.

I just tried that (with the patch below, pulled from the commit you
referred to), and it doesn't help. I still get this with a single CPU:

test_lookup_update:FAIL:map1_leak inner_map1 leaked!
#15/1 lookup_update:FAIL
#15 btf_map_in_map:FAIL

It's fine with 2 CPUs. And the other failures (in the task_local_storage
test) seem to have gone away entirely after I just pulled the newest
bpf-next...

-Toke


diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 6396932b97e2..4c26d84a64dc 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -376,7 +376,26 @@ static int delete_module(const char *name, int flags)
  */
 int kern_sync_rcu(void)
 {
-	return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
+	int inner_map_fd, outer_map_fd, err, zero = 0;
+
+	inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0);
+	if (!ASSERT_LT(0, inner_map_fd, "inner_map_create"))
+		return -1;
+
+	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
+					     sizeof(int), inner_map_fd, 1, 0);
+	if (!ASSERT_LT(0, outer_map_fd, "outer_map_create")) {
+		close(inner_map_fd);
+		return -1;
+	}
+
+	err = bpf_map_update_elem(outer_map_fd, &zero, &inner_map_fd, 0);
+	if (err)
+		err = -errno;
+	ASSERT_OK(err, "outer_map_update");
+	close(inner_map_fd);
+	close(outer_map_fd);
+	return err;
 }
 
 static void unload_bpf_testmod(void)



end of thread, other threads:[~2021-04-14 22:47 UTC | newest]

Thread overview: 16+ messages
2021-04-08 19:34 Selftest failures related to kern_sync_rcu() Toke Høiland-Jørgensen
2021-04-13  3:38 ` Andrii Nakryiko
2021-04-13  8:50   ` Toke Høiland-Jørgensen
2021-04-13 21:43     ` Andrii Nakryiko
2021-04-14 15:54       ` Alexei Starovoitov
2021-04-14 17:52         ` Paul E. McKenney
2021-04-14 17:59           ` Alexei Starovoitov
2021-04-14 18:19             ` Paul E. McKenney
2021-04-14 18:39               ` Toke Høiland-Jørgensen
2021-04-14 18:41                 ` Paul E. McKenney
2021-04-14 19:18                   ` Toke Høiland-Jørgensen
2021-04-14 21:25                     ` Paul E. McKenney
2021-04-14 22:13                       ` Andrii Nakryiko
2021-04-14 22:27                         ` Paul E. McKenney
2021-04-14 22:47                         ` Toke Høiland-Jørgensen
2021-04-14 18:27             ` Toke Høiland-Jørgensen
