linux-kernel.vger.kernel.org archive mirror
* Some -serious- BPF-related litmus tests
@ 2020-05-22  0:38 Paul E. McKenney
  2020-05-22  9:44 ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-22  0:38 UTC (permalink / raw)
  To: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel
  Cc: linux-kernel, linux-arch, andriin

Hello!

Just wanted to call your attention to some pretty cool and pretty serious
litmus tests that Andrii did as part of his BPF ring-buffer work:

https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/

Thoughts?

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-22  0:38 Some -serious- BPF-related litmus tests Paul E. McKenney
@ 2020-05-22  9:44 ` Peter Zijlstra
  2020-05-22 10:56   ` Paul E. McKenney
  2020-05-22 14:32   ` Alan Stern
  0 siblings, 2 replies; 36+ messages in thread
From: Peter Zijlstra @ 2020-05-22  9:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: stern, parri.andrea, will, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, linux-kernel,
	linux-arch, andriin

On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> Just wanted to call your attention to some pretty cool and pretty serious
> litmus tests that Andrii did as part of his BPF ring-buffer work:
> 
> https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> 
> Thoughts?

I find:

	smp_wmb()
	smp_store_release()

a _very_ weird construct. What is that supposed to even do?

* Re: Some -serious- BPF-related litmus tests
  2020-05-22  9:44 ` Peter Zijlstra
@ 2020-05-22 10:56   ` Paul E. McKenney
  2020-05-22 14:36     ` Alan Stern
  2020-05-22 14:32   ` Alan Stern
  1 sibling, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-22 10:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stern, parri.andrea, will, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, linux-kernel,
	linux-arch, andriin

On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > Just wanted to call your attention to some pretty cool and pretty serious
> > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > 
> > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > 
> > Thoughts?
> 
> I find:
> 
> 	smp_wmb()
> 	smp_store_release()
> 
> a _very_ weird construct. What is that supposed to even do?

Indeed, and I asked about that in my review of the patch containing the
code.  It -could- make sense if there is a prior read and a later store:

	r1 = READ_ONCE(a);
	WRITE_ONCE(b, 1);
	smp_wmb();
	smp_store_release(&c, 1);
	WRITE_ONCE(d, 1);

So the a->c and b->c orderings are provided by smp_store_release(), and
the b->d ordering by smp_wmb().  But if there were only stores, the
smp_wmb() would suffice.  And if there were no trailing store, the
smp_store_release() would suffice.
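
For illustration, here is a minimal litmus sketch (hypothetical, not from
Andrii's series) whose exists clause is forbidden only because smp_wmb()
orders the store to b before the store to d; the smp_store_release() alone
would not order those two stores:

	C wmb-vs-release-sketch

	{}

	P0(int *b, int *c, int *d)
	{
		WRITE_ONCE(*b, 1);
		smp_wmb();
		smp_store_release(c, 1);
		WRITE_ONCE(*d, 1);
	}

	P1(int *b, int *d)
	{
		int rd;
		int rb;

		rd = READ_ONCE(*d);
		smp_rmb();
		rb = READ_ONCE(*b);
	}

	exists (1:rd=1 /\ 1:rb=0)

(herd7 should report this exists clause as never satisfied with the
smp_wmb() present, but satisfiable without it.)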

But that would at least want a comment, in my opinion.  ;-)

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-22  9:44 ` Peter Zijlstra
  2020-05-22 10:56   ` Paul E. McKenney
@ 2020-05-22 14:32   ` Alan Stern
  2020-05-22 17:43     ` Paul E. McKenney
  1 sibling, 1 reply; 36+ messages in thread
From: Alan Stern @ 2020-05-22 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andriin

On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > Just wanted to call your attention to some pretty cool and pretty serious
> > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > 
> > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > 
> > Thoughts?
> 
> I find:
> 
> 	smp_wmb()
> 	smp_store_release()
> 
> a _very_ weird construct. What is that supposed to even do?

Indeed, it looks like one or the other of those is redundant (depending 
on the context).

Also, what use is a spinlock that is accessed in only one thread?

Finally, I doubt that these tests belong under tools/memory-model.  
Shouldn't they go under the new Documentation/ directory for litmus 
tests?  And shouldn't the patch update a README file?

Alan

* Re: Some -serious- BPF-related litmus tests
  2020-05-22 10:56   ` Paul E. McKenney
@ 2020-05-22 14:36     ` Alan Stern
  2020-05-22 17:45       ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Alan Stern @ 2020-05-22 14:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andriin

On Fri, May 22, 2020 at 03:56:59AM -0700, Paul E. McKenney wrote:
> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > Just wanted to call your attention to some pretty cool and pretty serious
> > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > 
> > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > > 
> > > Thoughts?
> > 
> > I find:
> > 
> > 	smp_wmb()
> > 	smp_store_release()
> > 
> > a _very_ weird construct. What is that supposed to even do?
> 
> Indeed, and I asked about that in my review of the patch containing the
> code.  It -could- make sense if there is a prior read and a later store:
> 
> 	r1 = READ_ONCE(a);
> 	WRITE_ONCE(b, 1);
> 	smp_wmb();
> 	smp_store_release(&c, 1);
> 	WRITE_ONCE(d, 1);
> 
> So the a->c and b->c orderings are provided by smp_store_release(), and
> the b->d ordering by smp_wmb().  But if there were only stores, the
> smp_wmb() would suffice.  And if there were no trailing store, the
> smp_store_release() would suffice.

But that wasn't the context in the litmus test.  The context was:

	smp_wmb();
	smp_store_release();
	spin_unlock();
	smp_store_release();

That certainly looks like a lot more ordering than is really needed.
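
For instance (a sketch of the producer-side sequence from the test,
slightly paraphrased, not the actual patch), smp_store_release() already
orders the preceding plain store, and spin_unlock() is itself a release:

	*lenPtr = -1;			/* mark record busy */
	smp_wmb();			/* apparently redundant: the release
					   below already orders the -1 store */
	smp_store_release(px, rPx + 1);
	spin_unlock(rb_lock);		/* a release in its own right */
	smp_store_release(lenPtr, 1);	/* commit; pairs with the consumer's
					   smp_load_acquire() */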

Alan

* Re: Some -serious- BPF-related litmus tests
  2020-05-22 14:32   ` Alan Stern
@ 2020-05-22 17:43     ` Paul E. McKenney
  2020-05-22 19:38       ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-22 17:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Peter Zijlstra, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andriin

On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > Just wanted to call your attention to some pretty cool and pretty serious
> > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > 
> > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > > 
> > > Thoughts?
> > 
> > I find:
> > 
> > 	smp_wmb()
> > 	smp_store_release()
> > 
> > a _very_ weird construct. What is that supposed to even do?
> 
> Indeed, it looks like one or the other of those is redundant (depending 
> on the context).

Probably.  Peter instead asked what it was supposed to even do.  ;-)

> Also, what use is a spinlock that is accessed in only one thread?

Multiple writers synchronize via the spinlock in this case.  I am
guessing that his larger 16-hour test contended this spinlock.

> Finally, I doubt that these tests belong under tools/memory-model.  
> Shouldn't they go under the new Documentation/ directory for litmus 
> tests?  And shouldn't the patch update a README file?

Agreed, and I responded to that effect to his original patch:

https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-22 14:36     ` Alan Stern
@ 2020-05-22 17:45       ` Paul E. McKenney
  0 siblings, 0 replies; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-22 17:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Peter Zijlstra, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andriin

On Fri, May 22, 2020 at 10:36:09AM -0400, Alan Stern wrote:
> On Fri, May 22, 2020 at 03:56:59AM -0700, Paul E. McKenney wrote:
> > On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > > 
> > > > Just wanted to call your attention to some pretty cool and pretty serious
> > > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > > 
> > > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > > > 
> > > > Thoughts?
> > > 
> > > I find:
> > > 
> > > 	smp_wmb()
> > > 	smp_store_release()
> > > 
> > > a _very_ weird construct. What is that supposed to even do?
> > 
> > Indeed, and I asked about that in my review of the patch containing the
> > code.  It -could- make sense if there is a prior read and a later store:
> > 
> > 	r1 = READ_ONCE(a);
> > 	WRITE_ONCE(b, 1);
> > 	smp_wmb();
> > 	smp_store_release(&c, 1);
> > 	WRITE_ONCE(d, 1);
> > 
> > So the a->c and b->c orderings are provided by smp_store_release(), and
> > the b->d ordering by smp_wmb().  But if there were only stores, the
> > smp_wmb() would suffice.  And if there were no trailing store, the
> > smp_store_release() would suffice.
> 
> But that wasn't the context in the litmus test.  The context was:
> 
> 	smp_wmb();
> 	smp_store_release();
> 	spin_unlock();
> 	smp_store_release();
> 
> That certainly looks like a lot more ordering than is really needed.

I suspect that you are right.  I asked him if there were other accesses
in my response to his ringbuffer (as opposed to litmus-test) patch:

https://lore.kernel.org/bpf/20200522002502.GF2869@paulmck-ThinkPad-P72/

If there are other accesses requiring both, the litmus tests might need
to be updated.

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-22 17:43     ` Paul E. McKenney
@ 2020-05-22 19:38       ` Andrii Nakryiko
  2020-05-24 12:09         ` Akira Yokosawa
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-22 19:38 UTC (permalink / raw)
  To: paulmck, Alan Stern
  Cc: Peter Zijlstra, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andrii.nakryiko

On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
>> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
>>> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
>>>> Hello!
>>>>
>>>> Just wanted to call your attention to some pretty cool and pretty serious
>>>> litmus tests that Andrii did as part of his BPF ring-buffer work:
>>>>
>>>> https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
>>>>
>>>> Thoughts?
>>>
>>> I find:
>>>
>>> 	smp_wmb()
>>> 	smp_store_release()
>>>
>>> a _very_ weird construct. What is that supposed to even do?
>>
>> Indeed, it looks like one or the other of those is redundant (depending
>> on the context).
> 
> Probably.  Peter instead asked what it was supposed to even do.  ;-)

I agree, I think smp_wmb() is redundant here. Can't remember why I 
thought it was necessary; this algorithm went through a bunch of 
iterations, starting as completely lockless, also using 
READ_ONCE/WRITE_ONCE at some point, and settling on 
smp_load_acquire/smp_store_release, eventually. Maybe there was some 
reason, but might be that I was just over-cautious. See reply on patch 
thread as well ([0]).

   [0] 
https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/


> 
>> Also, what use is a spinlock that is accessed in only one thread?
> 
> Multiple writers synchronize via the spinlock in this case.  I am
> guessing that his larger 16-hour test contended this spinlock.

Yes, spinlock is for coordinating multiple producers. 2p1c cases 
(bounded and unbounded) rely on this already. 1p1c cases are sort of 
subsets (but very fast to verify) checking only consumer/producer 
interaction.
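
Schematically (paraphrasing the litmus code with simplified names, not
quoting it), the producers serialize on the lock while the consumer runs
locklessly:

	/* each producer, serialized on rb_lock */
	spin_lock(rb_lock);
	rPx = *px;			/* px only advances under rb_lock */
	*lenPtr = -1;			/* mark the record busy */
	smp_store_release(px, rPx + 1);
	spin_unlock(rb_lock);
	smp_store_release(lenPtr, 1);	/* commit the record */

	/* consumer, lockless */
	rCx = smp_load_acquire(cx);
	rPx = smp_load_acquire(px);
	if (rCx < rPx)
		rLen = smp_load_acquire(lenPtr);	/* -1 = busy, 1 = committed;
							   0 must never be seen */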

> 
>> Finally, I doubt that these tests belong under tools/memory-model.
>> Shouldn't they go under the new Documentation/ directory for litmus
>> tests?  And shouldn't the patch update a README file?
> 
> Agreed, and I responded to that effect to his original patch:
> 
> https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/

Yep, makes sense, I'll move.

> 
> 							Thanx, Paul
> 


* Re: Some -serious- BPF-related litmus tests
  2020-05-22 19:38       ` Andrii Nakryiko
@ 2020-05-24 12:09         ` Akira Yokosawa
  2020-05-25 18:31           ` Andrii Nakryiko
  2020-05-25 11:25         ` Peter Zijlstra
  2020-05-25 14:53         ` Boqun Feng
  2 siblings, 1 reply; 36+ messages in thread
From: Akira Yokosawa @ 2020-05-24 12:09 UTC (permalink / raw)
  To: Andrii Nakryiko, paulmck
  Cc: Alan Stern, Peter Zijlstra, parri.andrea, will, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, dlustig, joel,
	linux-kernel, linux-arch, andrii.nakryiko, Akira Yokosawa

On Fri, 22 May 2020 12:38:21 -0700, Andrii Nakryiko wrote:
> On 5/22/20 10:43 AM, Paul E. McKenney wrote:
>> On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
>>> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
>>>> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
>>>>> Hello!
>>>>>
>>>>> Just wanted to call your attention to some pretty cool and pretty serious
>>>>> litmus tests that Andrii did as part of his BPF ring-buffer work:
>>>>>
>>>>> https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
>>>>>
>>>>> Thoughts?
>>>>
>>>> I find:
>>>>
>>>>     smp_wmb()
>>>>     smp_store_release()
>>>>
>>>> a _very_ weird construct. What is that supposed to even do?
>>>
>>> Indeed, it looks like one or the other of those is redundant (depending
>>> on the context).
>>
>> Probably.  Peter instead asked what it was supposed to even do.  ;-)
> 
> I agree, I think smp_wmb() is redundant here. Can't remember why I thought it was necessary; this algorithm went through a bunch of iterations, starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some point, and settling on smp_load_acquire/smp_store_release, eventually. Maybe there was some reason, but might be that I was just over-cautious. See reply on patch thread as well ([0]).
> 
>   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
> 
> 
>>
>>> Also, what use is a spinlock that is accessed in only one thread?
>>
>> Multiple writers synchronize via the spinlock in this case.  I am
>> guessing that his larger 16-hour test contended this spinlock.
> 
> Yes, spinlock is for coordinating multiple producers. 2p1c cases (bounded and unbounded) rely on this already. 1p1c cases are sort of subsets (but very fast to verify) checking only consumer/producer interaction.
> 
>>
>>> Finally, I doubt that these tests belong under tools/memory-model.
>>> Shouldn't they go under the new Documentation/ directory for litmus
>>> tests?  And shouldn't the patch update a README file?
>>
>> Agreed, and I responded to that effect to his original patch:
>>
>> https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/
> 
> Yep, makes sense, I'll move.

Hi Andrii,

Andrea reported off-list that your litmus tests are incompatible
with the to-be-released version 7.56 of herd7 and currently available
versions of klitmus7.

This is due to a couple of C-language level issues.

herd7 used to be fairly generous in parsing C litmus tests.
On the other hand, klitmus7 converts a litmus test into a
kernel module.  The converted code is built by an actual C compiler
with kernel headers included, and can fail to build due to syntax errors
or serious warnings.
herd7 HEAD is getting slightly stricter about uninitialized variables and
emits the following warning for mpsc-rb+1p1c+bounded.litmus:

Warning: File "mpsc-rb+1p1c+bounded.litmus": read on location 0 does not match any write

The code converted by klitmus7 fails to build, with the following messages:

$ make
make -C /lib/modules/5.3.0-53-generic/build/ M=/home/akira/bpf-rb/klitmus modules
make[1]: Entering directory '/usr/src/linux-headers-5.3.0-53-generic'
  CC [M]  /home/akira/bpf-rb/klitmus/litmus000.o
/home/akira/bpf-rb/klitmus/litmus000.c: In function ‘code1’:
/home/akira/bpf-rb/klitmus/litmus000.c:426:14: error: passing argument 1 of ‘atomic_inc’
  from incompatible pointer type [-Werror=incompatible-pointer-types]
   atomic_inc(dropped);
              ^~~~~~~
In file included from ./arch/x86/include/asm/atomic.h:265:0,
                 from ./arch/x86/include/asm/msr.h:67,
                 from ./arch/x86/include/asm/processor.h:21,
                 from ./arch/x86/include/asm/cpufeature.h:5,
                 from ./arch/x86/include/asm/thread_info.h:53,
                 from ./include/linux/thread_info.h:38,
                 from ./arch/x86/include/asm/preempt.h:7,
                 from ./include/linux/preempt.h:78,
                 from ./include/linux/spinlock.h:51,
                 from ./include/linux/seqlock.h:36,
                 from ./include/linux/time.h:6,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:10,
                 from /home/akira/bpf-rb/klitmus/litmus000.c:11:
./include/asm-generic/atomic-instrumented.h:237:1: note: expected ‘atomic_t * {aka struct <anonymous> *}’ but argument is of type ‘int *’
 atomic_inc(atomic_t *v)
 ^~~~~~~~~~
In file included from ./include/linux/export.h:45:0,
                 from ./include/linux/linkage.h:7,
                 from ./include/linux/kernel.h:8,
                 from ./include/linux/list.h:9,
                 from ./include/linux/module.h:9,
                 from /home/akira/bpf-rb/klitmus/litmus000.c:11:
/home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread0’:
./include/linux/compiler.h:187:26: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  case 4: *(__u32 *)res = *(volatile __u32 *)p; break;  \
                          ^
/home/akira/bpf-rb/klitmus/litmus000.c:365:7: note: ‘rLenPtr’ was declared here
  int *rLenPtr;
       ^~~~~~~
In file included from ./include/linux/export.h:45:0,
                 from ./include/linux/linkage.h:7,
                 from ./include/linux/kernel.h:8,
                 from ./include/linux/list.h:9,
                 from ./include/linux/module.h:9,
                 from /home/akira/bpf-rb/klitmus/litmus000.c:11:
/home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread1’:
./include/linux/compiler.h:225:31: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
          ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/akira/bpf-rb/klitmus/litmus000.c:417:7: note: ‘rLenPtr’ was declared here
  int *rLenPtr;
       ^~~~~~~
cc1: some warnings being treated as errors
scripts/Makefile.build:288: recipe for target '/home/akira/bpf-rb/klitmus/litmus000.o' failed
make[2]: *** [/home/akira/bpf-rb/klitmus/litmus000.o] Error 1
Makefile:1656: recipe for target '_module_/home/akira/bpf-rb/klitmus' failed
make[1]: *** [_module_/home/akira/bpf-rb/klitmus] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.3.0-53-generic'
Makefile:8: recipe for target 'all' failed
make: *** [all] Error 2

Appended below is a patch I applied to mpsc-rb+1p1c+bounded.litmus to make
herd7 HEAD and klitmus7 happy. (Give or take the redundant memory barrier.)

The other variants need similar changes.

What I did here is:

    - Remove unnecessary initialization (shared variables are 0 by default)
    - Declare "dropped" as atomic_t
    - Promote rLenPtr to a shared variable LenPtr

Please note that if you are on Linux 5.6 (or later), you need an up-to-date
klitmus7 due to a change in kernel API.

Any question is welcome!

        Thanks, Akira

-----------------------
diff --git a/mpsc-rb+1p1c+bounded.litmus b/mpsc-rb+1p1c+bounded.litmus
index cafd17a..5af43c1 100644
--- a/mpsc-rb+1p1c+bounded.litmus
+++ b/mpsc-rb+1p1c+bounded.litmus
@@ -17,15 +17,11 @@ C mpsc-rb+1p1c+bounded
 
 {
 	max_len = 1;
-	len1 = 0;
-	px = 0;
-	cx = 0;
-	dropped = 0;
+	atomic_t dropped;
 }
 
-P0(int *len1, int *cx, int *px)
+P0(int *len1, int *cx, int *px, int *LenPtr)
 {
-	int *rLenPtr;
 	int rLen;
 	int rPx;
 	int rCx;
@@ -37,11 +33,11 @@ P0(int *len1, int *cx, int *px)
 	rPx = smp_load_acquire(px);
 	if (rCx < rPx) {
 		if (rCx == 0)
-			rLenPtr = len1;
+			LenPtr = len1;
 		else
 			rFail = 1;
 
-		rLen = smp_load_acquire(rLenPtr);
+		rLen = smp_load_acquire(LenPtr);
 		if (rLen == 0) {
 			rFail = 1;
 		} else if (rLen == 1) {
@@ -51,12 +47,11 @@ P0(int *len1, int *cx, int *px)
 	}
 }
 
-P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, int *dropped, int *max_len)
+P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, atomic_t *dropped, int *max_len, int *LenPtr)
 {
 	int rPx;
 	int rCx;
 	int rFail;
-	int *rLenPtr;
 
 	rFail = 0;
 	rCx = smp_load_acquire(cx);
@@ -69,17 +64,17 @@ P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, int *dropped, int *max_len)
 		spin_unlock(rb_lock);
 	} else {
 		if (rPx == 0)
-			rLenPtr = len1;
+			LenPtr = len1;
 		else
 			rFail = 1;
 
-		*rLenPtr = -1;
+		*LenPtr = -1;
 		smp_wmb();
 		smp_store_release(px, rPx + 1);
 
 		spin_unlock(rb_lock);
 
-		smp_store_release(rLenPtr, 1);
+		smp_store_release(LenPtr, 1);
 	}
 }
 
----------------


* Re: Some -serious- BPF-related litmus tests
  2020-05-22 19:38       ` Andrii Nakryiko
  2020-05-24 12:09         ` Akira Yokosawa
@ 2020-05-25 11:25         ` Peter Zijlstra
  2020-05-25 15:47           ` Paul E. McKenney
  2020-05-25 14:53         ` Boqun Feng
  2 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2020-05-25 11:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: paulmck, Alan Stern, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andrii.nakryiko

On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> > On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:

> > > Also, what use is a spinlock that is accessed in only one thread?
> > 
> > Multiple writers synchronize via the spinlock in this case.  I am
> > guessing that his larger 16-hour test contended this spinlock.
> 
> Yes, spinlock is for coordinating multiple producers. 2p1c cases (bounded
> and unbounded) rely on this already. 1p1c cases are sort of subsets (but
> very fast to verify) checking only consumer/producer interaction.

Does that spinlock imply that we can now never fix that atrocious
bpf_prog_active trainwreck?

How does that spinlock not trigger the USED <- IN-NMI lockdep check:

  f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")

?

That is; how can you use a spinlock on the producer side at all?

* Re: Some -serious- BPF-related litmus tests
  2020-05-22 19:38       ` Andrii Nakryiko
  2020-05-24 12:09         ` Akira Yokosawa
  2020-05-25 11:25         ` Peter Zijlstra
@ 2020-05-25 14:53         ` Boqun Feng
  2020-05-25 18:38           ` Andrii Nakryiko
  2 siblings, 1 reply; 36+ messages in thread
From: Boqun Feng @ 2020-05-25 14:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: paulmck, Alan Stern, Peter Zijlstra, parri.andrea, will, npiggin,
	dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel,
	linux-kernel, linux-arch, andrii.nakryiko

Hi Andrii,

On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> > On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> > > On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > > > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > > > Hello!
> > > > > 
> > > > > Just wanted to call your attention to some pretty cool and pretty serious
> > > > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > > > 
> > > > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > > > > 
> > > > > Thoughts?
> > > > 
> > > > I find:
> > > > 
> > > > 	smp_wmb()
> > > > 	smp_store_release()
> > > > 
> > > > a _very_ weird construct. What is that supposed to even do?
> > > 
> > > Indeed, it looks like one or the other of those is redundant (depending
> > > on the context).
> > 
> > Probably.  Peter instead asked what it was supposed to even do.  ;-)
> 
> I agree, I think smp_wmb() is redundant here. Can't remember why I thought
> it was necessary; this algorithm went through a bunch of iterations,
> starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some
> point, and settling on smp_load_acquire/smp_store_release, eventually. Maybe
> there was some reason, but might be that I was just over-cautious. See reply
> on patch thread as well ([0]).
> 
>   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
> 

While we are at it, could you explain a bit why you use
smp_store_release() on consumer_pos? I ask because, IIUC, consumer_pos is
only updated on the consumer side, and there is no other write on the
consumer side that we want to order with the write to consumer_pos. So I
fail to see why smp_store_release() is necessary.

I made the following modification to the litmus tests, and I didn't see
different results (in the States) between the two versions of the litmus
tests.

Regards,
Boqun

---------------------->8

diff --git a/tools/memory-model/litmus-tests/mpsc-rb+1p1c+bounded.litmus b/tools/memory-model/litmus-tests/mpsc-rb+1p1c+bounded.litmus
index cafd17afe11e..255b23be7fa9 100644
--- a/tools/memory-model/litmus-tests/mpsc-rb+1p1c+bounded.litmus
+++ b/tools/memory-model/litmus-tests/mpsc-rb+1p1c+bounded.litmus
@@ -46,7 +46,7 @@ P0(int *len1, int *cx, int *px)
 			rFail = 1;
 		} else if (rLen == 1) {
 			rCx = rCx + 1;
-			smp_store_release(cx, rCx);
+			WRITE_ONCE(*cx, rCx);
 		}
 	}
 }
diff --git a/tools/memory-model/litmus-tests/mpsc-rb+1p1c+unbound.litmus b/tools/memory-model/litmus-tests/mpsc-rb+1p1c+unbound.litmus
index 84f660598015..5eecf14f87d1 100644
--- a/tools/memory-model/litmus-tests/mpsc-rb+1p1c+unbound.litmus
+++ b/tools/memory-model/litmus-tests/mpsc-rb+1p1c+unbound.litmus
@@ -44,7 +44,7 @@ P0(int *len1, int *cx, int *px)
 			rFail = 1;
 		} else if (rLen == 1) {
 			rCx = rCx + 1;
-			smp_store_release(cx, rCx);
+			WRITE_ONCE(*cx, rCx);
 		}
 	}
 }
diff --git a/tools/memory-model/litmus-tests/mpsc-rb+2p1c+bounded.litmus b/tools/memory-model/litmus-tests/mpsc-rb+2p1c+bounded.litmus
index 900104c4933b..54da1e5d7ec0 100644
--- a/tools/memory-model/litmus-tests/mpsc-rb+2p1c+bounded.litmus
+++ b/tools/memory-model/litmus-tests/mpsc-rb+2p1c+bounded.litmus
@@ -50,7 +50,7 @@ P0(int *len1, int *cx, int *px)
 			rFail = 1;
 		} else if (rLen == 1) {
 			rCx = rCx + 1;
-			smp_store_release(cx, rCx);
+			WRITE_ONCE(*cx, rCx);
 		}
 	}
 
@@ -68,7 +68,7 @@ P0(int *len1, int *cx, int *px)
 			rFail = 1;
 		} else if (rLen == 1) {
 			rCx = rCx + 1;
-			smp_store_release(cx, rCx);
+			WRITE_ONCE(*cx, rCx);
 		}
 	}
 }
diff --git a/tools/memory-model/litmus-tests/mpsc-rb+2p1c+unbound.litmus b/tools/memory-model/litmus-tests/mpsc-rb+2p1c+unbound.litmus
index 83372e9eb079..fd19433f4d9b 100644
--- a/tools/memory-model/litmus-tests/mpsc-rb+2p1c+unbound.litmus
+++ b/tools/memory-model/litmus-tests/mpsc-rb+2p1c+unbound.litmus
@@ -47,7 +47,7 @@ P0(int *len1, int *len2, int *cx, int *px)
 			rFail = 1;
 		} else if (rLen == 1) {
 			rCx = rCx + 1;
-			smp_store_release(cx, rCx);
+			WRITE_ONCE(*cx, rCx);
 		}
 	}
 
@@ -65,7 +65,7 @@ P0(int *len1, int *len2, int *cx, int *px)
 			rFail = 1;
 		} else if (rLen == 1) {
 			rCx = rCx + 1;
-			smp_store_release(cx, rCx);
+			WRITE_ONCE(*cx, rCx);
 		}
 	}
 }

> 
> > 
> > > Also, what use is a spinlock that is accessed in only one thread?
> > 
> > Multiple writers synchronize via the spinlock in this case.  I am
> > guessing that his larger 16-hour test contended this spinlock.
> 
> Yes, spinlock is for coordinating multiple producers. 2p1c cases (bounded
> and unbounded) rely on this already. 1p1c cases are sort of subsets (but
> very fast to verify) checking only consumer/producer interaction.
> 
> > 
> > > Finally, I doubt that these tests belong under tools/memory-model.
> > > Shouldn't they go under the new Documentation/ directory for litmus
> > > tests?  And shouldn't the patch update a README file?
> > 
> > Agreed, and I responded to that effect to his original patch:
> > 
> > https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/
> 
> Yep, makes sense, I'll move.
> 
> > 
> > 							Thanx, Paul
> > 
> 

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 11:25         ` Peter Zijlstra
@ 2020-05-25 15:47           ` Paul E. McKenney
  2020-05-25 17:02             ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-25 15:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, Alan Stern, parri.andrea, will, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig,
	joel, linux-kernel, linux-arch, andrii.nakryiko

On Mon, May 25, 2020 at 01:25:21PM +0200, Peter Zijlstra wrote:
> On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> > > On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> 
> > > > Also, what use is a spinlock that is accessed in only one thread?
> > > 
> > > Multiple writers synchronize via the spinlock in this case.  I am
> > > guessing that his larger 16-hour test contended this spinlock.
> > 
> > Yes, spinlock is for coordinating multiple producers. 2p1c cases (bounded
> > and unbounded) rely on this already. 1p1c cases are sort of subsets (but
> > very fast to verify) checking only consumer/producer interaction.
> 
> Does that spinlock imply that we can now never fix that atrocious
> bpf_prog_active trainwreck ?
> 
> How does that spinlock not trigger the USED <- IN-NMI lockdep check:
> 
>   f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
> 
> ?
> 
> That is; how can you use a spinlock on the producer side at all?

So even trylock is now forbidden in NMI handlers?  If so, why?

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 15:47           ` Paul E. McKenney
@ 2020-05-25 17:02             ` Peter Zijlstra
  2020-05-25 17:21               ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2020-05-25 17:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrii Nakryiko, Alan Stern, parri.andrea, will, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig,
	joel, linux-kernel, linux-arch, andrii.nakryiko

On Mon, May 25, 2020 at 08:47:30AM -0700, Paul E. McKenney wrote:
> On Mon, May 25, 2020 at 01:25:21PM +0200, Peter Zijlstra wrote:

> > That is; how can you use a spinlock on the producer side at all?
> 
> So even trylock is now forbidden in NMI handlers?  If so, why?

The litmus tests don't have trylock.

But you made me look at the actual patch:

+static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
+{
+	unsigned long cons_pos, prod_pos, new_prod_pos, flags;
+	u32 len, pg_off;
+	struct bpf_ringbuf_hdr *hdr;
+
+	if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
+		return NULL;
+
+	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
+	cons_pos = smp_load_acquire(&rb->consumer_pos);
+
+	if (in_nmi()) {
+		if (!spin_trylock_irqsave(&rb->spinlock, flags))
+			return NULL;
+	} else {
+		spin_lock_irqsave(&rb->spinlock, flags);
+	}

And that is of course utter crap. That's like saying you don't care
about your NMI data.



* Re: Some -serious- BPF-related litmus tests
  2020-05-25 17:02             ` Peter Zijlstra
@ 2020-05-25 17:21               ` Paul E. McKenney
  2020-05-25 17:45                 ` Paul E. McKenney
  2020-05-28 22:00                 ` Joel Fernandes
  0 siblings, 2 replies; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-25 17:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, Alan Stern, parri.andrea, will, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig,
	joel, linux-kernel, linux-arch, andrii.nakryiko

On Mon, May 25, 2020 at 07:02:57PM +0200, Peter Zijlstra wrote:
> On Mon, May 25, 2020 at 08:47:30AM -0700, Paul E. McKenney wrote:
> > On Mon, May 25, 2020 at 01:25:21PM +0200, Peter Zijlstra wrote:
> 
> > > That is; how can you use a spinlock on the producer side at all?
> > 
> > So even trylock is now forbidden in NMI handlers?  If so, why?
> 
> The litmus tests don't have trylock.

Fair point.

> But you made me look at the actual patch:
> 
> +static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
> +{
> +	unsigned long cons_pos, prod_pos, new_prod_pos, flags;
> +	u32 len, pg_off;
> +	struct bpf_ringbuf_hdr *hdr;
> +
> +	if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
> +		return NULL;
> +
> +	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> +	cons_pos = smp_load_acquire(&rb->consumer_pos);
> +
> +	if (in_nmi()) {
> +		if (!spin_trylock_irqsave(&rb->spinlock, flags))
> +			return NULL;
> +	} else {
> +		spin_lock_irqsave(&rb->spinlock, flags);
> +	}
> 
> And that is of course utter crap. That's like saying you don't care
> about your NMI data.

Almost.  It is really saying that -if- there is sufficient lock
contention, printk()s will be lost.  Just as they always have been if
there is more printk() volume than can be accommodated.

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 17:21               ` Paul E. McKenney
@ 2020-05-25 17:45                 ` Paul E. McKenney
  2020-05-28 22:00                 ` Joel Fernandes
  1 sibling, 0 replies; 36+ messages in thread
From: Paul E. McKenney @ 2020-05-25 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, Alan Stern, parri.andrea, will, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig,
	joel, linux-kernel, linux-arch, andrii.nakryiko

On Mon, May 25, 2020 at 10:21:54AM -0700, Paul E. McKenney wrote:
> On Mon, May 25, 2020 at 07:02:57PM +0200, Peter Zijlstra wrote:
> > On Mon, May 25, 2020 at 08:47:30AM -0700, Paul E. McKenney wrote:
> > > On Mon, May 25, 2020 at 01:25:21PM +0200, Peter Zijlstra wrote:
> > 
> > > > That is; how can you use a spinlock on the producer side at all?
> > > 
> > > So even trylock is now forbidden in NMI handlers?  If so, why?
> > 
> > The litmus tests don't have trylock.
> 
> Fair point.
> 
> > But you made me look at the actual patch:
> > 
> > +static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
> > +{
> > +	unsigned long cons_pos, prod_pos, new_prod_pos, flags;
> > +	u32 len, pg_off;
> > +	struct bpf_ringbuf_hdr *hdr;
> > +
> > +	if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
> > +		return NULL;
> > +
> > +	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> > +	cons_pos = smp_load_acquire(&rb->consumer_pos);
> > +
> > +	if (in_nmi()) {
> > +		if (!spin_trylock_irqsave(&rb->spinlock, flags))
> > +			return NULL;
> > +	} else {
> > +		spin_lock_irqsave(&rb->spinlock, flags);
> > +	}
> > 
> > And that is of course utter crap. That's like saying you don't care
> > about your NMI data.
> 
> Almost.  It is really saying that -if- there is sufficient lock
> contention, printk()s will be lost.  Just as they always have been if
> there is more printk() volume than can be accommodated.

s/printk()/BPF output/

One of those days...  :-/

							Thanx, Paul

* Re: Some -serious- BPF-related litmus tests
  2020-05-24 12:09         ` Akira Yokosawa
@ 2020-05-25 18:31           ` Andrii Nakryiko
  2020-05-25 22:01             ` Akira Yokosawa
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-25 18:31 UTC (permalink / raw)
  To: Akira Yokosawa
  Cc: Andrii Nakryiko, Paul E . McKenney, Alan Stern, Peter Zijlstra,
	parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, dlustig, Joel Fernandes, open list, linux-arch

On Sun, May 24, 2020 at 5:09 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On Fri, 22 May 2020 12:38:21 -0700, Andrii Nakryiko wrote:
> > On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> >> On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> >>> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> >>>> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> >>>>> Hello!
> >>>>>
> >>>>> Just wanted to call your attention to some pretty cool and pretty serious
> >>>>> litmus tests that Andrii did as part of his BPF ring-buffer work:
> >>>>>
> >>>>> https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> I find:
> >>>>
> >>>>     smp_wmb()
> >>>>     smp_store_release()
> >>>>
> >>>> a _very_ weird construct. What is that supposed to even do?
> >>>
> >>> Indeed, it looks like one or the other of those is redundant (depending
> >>> on the context).
> >>
> >> Probably.  Peter instead asked what it was supposed to even do.  ;-)
> >
> > I agree, I think smp_wmb() is redundant here. Can't remember why I thought it was necessary; this algorithm went through a bunch of iterations, starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some point, and settling on smp_load_acquire/smp_store_release, eventually. Maybe there was some reason, but might be that I was just over-cautious. See reply on patch thread as well ([0]).
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
> >
> >
> >>
> >>> Also, what use is a spinlock that is accessed in only one thread?
> >>
> >> Multiple writers synchronize via the spinlock in this case.  I am
> >> guessing that his larger 16-hour test contended this spinlock.
> >
> > Yes, spinlock is for coordinating multiple producers. 2p1c cases (bounded and unbounded) rely on this already. 1p1c cases are sort of subsets (but very fast to verify) checking only consumer/producer interaction.
> >
> >>
> >>> Finally, I doubt that these tests belong under tools/memory-model.
> >>> Shouldn't they go under the new Documentation/ directory for litmus
> >>> tests?  And shouldn't the patch update a README file?
> >>
> >> Agreed, and I responded to that effect to his original patch:
> >>
> >> https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/
> >
> > Yep, makes sense, I'll move.
>
> Hi Andrii,
>
> Andrea reported off-list that your litmus tests are incompatible
> with the to-be-released version 7.56 of herd7 and currently available
> versions of klitmus7.
>
> This is due to a couple of C-language level issues.
>
> herd7 used to be fairly generous in parsing C litmus tests.
> On the other hand, klitmus7 converts a litmus test into a
> kernel module.  The converted code is built by an actual C compiler
> with kernel headers included, and can fail to build due to syntax errors
> or serious warnings.
> herd7 HEAD is getting slightly stricter about uninitialized variables and
> emits the following warning for mpsc-rb+1p1c+bounded.litmus:
>
> Warning: File "mpsc-rb+1p1c+bounded.litmus": read on location 0 does not match any write
>
> The code converted by klitmus7 fails to build, with the following messages:
>
> $ make
> make -C /lib/modules/5.3.0-53-generic/build/ M=/home/akira/bpf-rb/klitmus modules
> make[1]: Entering directory '/usr/src/linux-headers-5.3.0-53-generic'
>   CC [M]  /home/akira/bpf-rb/klitmus/litmus000.o
> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘code1’:
> /home/akira/bpf-rb/klitmus/litmus000.c:426:14: error: passing argument 1 of ‘atomic_inc’
>   from incompatible pointer type [-Werror=incompatible-pointer-types]
>    atomic_inc(dropped);
>               ^~~~~~~
> In file included from ./arch/x86/include/asm/atomic.h:265:0,
>                  from ./arch/x86/include/asm/msr.h:67,
>                  from ./arch/x86/include/asm/processor.h:21,
>                  from ./arch/x86/include/asm/cpufeature.h:5,
>                  from ./arch/x86/include/asm/thread_info.h:53,
>                  from ./include/linux/thread_info.h:38,
>                  from ./arch/x86/include/asm/preempt.h:7,
>                  from ./include/linux/preempt.h:78,
>                  from ./include/linux/spinlock.h:51,
>                  from ./include/linux/seqlock.h:36,
>                  from ./include/linux/time.h:6,
>                  from ./include/linux/stat.h:19,
>                  from ./include/linux/module.h:10,
>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
> ./include/asm-generic/atomic-instrumented.h:237:1: note: expected ‘atomic_t * {aka struct <anonymous> *}’ but argument is of type ‘int *’
>  atomic_inc(atomic_t *v)
>  ^~~~~~~~~~
> In file included from ./include/linux/export.h:45:0,
>                  from ./include/linux/linkage.h:7,
>                  from ./include/linux/kernel.h:8,
>                  from ./include/linux/list.h:9,
>                  from ./include/linux/module.h:9,
>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread0’:
> ./include/linux/compiler.h:187:26: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   case 4: *(__u32 *)res = *(volatile __u32 *)p; break;  \
>                           ^
> /home/akira/bpf-rb/klitmus/litmus000.c:365:7: note: ‘rLenPtr’ was declared here
>   int *rLenPtr;
>        ^~~~~~~
> In file included from ./include/linux/export.h:45:0,
>                  from ./include/linux/linkage.h:7,
>                  from ./include/linux/kernel.h:8,
>                  from ./include/linux/list.h:9,
>                  from ./include/linux/module.h:9,
>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread1’:
> ./include/linux/compiler.h:225:31: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
>           ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
> /home/akira/bpf-rb/klitmus/litmus000.c:417:7: note: ‘rLenPtr’ was declared here
>   int *rLenPtr;
>        ^~~~~~~
> cc1: some warnings being treated as errors
> scripts/Makefile.build:288: recipe for target '/home/akira/bpf-rb/klitmus/litmus000.o' failed
> make[2]: *** [/home/akira/bpf-rb/klitmus/litmus000.o] Error 1
> Makefile:1656: recipe for target '_module_/home/akira/bpf-rb/klitmus' failed
> make[1]: *** [_module_/home/akira/bpf-rb/klitmus] Error 2
> make[1]: Leaving directory '/usr/src/linux-headers-5.3.0-53-generic'
> Makefile:8: recipe for target 'all' failed
> make: *** [all] Error 2
>
> Appended below is a patch I applied to mpsc-rb+1p1c+bounded.litmus to make
> herd7 HEAD and klitmus7 happy. (Give or take the redundant memory barrier.)
>
> The other variants need similar changes.

Ok, cool, thanks for letting me know. I'll see if I can upgrade
everything and test on my side (if you have a pointer to instructions
on how to use klitmus7, that would be greatly appreciated!)

>
> What I did here is:
>
>     - Remove unnecessary initialization (shared variables are 0 by default)
>     - Declare "dropped" as atomic_t

These two look good.

>     - Promote rLenPtr to a shared variable LenPtr

This one might work for single-producer litmus tests, but it's wrong
for the 2- and 3-producer cases, because the pointer has to be local to
each producer. But I think I can work around the uninitialized-read
warning by assigning it to 0 in the failure case.

>
> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
> klitmus7 due to a change in kernel API.
>
> Any question is welcome!
>
>         Thanks, Akira
>
> -----------------------
> diff --git a/mpsc-rb+1p1c+bounded.litmus b/mpsc-rb+1p1c+bounded.litmus
> index cafd17a..5af43c1 100644
> --- a/mpsc-rb+1p1c+bounded.litmus
> +++ b/mpsc-rb+1p1c+bounded.litmus
> @@ -17,15 +17,11 @@ C mpsc-rb+1p1c+bounded
>
>  {
>         max_len = 1;
> -       len1 = 0;
> -       px = 0;
> -       cx = 0;
> -       dropped = 0;
> +       atomic_t dropped;
>  }
>
> -P0(int *len1, int *cx, int *px)
> +P0(int *len1, int *cx, int *px, int *LenPtr)
>  {
> -       int *rLenPtr;
>         int rLen;
>         int rPx;
>         int rCx;
> @@ -37,11 +33,11 @@ P0(int *len1, int *cx, int *px)
>         rPx = smp_load_acquire(px);
>         if (rCx < rPx) {
>                 if (rCx == 0)
> -                       rLenPtr = len1;
> +                       LenPtr = len1;
>                 else
>                         rFail = 1;
>
> -               rLen = smp_load_acquire(rLenPtr);
> +               rLen = smp_load_acquire(LenPtr);
>                 if (rLen == 0) {
>                         rFail = 1;
>                 } else if (rLen == 1) {
> @@ -51,12 +47,11 @@ P0(int *len1, int *cx, int *px)
>         }
>  }
>
> -P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, int *dropped, int *max_len)
> +P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, atomic_t *dropped, int *max_len, int *LenPtr)
>  {
>         int rPx;
>         int rCx;
>         int rFail;
> -       int *rLenPtr;
>
>         rFail = 0;
>         rCx = smp_load_acquire(cx);
> @@ -69,17 +64,17 @@ P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, int *dropped, int *max_len)
>                 spin_unlock(rb_lock);
>         } else {
>                 if (rPx == 0)
> -                       rLenPtr = len1;
> +                       LenPtr = len1;
>                 else
>                         rFail = 1;
>
> -               *rLenPtr = -1;
> +               *LenPtr = -1;
>                 smp_wmb();
>                 smp_store_release(px, rPx + 1);
>
>                 spin_unlock(rb_lock);
>
> -               smp_store_release(rLenPtr, 1);
> +               smp_store_release(LenPtr, 1);
>         }
>  }
>
> ----------------
>

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 14:53         ` Boqun Feng
@ 2020-05-25 18:38           ` Andrii Nakryiko
  2020-05-28 21:48             ` Joel Fernandes
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-25 18:38 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Andrii Nakryiko, Paul E . McKenney, Alan Stern, Peter Zijlstra,
	parri.andrea, will, npiggin, dhowells, j.alglave, luc.maranget,
	akiyks, dlustig, Joel Fernandes, open list, linux-arch

On Mon, May 25, 2020 at 7:53 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Hi Andrii,
>
> On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> > > On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> > > > On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > > > > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > > > > Hello!
> > > > > >
> > > > > > Just wanted to call your attention to some pretty cool and pretty serious
> > > > > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > > > >
> > > > > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > I find:
> > > > >
> > > > >         smp_wmb()
> > > > >         smp_store_release()
> > > > >
> > > > > a _very_ weird construct. What is that supposed to even do?
> > > >
> > > > Indeed, it looks like one or the other of those is redundant (depending
> > > > on the context).
> > >
> > > Probably.  Peter instead asked what it was supposed to even do.  ;-)
> >
> > I agree, I think smp_wmb() is redundant here. Can't remember why I thought
> > it was necessary; this algorithm went through a bunch of iterations,
> > starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some
> > point, and settling on smp_load_acquire/smp_store_release, eventually. Maybe
> > there was some reason, but might be that I was just over-cautious. See reply
> > on patch thread as well ([0]).
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
> >
>
> While we are at it, could you explain a bit on why you use
> smp_store_release() on consumer_pos? I ask because IIUC, consumer_pos is
> only updated at consumer side, and there is no other write at consumer
> side that we want to order with the write to consumer_pos. So I fail
> to find why smp_store_release() is necessary.
>
> I did the following modification on litmus tests, and I didn't see
> different results (on States) between two versions of litmus tests.
>

This is needed to ensure that the producer can reliably detect whether
it needs to trigger a poll notification. Basically, if the consumer
catches up at about the same time as the producer commits a new record,
we need to make sure that:
  - either the consumer sees the updated producer_pos > consumer_pos,
and thus knows that there is more data to consume (but the producer
might not send a notification of new data in this case);
  - or the producer sees that the consumer has already caught up (i.e.,
consumer_pos == producer_pos before the currently committed record),
and in that case will definitely send a notification.

This is critical for the correctness of epoll notifications.
Unfortunately, the litmus tests don't test this notification aspect, as
I never figured out an invariant that could be defined to validate it.
I'll give it another thought, though; maybe this time I'll come up with
something.
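
As a rough sketch of the intended pairing (hypothetical names, not the
actual ringbuf code):

	/* consumer, after consuming records up to new_cx */
	smp_store_release(cx, new_cx);
	rPx = smp_load_acquire(px);
	if (rPx > new_cx) {
		/* more records pending: keep consuming even if no
		   notification arrives for them */
	}

	/* producer, committing a record at position rec_pos */
	rCx = smp_load_acquire(cx);
	if (rCx >= rec_pos) {
		/* consumer had caught up to this record: notify */
	}

The intent is that at least one of the two sides observes the other's
update, so a wakeup is never silently lost.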

> Regards,
> Boqun
>

[...]

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 18:31           ` Andrii Nakryiko
@ 2020-05-25 22:01             ` Akira Yokosawa
  2020-05-25 23:31               ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Akira Yokosawa @ 2020-05-25 22:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Paul E . McKenney, Alan Stern, Peter Zijlstra,
	parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, dlustig, Joel Fernandes, open list, linux-arch,
	Akira Yokosawa

On Mon, 25 May 2020 11:31:27 -0700, Andrii Nakryiko wrote:
> On Sun, May 24, 2020 at 5:09 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>
>> On Fri, 22 May 2020 12:38:21 -0700, Andrii Nakryiko wrote:
>>> On 5/22/20 10:43 AM, Paul E. McKenney wrote:
>>>> On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
>>>>> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
>>>>>> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> Just wanted to call your attention to some pretty cool and pretty serious
>>>>>>> litmus tests that Andrii did as part of his BPF ring-buffer work:
>>>>>>>
>>>>>>> https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> I find:
>>>>>>
>>>>>>     smp_wmb()
>>>>>>     smp_store_release()
>>>>>>
>>>>>> a _very_ weird construct. What is that supposed to even do?
>>>>>
>>>>> Indeed, it looks like one or the other of those is redundant (depending
>>>>> on the context).
>>>>
>>>> Probably.  Peter instead asked what it was supposed to even do.  ;-)
>>>
>>> I agree, I think smp_wmb() is redundant here. Can't remember why I thought it was necessary; this algorithm went through a bunch of iterations, starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some point, and settling on smp_load_acquire/smp_store_release, eventually. Maybe there was some reason, but might be that I was just over-cautious. See reply on patch thread as well ([0]).
>>>
>>>   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
>>>
>>>
>>>>
>>>>> Also, what use is a spinlock that is accessed in only one thread?
>>>>
>>>> Multiple writers synchronize via the spinlock in this case.  I am
>>>> guessing that his larger 16-hour test contended this spinlock.
>>>
>>> Yes, spinlock is for coordinating multiple producers. 2p1c cases (bounded and unbounded) rely on this already. 1p1c cases are sort of subsets (but very fast to verify) checking only consumer/producer interaction.
>>>
>>>>
>>>>> Finally, I doubt that these tests belong under tools/memory-model.
>>>>> Shouldn't they go under the new Documentation/ directory for litmus
>>>>> tests?  And shouldn't the patch update a README file?
>>>>
>>>> Agreed, and I responded to that effect to his original patch:
>>>>
>>>> https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/
>>>
>>> Yep, makes sense, I'll move.
>>
>> Hi Andrii,
>>
>> Andrea reported off-list that your litmus tests are incompatible
>> with the to-be-released version 7.56 of herd7 and currently available
>> versions of klitmus7.
>>
>> This is due to a couple of C-language level issues.
>>
>> herd7 used to be fairly generous in parsing C litmus tests.
>> On the other hand, klitmus7 converts a litmus test into a
>> kernel module.  The converted code is built by an actual C compiler
>> with kernel headers included, and can fail to build due to syntax errors
>> or serious warnings.
>> herd7 HEAD is getting slightly stricter about uninitialized variables and
>> emits the following warning for mpsc-rb+1p1c+bounded.litmus:
>>
>> Warning: File "mpsc-rb+1p1c+bounded.litmus": read on location 0 does not match any write
>>
>> The code converted by klitmus7 fails to build, with the following messages:
>>
>> $ make
>> make -C /lib/modules/5.3.0-53-generic/build/ M=/home/akira/bpf-rb/klitmus modules
>> make[1]: Entering directory '/usr/src/linux-headers-5.3.0-53-generic'
>>   CC [M]  /home/akira/bpf-rb/klitmus/litmus000.o
>> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘code1’:
>> /home/akira/bpf-rb/klitmus/litmus000.c:426:14: error: passing argument 1 of ‘atomic_inc’
>>   from incompatible pointer type [-Werror=incompatible-pointer-types]
>>    atomic_inc(dropped);
>>               ^~~~~~~
>> In file included from ./arch/x86/include/asm/atomic.h:265:0,
>>                  from ./arch/x86/include/asm/msr.h:67,
>>                  from ./arch/x86/include/asm/processor.h:21,
>>                  from ./arch/x86/include/asm/cpufeature.h:5,
>>                  from ./arch/x86/include/asm/thread_info.h:53,
>>                  from ./include/linux/thread_info.h:38,
>>                  from ./arch/x86/include/asm/preempt.h:7,
>>                  from ./include/linux/preempt.h:78,
>>                  from ./include/linux/spinlock.h:51,
>>                  from ./include/linux/seqlock.h:36,
>>                  from ./include/linux/time.h:6,
>>                  from ./include/linux/stat.h:19,
>>                  from ./include/linux/module.h:10,
>>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
>> ./include/asm-generic/atomic-instrumented.h:237:1: note: expected ‘atomic_t * {aka struct <anonymous> *}’ but argument is of type ‘int *’
>>  atomic_inc(atomic_t *v)
>>  ^~~~~~~~~~
>> In file included from ./include/linux/export.h:45:0,
>>                  from ./include/linux/linkage.h:7,
>>                  from ./include/linux/kernel.h:8,
>>                  from ./include/linux/list.h:9,
>>                  from ./include/linux/module.h:9,
>>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
>> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread0’:
>> ./include/linux/compiler.h:187:26: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>   case 4: *(__u32 *)res = *(volatile __u32 *)p; break;  \
>>                           ^
>> /home/akira/bpf-rb/klitmus/litmus000.c:365:7: note: ‘rLenPtr’ was declared here
>>   int *rLenPtr;
>>        ^~~~~~~
>> In file included from ./include/linux/export.h:45:0,
>>                  from ./include/linux/linkage.h:7,
>>                  from ./include/linux/kernel.h:8,
>>                  from ./include/linux/list.h:9,
>>                  from ./include/linux/module.h:9,
>>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
>> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread1’:
>> ./include/linux/compiler.h:225:31: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>   case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
>>           ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
>> /home/akira/bpf-rb/klitmus/litmus000.c:417:7: note: ‘rLenPtr’ was declared here
>>   int *rLenPtr;
>>        ^~~~~~~
>> cc1: some warnings being treated as errors
>> scripts/Makefile.build:288: recipe for target '/home/akira/bpf-rb/klitmus/litmus000.o' failed
>> make[2]: *** [/home/akira/bpf-rb/klitmus/litmus000.o] Error 1
>> Makefile:1656: recipe for target '_module_/home/akira/bpf-rb/klitmus' failed
>> make[1]: *** [_module_/home/akira/bpf-rb/klitmus] Error 2
>> make[1]: Leaving directory '/usr/src/linux-headers-5.3.0-53-generic'
>> Makefile:8: recipe for target 'all' failed
>> make: *** [all] Error 2
>>
>> Appended below is a patch I applied to mpsc-rb+1p1c+bounded.litmus to make
>> herd7 HEAD and klitmus7 happy. (Give or take the redundant memory barrier.)
>>
>> The other variants need similar changes.
> 
> Ok, cool, thanks for letting me know. I'll see if I can upgrade
> everything and test on my side (if you have a pointer to instructions
> on how to use klitmus7, that would be greatly appreciated!)
> 
>>
>> What I did here is:
>>
>>     - Remove unnecessary initialization (shared variables are 0 by default)
>>     - Declare "dropped" as atomic_t
> 
> These two look good.
> 
>>     - Promote rLenPtr to a shared variable LenPtr
> 
> This one might work for single producer litmus tests, but it's wrong
> for 2- and 3-producer cases, because it has to be local to each producer.

Ah, I knew I had missed something...

> But I think I can work around the uninitialized read warning by assigning
> it to 0 in the failure case.

Yes, that should work.

You can find a basic introduction of klitmus7 in tools/memory-model/README.
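In short, the flow from that README is roughly the following (a sketch;
the litmus file name is only an example):

  $ klitmus7 -o mymodules mpsc-rb+1p1c+bounded.litmus
  $ cd mymodules
  $ make
  $ sudo sh run.sh

klitmus7 writes a kernel module plus a Makefile and a run.sh script into
the output directory; run.sh inserts the module, reads the result back
out of /proc/litmus, and removes the module again.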

        Thanks, Akira

> 
>>
>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
>> klitmus7 due to a change in kernel API.
>>
>> Any question is welcome!
>>
>>         Thanks, Akira
>>
>> -----------------------
>> diff --git a/mpsc-rb+1p1c+bounded.litmus b/mpsc-rb+1p1c+bounded.litmus
>> index cafd17a..5af43c1 100644
>> --- a/mpsc-rb+1p1c+bounded.litmus
>> +++ b/mpsc-rb+1p1c+bounded.litmus
>> @@ -17,15 +17,11 @@ C mpsc-rb+1p1c+bounded
>>
>>  {
>>         max_len = 1;
>> -       len1 = 0;
>> -       px = 0;
>> -       cx = 0;
>> -       dropped = 0;
>> +       atomic_t dropped;
>>  }
>>
>> -P0(int *len1, int *cx, int *px)
>> +P0(int *len1, int *cx, int *px, int *LenPtr)
>>  {
>> -       int *rLenPtr;
>>         int rLen;
>>         int rPx;
>>         int rCx;
>> @@ -37,11 +33,11 @@ P0(int *len1, int *cx, int *px)
>>         rPx = smp_load_acquire(px);
>>         if (rCx < rPx) {
>>                 if (rCx == 0)
>> -                       rLenPtr = len1;
>> +                       LenPtr = len1;
>>                 else
>>                         rFail = 1;
>>
>> -               rLen = smp_load_acquire(rLenPtr);
>> +               rLen = smp_load_acquire(LenPtr);
>>                 if (rLen == 0) {
>>                         rFail = 1;
>>                 } else if (rLen == 1) {
>> @@ -51,12 +47,11 @@ P0(int *len1, int *cx, int *px)
>>         }
>>  }
>>
>> -P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, int *dropped, int *max_len)
>> +P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, atomic_t *dropped, int *max_len, int *LenPtr)
>>  {
>>         int rPx;
>>         int rCx;
>>         int rFail;
>> -       int *rLenPtr;
>>
>>         rFail = 0;
>>         rCx = smp_load_acquire(cx);
>> @@ -69,17 +64,17 @@ P1(int *len1, spinlock_t *rb_lock, int *px, int *cx, int *dropped, int *max_len)
>>                 spin_unlock(rb_lock);
>>         } else {
>>                 if (rPx == 0)
>> -                       rLenPtr = len1;
>> +                       LenPtr = len1;
>>                 else
>>                         rFail = 1;
>>
>> -               *rLenPtr = -1;
>> +               *LenPtr = -1;
>>                 smp_wmb();
>>                 smp_store_release(px, rPx + 1);
>>
>>                 spin_unlock(rb_lock);
>>
>> -               smp_store_release(rLenPtr, 1);
>> +               smp_store_release(LenPtr, 1);
>>         }
>>  }
>>
>> ----------------
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 22:01             ` Akira Yokosawa
@ 2020-05-25 23:31               ` Andrii Nakryiko
  2020-05-26 10:50                 ` Akira Yokosawa
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-25 23:31 UTC (permalink / raw)
  To: Akira Yokosawa
  Cc: Andrii Nakryiko, Paul E . McKenney, Alan Stern, Peter Zijlstra,
	parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, dlustig, Joel Fernandes, open list, linux-arch

On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On Mon, 25 May 2020 11:31:27 -0700, Andrii Nakryiko wrote:
> > On Sun, May 24, 2020 at 5:09 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>
> >> On Fri, 22 May 2020 12:38:21 -0700, Andrii Nakryiko wrote:
> >>> On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> >>>> On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> >>>>> On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> >>>>>> On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> >>>>>>> Hello!
> >>>>>>>
> >>>>>>> Just wanted to call your attention to some pretty cool and pretty serious
> >>>>>>> litmus tests that Andrii did as part of his BPF ring-buffer work:
> >>>>>>>
> >>>>>>> https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>
> >>>>>> I find:
> >>>>>>
> >>>>>>     smp_wmb()
> >>>>>>     smp_store_release()
> >>>>>>
> >>>>>> a _very_ weird construct. What is that supposed to even do?
> >>>>>
> >>>>> Indeed, it looks like one or the other of those is redundant (depending
> >>>>> on the context).
> >>>>
> >>>> Probably.  Peter instead asked what it was supposed to even do.  ;-)
> >>>
> >>> I agree, I think smp_wmb() is redundant here. I can't remember why I thought it was necessary; this algorithm went through a bunch of iterations, starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some point, and eventually settling on smp_load_acquire/smp_store_release. Maybe there was some reason, but it might be that I was just over-cautious. See reply on the patch thread as well ([0]).
> >>>
> >>>   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
> >>>
> >>>
> >>>>
> >>>>> Also, what use is a spinlock that is accessed in only one thread?
> >>>>
> >>>> Multiple writers synchronize via the spinlock in this case.  I am
> >>>> guessing that his larger 16-hour test contended this spinlock.
> >>>
> >>> Yes, the spinlock is for coordinating multiple producers. The 2p1c cases (bounded and unbounded) already rely on this. The 1p1c cases are essentially subsets (but very fast to verify), checking only the consumer/producer interaction.
> >>>
> >>>>
> >>>>> Finally, I doubt that these tests belong under tools/memory-model.
> >>>>> Shouldn't they go under the new Documentation/ directory for litmus
> >>>>> tests?  And shouldn't the patch update a README file?
> >>>>
> >>>> Agreed, and I responded to that effect to his original patch:
> >>>>
> >>>> https://lore.kernel.org/bpf/20200522003433.GG2869@paulmck-ThinkPad-P72/
> >>>
> >>> Yep, makes sense, I'll move.
> >>
> >> Hi Andrii,
> >>
> >> Andrea reported off-the-list that your litmus tests are incompatible
> >> with the to-be-released version 7.56 of herd7 and currently available
> >> versions of klitmus7.
> >>
> >> This is due to a couple of C-language level issues.
> >>
> >> herd7 used to be fairly generous in parsing C litmus tests.
> >> On the other hand, klitmus7 converts a litmus test into a
> >> kernel module.  The converted code is built by an actual C compiler
> >> with kernel headers included, and can fail to build due to syntax errors
> >> or serious warnings.
> >> herd7 HEAD is getting slightly stricter about uninitialized variables and
> >> emits an error for mpsc-rb+1p1c+bounded.litmus:
> >>
> >> Warning: File "mpsc-rb+1p1c+bounded.litmus": read on location 0 does not match any write
> >>
> >> Code converted by klitmus7 fails to build, with the following warning messages:
> >>
> >> $ make
> >> make -C /lib/modules/5.3.0-53-generic/build/ M=/home/akira/bpf-rb/klitmus modules
> >> make[1]: Entering directory '/usr/src/linux-headers-5.3.0-53-generic'
> >>   CC [M]  /home/akira/bpf-rb/klitmus/litmus000.o
> >> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘code1’:
> >> /home/akira/bpf-rb/klitmus/litmus000.c:426:14: error: passing argument 1 of ‘atomic_inc’
> >>   from incompatible pointer type [-Werror=incompatible-pointer-types]
> >>    atomic_inc(dropped);
> >>               ^~~~~~~
> >> In file included from ./arch/x86/include/asm/atomic.h:265:0,
> >>                  from ./arch/x86/include/asm/msr.h:67,
> >>                  from ./arch/x86/include/asm/processor.h:21,
> >>                  from ./arch/x86/include/asm/cpufeature.h:5,
> >>                  from ./arch/x86/include/asm/thread_info.h:53,
> >>                  from ./include/linux/thread_info.h:38,
> >>                  from ./arch/x86/include/asm/preempt.h:7,
> >>                  from ./include/linux/preempt.h:78,
> >>                  from ./include/linux/spinlock.h:51,
> >>                  from ./include/linux/seqlock.h:36,
> >>                  from ./include/linux/time.h:6,
> >>                  from ./include/linux/stat.h:19,
> >>                  from ./include/linux/module.h:10,
> >>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
> >> ./include/asm-generic/atomic-instrumented.h:237:1: note: expected ‘atomic_t * {aka struct <anonymous> *}’ but argument is of type ‘int *’
> >>  atomic_inc(atomic_t *v)
> >>  ^~~~~~~~~~
> >> In file included from ./include/linux/export.h:45:0,
> >>                  from ./include/linux/linkage.h:7,
> >>                  from ./include/linux/kernel.h:8,
> >>                  from ./include/linux/list.h:9,
> >>                  from ./include/linux/module.h:9,
> >>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
> >> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread0’:
> >> ./include/linux/compiler.h:187:26: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >>   case 4: *(__u32 *)res = *(volatile __u32 *)p; break;  \
> >>                           ^
> >> /home/akira/bpf-rb/klitmus/litmus000.c:365:7: note: ‘rLenPtr’ was declared here
> >>   int *rLenPtr;
> >>        ^~~~~~~
> >> In file included from ./include/linux/export.h:45:0,
> >>                  from ./include/linux/linkage.h:7,
> >>                  from ./include/linux/kernel.h:8,
> >>                  from ./include/linux/list.h:9,
> >>                  from ./include/linux/module.h:9,
> >>                  from /home/akira/bpf-rb/klitmus/litmus000.c:11:
> >> /home/akira/bpf-rb/klitmus/litmus000.c: In function ‘thread1’:
> >> ./include/linux/compiler.h:225:31: warning: ‘rLenPtr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >>   case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
> >>           ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
> >> /home/akira/bpf-rb/klitmus/litmus000.c:417:7: note: ‘rLenPtr’ was declared here
> >>   int *rLenPtr;
> >>        ^~~~~~~
> >> cc1: some warnings being treated as errors
> >> scripts/Makefile.build:288: recipe for target '/home/akira/bpf-rb/klitmus/litmus000.o' failed
> >> make[2]: *** [/home/akira/bpf-rb/klitmus/litmus000.o] Error 1
> >> Makefile:1656: recipe for target '_module_/home/akira/bpf-rb/klitmus' failed
> >> make[1]: *** [_module_/home/akira/bpf-rb/klitmus] Error 2
> >> make[1]: Leaving directory '/usr/src/linux-headers-5.3.0-53-generic'
> >> Makefile:8: recipe for target 'all' failed
> >> make: *** [all] Error 2
> >>
> >> Appended below is a patch I applied to mpsc-rb+1p1c+bounded.litmus to make
> >> herd7 HEAD and klitmus7 happy. (Give or take the redundant memory barrier.)
> >>
> >> The other variants need similar changes.
> >
> > Ok, cool, thanks for letting me know. I'll see if I can upgrade
> > everything and test on my side (if you have a pointer to instructions
> > on how to use klitmus7, that would be greatly appreciated!)
> >
> >>
> >> What I did here is:
> >>
> >>     - Remove unnecessary initialization (shared variables are 0 by default)
> >>     - Declare "dropped" as atomic_t
> >
> > These two look good.
> >
> >>     - Promote rLenPtr to a shared variable LenPtr
> >
> > This one might work for single producer litmus tests, but it's wrong
> > for 2- and 3-producer cases, because it has to be local to each producer.
>
> Ah, I knew I had missed something...
>
> > But I think I can work around the uninitialized read warning by assigning
> > it to 0 in the failure case.
>
> Yes, that should work.

Ok, assigning to zero didn't work (it still complained about
uninitialized read), but using a separate int *lenFail to assign to
rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
actually takes a bit more time to verify.
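
In other words, the consumer now does something along these lines (just
a sketch of the fix, with lenFail being the new dummy shared location):

	if (rCx == 0)
		rLenPtr = len1;
	else {
		rFail = 1;
		rLenPtr = lenFail;	/* dummy target, so rLenPtr is written on every path */
	}

	rLen = smp_load_acquire(rLenPtr);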

So I've converted everything else as you suggested. I compiled the latest
herd7 and it doesn't produce any warnings. But it's also extremely
slow, compared to the herd7 that I get by default. Validating simple
1p1c cases takes about 2.5x longer (0.03s vs 0.07s), but trying
to validate the 2p1c case, which normally validates in 42s (unbounded) and
110s (bounded), it ran for more than 20 minutes and still hadn't finished
when I gave up. So I don't know what's going on there...

As for klitmus7, I managed to generate everything without warnings,
but couldn't make it build completely due to:

$ make
make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/
M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
  CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
In function ‘zyva’:
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
     struct task_struct *th[nth];
            ^~~~~~~~~~~
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
In function ‘litmus_init’:
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
error: passing argument 4 of ‘proc_create’ from incompatible pointer
type [-Werror=incompatible-pointer-types]
   struct proc_dir_entry *litmus_pde =
proc_create("litmus",0,NULL,&litmus_proc_fops);

^~~~~~~~~~~~~~~~~
In file included from
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
/data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
expected ‘const struct proc_ops *’ but argument is of type ‘const
struct file_operations *’
 struct proc_dir_entry *proc_create(const char *name, umode_t mode,
struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
                        ^~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
Error 1
make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
Error 2
make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
make: *** [all] Error 2


But at least it doesn't complain about atomic_t anymore. So anyways,
I'm going to post updated litmus tests separately from BPF ringbuf
patches, because Documentation/litmus-tests is not yet present in
bpf-next.

>
> You can find a basic introduction of klitmus7 in tools/memory-model/README.
>
>         Thanks, Akira
>
> >
> >>
> >> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
> >> klitmus7 due to a change in kernel API.
> >>
> >> Any question is welcome!
> >>
> >>         Thanks, Akira
> >>

[...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 23:31               ` Andrii Nakryiko
@ 2020-05-26 10:50                 ` Akira Yokosawa
  2020-05-26 14:02                   ` Akira Yokosawa
  2020-05-26 20:15                   ` Andrii Nakryiko
  0 siblings, 2 replies; 36+ messages in thread
From: Akira Yokosawa @ 2020-05-26 10:50 UTC (permalink / raw)
  To: Andrii Nakryiko, luc.maranget
  Cc: Andrii Nakryiko, Paul E . McKenney, Alan Stern, Peter Zijlstra,
	parri.andrea, will, boqun.feng, npiggin, dhowells, j.alglave,
	dlustig, Joel Fernandes, open list, linux-arch, Akira Yokosawa

On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
> On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
>>
[...]
>> Yes, that should work.
> 
> Ok, assigning to zero didn't work (it still complained about
> uninitialized read), but using a separate int *lenFail to assign to
> rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
> actually takes a bit more time to verify.
> 
> So I've converted everything else as you suggested. I compiled the latest
> herd7 and it doesn't produce any warnings. But it's also extremely
> slow, compared to the herd7 that I get by default. Validating simple
> 1p1c cases takes about 2.5x longer (0.03s vs 0.07s), but trying
> to validate the 2p1c case, which normally validates in 42s (unbounded) and
> 110s (bounded), it ran for more than 20 minutes and still hadn't finished
> when I gave up. So I don't know what's going on there...

herdtools7 has recently been heavily restructured.
On the performance regression, I must defer to Luc.

Luc, do you have any idea?

> 
> As for klitmus7, I managed to generate everything without warnings,
> but couldn't make it build completely due to:
> 
> $ make
> make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/

So you are on Linux 5.6.x which requires cutting-edge klitmus7.

> M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
> make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
> make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
>   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> In function ‘zyva’:
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
> warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
>      struct task_struct *th[nth];
>             ^~~~~~~~~~~
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> In function ‘litmus_init’:
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
> error: passing argument 4 of ‘proc_create’ from incompatible pointer
> type [-Werror=incompatible-pointer-types]
>    struct proc_dir_entry *litmus_pde =
> proc_create("litmus",0,NULL,&litmus_proc_fops);
> 
> ^~~~~~~~~~~~~~~~~
> In file included from
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
> /data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
> expected ‘const struct proc_ops *’ but argument is of type ‘const
> struct file_operations *’
>  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
> struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
>                         ^~~~~~~~~~~
> cc1: some warnings being treated as errors
> make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
> Error 1
> make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
> Error 2
> make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
> make[1]: *** [sub-make] Error 2
> make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
> make: *** [all] Error 2
> 

These errors suggest the klitmus7 you used is version 7.52 or some such.
You said you have built herd7 from the source.  Have you also built klitmus7?

The up-to-date klitmus7 should generate code compatible with Linux 5.6.x.
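
For background, the kernel API change in question is that, since Linux
5.6, proc_create() takes a struct proc_ops rather than a struct
file_operations.  Roughly (a sketch, not the exact klitmus7 output;
litmus_proc_open is a made-up helper name here):

  /* Pre-5.6 style, which older klitmus7 still emits: */
  static const struct file_operations litmus_proc_fops = {
          .open    = litmus_proc_open,
          .read    = seq_read,
          .llseek  = seq_lseek,
          .release = single_release,
  };
  proc_create("litmus", 0, NULL, &litmus_proc_fops);

  /* Linux 5.6 and later: */
  static const struct proc_ops litmus_proc_ops = {
          .proc_open    = litmus_proc_open,
          .proc_read    = seq_read,
          .proc_lseek   = seq_lseek,
          .proc_release = single_release,
  };
  proc_create("litmus", 0, NULL, &litmus_proc_ops);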

Could you try with the latest one?

        Thanks, Akira

> 
> But at least it doesn't complain about atomic_t anymore. So anyways,
> I'm going to post updated litmus tests separately from BPF ringbuf
> patches, because Documentation/litmus-tests is not yet present in
> bpf-next.
> 
>>
>> You can find a basic introduction of klitmus7 in tools/memory-model/README.
>>
>>         Thanks, Akira
>>
>>>
>>>>
>>>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
>>>> klitmus7 due to a change in kernel API.
>>>>
>>>> Any question is welcome!
>>>>
>>>>         Thanks, Akira
>>>>
> 
> [...]
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-26 10:50                 ` Akira Yokosawa
@ 2020-05-26 14:02                   ` Akira Yokosawa
  2020-05-26 20:19                     ` Andrii Nakryiko
  2020-05-26 20:15                   ` Andrii Nakryiko
  1 sibling, 1 reply; 36+ messages in thread
From: Akira Yokosawa @ 2020-05-26 14:02 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: luc.maranget, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, boqun.feng, npiggin,
	dhowells, j.alglave, dlustig, Joel Fernandes, open list,
	linux-arch

On Tue, 26 May 2020 19:50:47 +0900, Akira Yokosawa wrote:
> On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
>> On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
>>>
> [...]
>>> Yes, that should work.
>>
>> Ok, assigning to zero didn't work (it still complained about
>> uninitialized read), but using a separate int *lenFail to assign to
>> rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
>> actually takes a bit more time to verify.
>>
>> So I've converted everything else as you suggested. I compiled the latest
>> herd7 and it doesn't produce any warnings. But it's also extremely
>> slow, compared to the herd7 that I get by default. Validating simple
>> 1p1c cases takes about 2.5x longer (0.03s vs 0.07s),

Wait a moment!

This 0.03s was the run time of the original 1p1c litmus test, wasn't it?
Then you are comparing apples and oranges.

How long does your default herd7 take to complete the updated 1p1c test?

        Thanks, Akira

>>                                                           but trying
>> to validate the 2p1c case, which normally validates in 42s (unbounded) and
>> 110s (bounded), it ran for more than 20 minutes and still hadn't finished
>> when I gave up. So I don't know what's going on there...
> 
> herdtools7 has recently been heavily restructured.
> On the performance regression, I must defer to Luc.
> 
> Luc, do you have any idea?
> 
>>
>> As for klitmus7, I managed to generate everything without warnings,
>> but couldn't make it build completely due to:
>>
>> $ make
>> make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/
> 
> So you are on Linux 5.6.x which requires cutting-edge klitmus7.
> 
>> M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
>> make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
>> make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
>>   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
>> In function ‘zyva’:
>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
>> warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
>>      struct task_struct *th[nth];
>>             ^~~~~~~~~~~
>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
>> In function ‘litmus_init’:
>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
>> error: passing argument 4 of ‘proc_create’ from incompatible pointer
>> type [-Werror=incompatible-pointer-types]
>>    struct proc_dir_entry *litmus_pde =
>> proc_create("litmus",0,NULL,&litmus_proc_fops);
>>
>> ^~~~~~~~~~~~~~~~~
>> In file included from
>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
>> /data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
>> expected ‘const struct proc_ops *’ but argument is of type ‘const
>> struct file_operations *’
>>  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
>> struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
>>                         ^~~~~~~~~~~
>> cc1: some warnings being treated as errors
>> make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
>> Error 1
>> make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
>> Error 2
>> make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
>> make[1]: *** [sub-make] Error 2
>> make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
>> make: *** [all] Error 2
>>
> 
> These errors suggest the klitmus7 you used is version 7.52 or some such.
> You said you have built herd7 from the source.  Have you also built klitmus7?
> 
> The up-to-date klitmus7 should generate code compatible with Linux 5.6.x.
> 
> Could you try with the latest one?
> 
>         Thanks, Akira
> 
>>
>> But at least it doesn't complain about atomic_t anymore. So anyways,
>> I'm going to post updated litmus tests separately from BPF ringbuf
>> patches, because Documentation/litmus-tests is not yet present in
>> bpf-next.
>>
>>>
>>> You can find a basic introduction of klitmus7 in tools/memory-model/README.
>>>
>>>         Thanks, Akira
>>>
>>>>
>>>>>
>>>>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
>>>>> klitmus7 due to a change in kernel API.
>>>>>
>>>>> Any question is welcome!
>>>>>
>>>>>         Thanks, Akira
>>>>>
>>
>> [...]
>>
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-26 10:50                 ` Akira Yokosawa
  2020-05-26 14:02                   ` Akira Yokosawa
@ 2020-05-26 20:15                   ` Andrii Nakryiko
  2020-05-26 22:23                     ` Akira Yokosawa
  1 sibling, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-26 20:15 UTC (permalink / raw)
  To: Akira Yokosawa
  Cc: luc.maranget, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, Boqun Feng, npiggin,
	dhowells, j.alglave, dlustig, Joel Fernandes, open list,
	linux-arch

On Tue, May 26, 2020 at 3:50 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
> > On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>
> [...]
> >> Yes, that should work.
> >
> > Ok, assigning to zero didn't work (it still complained about
> > uninitialized read), but using a separate int *lenFail to assign to
> > rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
> > actually takes a bit more time to verify.
> >
> > So I've converted everything else as you suggested. I compiled the latest
> > herd7 and it doesn't produce any warnings. But it's also extremely
> > slow, compared to the herd7 that I get by default. Validating simple
> > 1p1c cases takes about 2.5x longer (0.03s vs 0.07s), but trying
> > to validate the 2p1c case, which normally validates in 42s (unbounded) and
> > 110s (bounded), it ran for more than 20 minutes and still hadn't finished
> > when I gave up. So I don't know what's going on there...
>
> herdtools7 has recently been heavily restructured.
> On the performance regression, I must defer to Luc.
>
> Luc, do you have any idea?
>
> >
> > As for klitmus7, I managed to generate everything without warnings,
> > but couldn't make it build completely due to:
> >
> > $ make
> > make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/
>
> So you are on Linux 5.6.x which requires cutting-edge klitmus7.
>

Right, so I retried with the klitmus7 built from sources:

$ klitmus7 -version

                                    7.55+01(dev)

Still can't compile; the task_struct problem went away, but the proc_ops
error is still present:

$ make
grep: /lib/modules/5.7.0-rc5-02014-gb16540c748e9/build/include/linux/proc_fs.h:
No such file or directory
make -C /lib/modules/5.7.0-rc5-02014-gb16540c748e9/build/
M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
grep: /lib/modules/5.7.0-rc5-02014-gb16540c748e9/build/include/linux/proc_fs.h:
No such file or directory
  CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
In function ‘litmus_init’:
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:621:67:
error: passing argument 4 of ‘proc_create’ from incompatible pointer
type [-Werror=incompatible-pointer-types]
   struct proc_dir_entry *litmus_pde =
proc_create("litmus",0,NULL,&litmus_proc_ops);

^~~~~~~~~~~~~~~~
In file included from
/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
/data/users/andriin/linux/include/linux/proc_fs.h:79:24: note:
expected ‘const struct proc_ops *’ but argument is of type ‘const
struct file_operations *’
 struct proc_dir_entry *proc_create(const char *name, umode_t mode,
struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
                        ^~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
Error 1
make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
Error 2
make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
make: *** [all] Error 2

Don't know if I'm missing some headers or whatever.

> > M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
> > make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
> > make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
> >   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
> > /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> > In function ‘zyva’:
> > /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
> > warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
> >      struct task_struct *th[nth];
> >             ^~~~~~~~~~~
> > /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> > In function ‘litmus_init’:
> > /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
> > error: passing argument 4 of ‘proc_create’ from incompatible pointer
> > type [-Werror=incompatible-pointer-types]
> >    struct proc_dir_entry *litmus_pde =
> > proc_create("litmus",0,NULL,&litmus_proc_fops);
> >
> > ^~~~~~~~~~~~~~~~~
> > In file included from
> > /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
> > /data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
> > expected ‘const struct proc_ops *’ but argument is of type ‘const
> > struct file_operations *’
> >  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
> > struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
> >                         ^~~~~~~~~~~
> > cc1: some warnings being treated as errors
> > make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
> > Error 1
> > make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
> > Error 2
> > make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
> > make[1]: *** [sub-make] Error 2
> > make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
> > make: *** [all] Error 2
> >
>
> These errors suggest the klitmus7 you used is version 7.52 or some such.
> You said you have built herd7 from the source.  Have you also built klitmus7?

I did, but it wasn't in the PATH. I retried with the latest klitmus7 and
still ran into problems.

>
> The up-to-date klitmus7 should generate code compatible with Linux 5.6.x.
>
> Could you try with the latest one?
>
>         Thanks, Akira
>
> >
> > But at least it doesn't complain about atomic_t anymore. So anyways,
> > I'm going to post updated litmus tests separately from BPF ringbuf
> > patches, because Documentation/litmus-tests is not yet present in
> > bpf-next.
> >
> >>
> >> You can find a basic introduction of klitmus7 in tools/memory-model/README.
> >>
> >>         Thanks, Akira
> >>
> >>>
> >>>>
> >>>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
> >>>> klitmus7 due to a change in kernel API.
> >>>>
> >>>> Any question is welcome!
> >>>>
> >>>>         Thanks, Akira
> >>>>
> >
> > [...]
> >
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-26 14:02                   ` Akira Yokosawa
@ 2020-05-26 20:19                     ` Andrii Nakryiko
  2020-05-26 23:00                       ` Akira Yokosawa
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-26 20:19 UTC (permalink / raw)
  To: Akira Yokosawa
  Cc: luc.maranget, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, Boqun Feng, npiggin,
	dhowells, j.alglave, dlustig, Joel Fernandes, open list,
	linux-arch

On Tue, May 26, 2020 at 7:02 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On Tue, 26 May 2020 19:50:47 +0900, Akira Yokosawa wrote:
> > On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
> >> On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>>
> > [...]
> >>> Yes, that should work.
> >>
> >> Ok, assigning to zero didn't work (it still complained about
> >> uninitialized read), but using a separate int *lenFail to assign to
> >> rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
> >> actually takes a bit more time to verify.
> >>
> >> So I've converted everything else as you suggested. I compiled the latest
> >> herd7 and it doesn't produce any warnings. But it's also extremely
> >> slow, compared to the herd7 that I get by default. Validating simple
> >> 1p1c cases takes about 2.5x longer (0.03s vs 0.07s),
>
> Wait a moment!
>
> This 0.03s was the run time of the original 1p1c litmus test, wasn't it?
> Then you are comparing apples and oranges.
>
> How long does your default herd7 take to complete the updated 1p1c test?
>
>         Thanks, Akira

It could be the new test vs. the old test, so I re-ran again with the
identical 1p1c-unbound test:

OLD version:

$ herd7 -version && herd7 -unroll 0 -conf linux-kernel.cfg
../../Documentation/litmus-tests/bpf-rb/bpf-rb+1p1c+unbound.litmus
7.52, Rev: exported
Test bpf-rb+1p1c+unbound Allowed
States 2
0:rFail=0; 1:rFail=0; cx=0; len1=1; px=1;
0:rFail=0; 1:rFail=0; cx=1; len1=1; px=1;
Ok
Witnesses
Positive: 3 Negative: 0
Condition exists (0:rFail=0 /\ 1:rFail=0 /\ px=1 /\ len1=1 /\ (cx=0 \/ cx=1))
Observation bpf-rb+1p1c+unbound Always 3 0
Time bpf-rb+1p1c+unbound 0.03
Hash=20a68cc69b09fbb79f407f825c015623

LATEST from sources version:

$ herd7 -version && herd7 -unroll 0 -conf linux-kernel.cfg
../../Documentation/litmus-tests/bpf-rb/bpf-rb+1p1c+unbound.litmus
7.55+01(dev), Rev: 61e23aaee7bba87ccf4cdf1a620a3a9fa8f9a586
Test bpf-rb+1p1c+unbound Allowed
States 2
0:rFail=0; 1:rFail=0; cx=0; len1=1; px=1;
0:rFail=0; 1:rFail=0; cx=1; len1=1; px=1;
Ok
Witnesses
Positive: 3 Negative: 0
Condition exists (0:rFail=0 /\ 1:rFail=0 /\ px=1 /\ len1=1 /\ (cx=0 \/ cx=1))
Observation bpf-rb+1p1c+unbound Always 3 0
Time bpf-rb+1p1c+unbound 0.06
Hash=20a68cc69b09fbb79f407f825c015623

Still a 2x difference.

>
> >>                                                           but trying
> >> to validate the 2p1c case, which normally validates in 42s (unbounded) and
> >> 110s (bounded), it ran for more than 20 minutes and still hadn't finished
> >> when I gave up. So I don't know what's going on there...
> >
> > herdtools7 has recently been heavily restructured.
> > On the performance regression, I must defer to Luc.
> >
> > Luc, do you have any idea?
> >
> >>
> >> As for klitmus7, I managed to generate everything without warnings,
> >> but couldn't make it build completely due to:
> >>
> >> $ make
> >> make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/
> >
> > So you are on Linux 5.6.x which requires cutting-edge klitmus7.
> >
> >> M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
> >> make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
> >> make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
> >>   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
> >> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> >> In function ‘zyva’:
> >> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
> >> warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
> >>      struct task_struct *th[nth];
> >>             ^~~~~~~~~~~
> >> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> >> In function ‘litmus_init’:
> >> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
> >> error: passing argument 4 of ‘proc_create’ from incompatible pointer
> >> type [-Werror=incompatible-pointer-types]
> >>    struct proc_dir_entry *litmus_pde =
> >> proc_create("litmus",0,NULL,&litmus_proc_fops);
> >>
> >> ^~~~~~~~~~~~~~~~~
> >> In file included from
> >> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
> >> /data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
> >> expected ‘const struct proc_ops *’ but argument is of type ‘const
> >> struct file_operations *’
> >>  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
> >> struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
> >>                         ^~~~~~~~~~~
> >> cc1: some warnings being treated as errors
> >> make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
> >> Error 1
> >> make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
> >> Error 2
> >> make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
> >> make[1]: *** [sub-make] Error 2
> >> make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
> >> make: *** [all] Error 2
> >>
> >
> > These errors suggest the klitmus7 you used is version 7.52 or some such.
> > You said you have built herd7 from the source.  Have you also built klitmus7?
> >
> > The up-to-date klitmus7 should generate code compatible with Linux 5.6.x.
> >
> > Could you try with the latest one?
> >
> >         Thanks, Akira
> >
> >>
> >> But at least it doesn't complain about atomic_t anymore. So anyways,
> >> I'm going to post updated litmus tests separately from BPF ringbuf
> >> patches, because Documentation/litmus-tests is not yet present in
> >> bpf-next.
> >>
> >>>
> >>> You can find a basic introduction of klitmus7 in tools/memory-model/README.
> >>>
> >>>         Thanks, Akira
> >>>
> >>>>
> >>>>>
> >>>>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
> >>>>> klitmus7 due to a change in kernel API.
> >>>>>
> >>>>> Any question is welcome!
> >>>>>
> >>>>>         Thanks, Akira
> >>>>>
> >>
> >> [...]
> >>
> >
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-26 20:15                   ` Andrii Nakryiko
@ 2020-05-26 22:23                     ` Akira Yokosawa
  0 siblings, 0 replies; 36+ messages in thread
From: Akira Yokosawa @ 2020-05-26 22:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: luc.maranget, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, Boqun Feng, npiggin,
	dhowells, j.alglave, dlustig, Joel Fernandes, open list,
	linux-arch

On Tue, 26 May 2020 13:15:49 -0700, Andrii Nakryiko wrote:
> On Tue, May 26, 2020 at 3:50 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>
>> On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
>>> On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
>>>>
>> [...]
>>>> Yes, that should work.
>>>
>>> Ok, assigning to zero didn't work (it still complained about
>>> uninitialized read), but using a separate int *lenFail to assign to
>>> rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
>>> actually takes a bit more time to verify.
>>>
>>> So I've converted everything else as you suggested. I compiled the latest
>>> herd7 and it doesn't produce any warnings. But it's also extremely
>>> slow, compared to the herd7 that I get by default. Validating simple
>>> 1p1c cases takes about 2.5x longer (0.03s vs 0.07s), but trying
>>> to validate the 2p1c case, which normally validates in 42s (unbounded) and
>>> 110s (bounded), it ran for more than 20 minutes and still hadn't finished
>>> when I gave up. So I don't know what's going on there...
>>
>> herdtools7 has recently been heavily restructured.
>> On the performance regression, I must defer to Luc.
>>
>> Luc, do you have any idea?
>>
>>>
>>> As for klitmus7, I managed to generate everything without warnings,
>>> but couldn't make it build completely due to:
>>>
>>> $ make
>>> make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/
>>
>> So you are on Linux 5.6.x which requires cutting-edge klitmus7.
>>
> 
> Right, so I retried with the klitmus7 built from sources:
> 
> $ klitmus7 -version
> 
>                                     7.55+01(dev)
> 
> Still can't compile; the task_struct problem went away, but the proc_ops
> error is still present:
> 
> $ make
> grep: /lib/modules/5.7.0-rc5-02014-gb16540c748e9/build/include/linux/proc_fs.h:
> No such file or directory

Hmm, have you installed the kernel headers on your system?
You are now on a Linux 5.7-rc5 based system...
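
For a self-built kernel, the generated Makefile expects
/lib/modules/$(uname -r)/build to point at a configured source tree;
that is where it greps include/linux/proc_fs.h, presumably to decide
which /proc interface to use.  A quick sanity check would be something
like:

  $ ls -l /lib/modules/$(uname -r)/build
  $ ls /lib/modules/$(uname -r)/build/include/linux/proc_fs.h

If the build symlink is missing or stale, "make modules_install" run
from your kernel source tree should (re)create it.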

        Thanks, Akira

> make -C /lib/modules/5.7.0-rc5-02014-gb16540c748e9/build/
> M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
> make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
> make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
> grep: /lib/modules/5.7.0-rc5-02014-gb16540c748e9/build/include/linux/proc_fs.h:
> No such file or directory
>   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
> In function ‘litmus_init’:
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:621:67:
> error: passing argument 4 of ‘proc_create’ from incompatible pointer
> type [-Werror=incompatible-pointer-types]
>    struct proc_dir_entry *litmus_pde =
> proc_create("litmus",0,NULL,&litmus_proc_ops);
> 
> ^~~~~~~~~~~~~~~~
> In file included from
> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
> /data/users/andriin/linux/include/linux/proc_fs.h:79:24: note:
> expected ‘const struct proc_ops *’ but argument is of type ‘const
> struct file_operations *’
>  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
> struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
>                         ^~~~~~~~~~~
> cc1: some warnings being treated as errors
> make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
> Error 1
> make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
> Error 2
> make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
> make[1]: *** [sub-make] Error 2
> make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
> make: *** [all] Error 2
> 
> Don't know if I'm missing some headers or whatever.
> 
>>> M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
>>> make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
>>> make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
>>>   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
>>> In function ‘zyva’:
>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
>>> warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
>>>      struct task_struct *th[nth];
>>>             ^~~~~~~~~~~
>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
>>> In function ‘litmus_init’:
>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
>>> error: passing argument 4 of ‘proc_create’ from incompatible pointer
>>> type [-Werror=incompatible-pointer-types]
>>>    struct proc_dir_entry *litmus_pde =
>>> proc_create("litmus",0,NULL,&litmus_proc_fops);
>>>
>>> ^~~~~~~~~~~~~~~~~
>>> In file included from
>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
>>> /data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
>>> expected ‘const struct proc_ops *’ but argument is of type ‘const
>>> struct file_operations *’
>>>  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
>>> struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
>>>                         ^~~~~~~~~~~
>>> cc1: some warnings being treated as errors
>>> make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
>>> Error 1
>>> make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
>>> Error 2
>>> make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
>>> make[1]: *** [sub-make] Error 2
>>> make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
>>> make: *** [all] Error 2
>>>
>>
>> These errors suggest the klitmus7 you used is version 7.52 or some such.
>> You said you have built herd7 from the source.  Have you also built klitmus7?
> 
> I did, but it wasn't in the PATH. I retried with the latest klitmus7 and
> still ran into problems.
> 
>>
>> The up-to-date klitmus7 should generate code compatible with Linux 5.6.x.
>>
>> Could you try with the latest one?
>>
>>         Thanks, Akira
>>
>>>
>>> But at least it doesn't complain about atomic_t anymore. So anyways,
>>> I'm going to post updated litmus tests separately from BPF ringbuf
>>> patches, because Documentation/litmus-tests is not yet present in
>>> bpf-next.
>>>
>>>>
>>>> You can find a basic introduction of klitmus7 in tools/memory-model/README.
>>>>
>>>>         Thanks, Akira
>>>>
>>>>>
>>>>>>
>>>>>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
>>>>>> klitmus7 due to a change in kernel API.
>>>>>>
>>>>>> Any question is welcome!
>>>>>>
>>>>>>         Thanks, Akira
>>>>>>
>>>
>>> [...]
>>>
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-26 20:19                     ` Andrii Nakryiko
@ 2020-05-26 23:00                       ` Akira Yokosawa
  2020-05-27  0:09                         ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Akira Yokosawa @ 2020-05-26 23:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: luc.maranget, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, Boqun Feng, npiggin,
	dhowells, j.alglave, dlustig, Joel Fernandes, open list,
	linux-arch

On Tue, 26 May 2020 13:19:36 -0700, Andrii Nakryiko wrote:
> On Tue, May 26, 2020 at 7:02 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>
>> On Tue, 26 May 2020 19:50:47 +0900, Akira Yokosawa wrote:
>>> On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
>>>> On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
>>>>>
>>> [...]
>>>>> Yes, that should work.
>>>>
>>>> Ok, assigning to zero didn't work (it still complained about
>>>> uninitialized read), but using a separate int *lenFail to assign to
>>>> rLenPtr worked. Curiously, if I used rLenPtr = len1; in the error case, it
>>>> actually takes a bit more time to verify.
>>>>
>>>> So I've converted everything else as you suggested. I compiled the latest
>>>> herd7 and it doesn't produce any warnings. But it's also extremely
>>>> slow, compared to the herd7 that I get by default. Validating simple
>>>> 1p1c cases takes about 2.5x longer (0.03s vs 0.07s),
>>
>> Wait a moment!
>>
>> This 0.03s was the run time of the original 1p1c litmus test, wasn't it?
>> Then you are comparing apples and oranges.
>>
>> How long does your default herd7 take to complete the updated 1p1c test?
>>
>>         Thanks, Akira
> 
> It could be the new test vs. the old test, so I re-ran again with the
> identical 1p1c-unbound test:
> 
> OLD version:
> 
> $ herd7 -version && herd7 -unroll 0 -conf linux-kernel.cfg
> ../../Documentation/litmus-tests/bpf-rb/bpf-rb+1p1c+unbound.litmus
> 7.52, Rev: exported
> Test bpf-rb+1p1c+unbound Allowed
> States 2
> 0:rFail=0; 1:rFail=0; cx=0; len1=1; px=1;
> 0:rFail=0; 1:rFail=0; cx=1; len1=1; px=1;
> Ok
> Witnesses
> Positive: 3 Negative: 0
> Condition exists (0:rFail=0 /\ 1:rFail=0 /\ px=1 /\ len1=1 /\ (cx=0 \/ cx=1))
> Observation bpf-rb+1p1c+unbound Always 3 0
> Time bpf-rb+1p1c+unbound 0.03
> Hash=20a68cc69b09fbb79f407f825c015623
> 
> LATEST from sources version:
> 
> $ herd7 -version && herd7 -unroll 0 -conf linux-kernel.cfg
> ../../Documentation/litmus-tests/bpf-rb/bpf-rb+1p1c+unbound.litmus
> 7.55+01(dev), Rev: 61e23aaee7bba87ccf4cdf1a620a3a9fa8f9a586
> Test bpf-rb+1p1c+unbound Allowed
> States 2
> 0:rFail=0; 1:rFail=0; cx=0; len1=1; px=1;
> 0:rFail=0; 1:rFail=0; cx=1; len1=1; px=1;
> Ok
> Witnesses
> Positive: 3 Negative: 0
> Condition exists (0:rFail=0 /\ 1:rFail=0 /\ px=1 /\ len1=1 /\ (cx=0 \/ cx=1))
> Observation bpf-rb+1p1c+unbound Always 3 0
> Time bpf-rb+1p1c+unbound 0.06
> Hash=20a68cc69b09fbb79f407f825c015623
> 
> Still a 2x difference.

I see the opposite tendency on a different set of time-consuming
litmus tests comparing herd7 7.52 and HEAD.

                                                herd7 7.52     herd7 HEAD
C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u      8.44           6.12
C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-C           77.19          69.92
C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-CE         355.62         287.27
C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-X          157.87         191.50
C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u              2.36           0.94
C-SB+l-o-o-u+l-o-o-u-+l-o-o-u-C                   2.32           0.93
C-SB+l-o-o-u+l-o-o-u-+l-o-o-u-CE                  5.64           3.52
C-SB+l-o-o-u+l-o-o-u+l-o-o-u-X                    3.18           2.52
C-SB+l-o-o-u+l-o-o-u+l-o-o-u-XE                  11.81          10.35
C-SB+l-o-o-u+l-o-o-u+l-o-o-u                      0.25           0.19
C-SB+l-o-o-u+l-o-o-u-C                            0.15           0.12
C-SB+l-o-o-u+l-o-o-u-CE                           0.26           0.20
C-SB+l-o-o-u+l-o-o-u-X                            0.17           0.14
C-SB+l-o-o-u+l-o-o-u-XE                           0.38           0.30
C-SB+l-o-o-u+l-o-o-u                              0.04           0.03

NOTE: These were taken on a fairly old PC, with power-saving mode enabled.

Did you use the original 1p1c unbound test?
I'd like you to compare using the updated 1p1c unbound test.

        Thanks, Akira

> 
>>
>>>>                                                           but trying
>>>> to validate the 2p1c case, which normally validates in 42s (unbounded) and
>>>> 110s (bounded), it ran for more than 20 minutes and still hadn't finished
>>>> when I gave up. So I don't know what's going on there...
>>>
>>> herdtools7 has recently been heavily restructured.
>>> On the performance regression, I must defer to Luc.
>>>
>>> Luc, do you have any idea?
>>>
>>>>
>>>> As for klitmus7, I managed to generate everything without warnings,
>>>> but couldn't make it build completely due to:
>>>>
>>>> $ make
>>>> make -C /lib/modules/5.6.13-01802-g938d64da97c6/build/
>>>
>>> So you are on Linux 5.6.x which requires cutting-edge klitmus7.
>>>
>>>> M=/home/andriin/local/linux-trees/tools/memory-model/mymodules modules
>>>> make[1]: Entering directory `/data/users/andriin/linux-build/fb-config'
>>>> make[2]: Entering directory `/data/users/andriin/linux-build/default-x86_64'
>>>>   CC [M]  /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o
>>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
>>>> In function ‘zyva’:
>>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:507:12:
>>>> warning: ISO C90 forbids variable length array ‘th’ [-Wvla]
>>>>      struct task_struct *th[nth];
>>>>             ^~~~~~~~~~~
>>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:
>>>> In function ‘litmus_init’:
>>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:605:67:
>>>> error: passing argument 4 of ‘proc_create’ from incompatible pointer
>>>> type [-Werror=incompatible-pointer-types]
>>>>    struct proc_dir_entry *litmus_pde =
>>>> proc_create("litmus",0,NULL,&litmus_proc_fops);
>>>>
>>>> ^~~~~~~~~~~~~~~~~
>>>> In file included from
>>>> /home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.c:15:
>>>> /data/users/andriin/linux-fb/include/linux/proc_fs.h:64:24: note:
>>>> expected ‘const struct proc_ops *’ but argument is of type ‘const
>>>> struct file_operations *’
>>>>  struct proc_dir_entry *proc_create(const char *name, umode_t mode,
>>>> struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
>>>>                         ^~~~~~~~~~~
>>>> cc1: some warnings being treated as errors
>>>> make[3]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules/litmus000.o]
>>>> Error 1
>>>> make[2]: *** [/home/andriin/local/linux-trees/tools/memory-model/mymodules]
>>>> Error 2
>>>> make[2]: Leaving directory `/data/users/andriin/linux-build/default-x86_64'
>>>> make[1]: *** [sub-make] Error 2
>>>> make[1]: Leaving directory `/data/users/andriin/linux-build/fb-config'
>>>> make: *** [all] Error 2
>>>>
>>>
>>> These errors suggest the klitmus7 you used is version 7.52 or some such.
>>> You said you have built herd7 from the source.  Have you also built klitmus7?
>>>
>>> The up-to-date klitmus7 should generate code compatible with Linux 5.6.x.
>>>
>>> Could you try with the latest one?
>>>
>>>         Thanks, Akira
>>>
>>>>
>>>> But at least it doesn't complain about atomic_t anymore. So anyways,
>>>> I'm going to post updated litmus tests separately from BPF ringbuf
>>>> patches, because Documentation/litmus-tests is not yet present in
>>>> bpf-next.
>>>>
>>>>>
>>>>> You can find a basic introduction of klitmus7 in tools/memory-model/README.
>>>>>
>>>>>         Thanks, Akira
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Please note that if you are on Linux 5.6 (or later), you need an up-to-date
>>>>>>> klitmus7 due to a change in kernel API.
>>>>>>>
>>>>>>> Any question is welcome!
>>>>>>>
>>>>>>>         Thanks, Akira
>>>>>>>
>>>>
>>>> [...]
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-26 23:00                       ` Akira Yokosawa
@ 2020-05-27  0:09                         ` Andrii Nakryiko
  0 siblings, 0 replies; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-27  0:09 UTC (permalink / raw)
  To: Akira Yokosawa
  Cc: luc.maranget, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, Boqun Feng, npiggin,
	dhowells, j.alglave, dlustig, Joel Fernandes, open list,
	linux-arch

On Tue, May 26, 2020 at 4:00 PM Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On Tue, 26 May 2020 13:19:36 -0700, Andrii Nakryiko wrote:
> > On Tue, May 26, 2020 at 7:02 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>
> >> On Tue, 26 May 2020 19:50:47 +0900, Akira Yokosawa wrote:
> >>> On Mon, 25 May 2020 16:31:05 -0700, Andrii Nakryiko wrote:
> >>>> On Mon, May 25, 2020 at 3:01 PM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>>>>
> >>> [...]
> >>>>> Yes, that should work.
> >>>>
> >>>> Ok, assigning to zero didn't work (it still complained about
> >>>> uninitialized read), but using a separate int *lenFail to assign to
> >>>> rLenPtr worked. Curiously, if I used rLenPtr = len1; in error case, it
> >>>> actually takes a bit more time to verify.
> >>>>
> >>>> So I've converted everything else as you suggested. I compiled latest
> >>>> herd7 and it doesn't produce any warnings. But it's also extremely
> >>>> slow, compared to the herd7 that I get by default. Validating simple
> >>>> 1p1c cases takes about 2.5x times longer (0.03s vs 0.07),
> >>
> >> Wait a moment!
> >>
> >> This 0.03s was the run time of the original 1p1c litmus test, wasn't it?
> >> Then you are comparing apples and oranges.
> >>
> >> How long does your default herd7 take to complete the updated 1p1c test?
> >>
> >>         Thanks, Akira
> >
> > It could be new test vs old test, so I re-ran again. Identical
> > 1p1c-unbound test:
> >
> > OLD version:
> >
> > $ herd7 -version && herd7 -unroll 0 -conf linux-kernel.cfg
> > ../../Documentation/litmus-tests/bpf-rb/bpf-rb+1p1c+unbound.litmus
> > 7.52, Rev: exported
> > Test bpf-rb+1p1c+unbound Allowed
> > States 2
> > 0:rFail=0; 1:rFail=0; cx=0; len1=1; px=1;
> > 0:rFail=0; 1:rFail=0; cx=1; len1=1; px=1;
> > Ok
> > Witnesses
> > Positive: 3 Negative: 0
> > Condition exists (0:rFail=0 /\ 1:rFail=0 /\ px=1 /\ len1=1 /\ (cx=0 \/ cx=1))
> > Observation bpf-rb+1p1c+unbound Always 3 0
> > Time bpf-rb+1p1c+unbound 0.03
> > Hash=20a68cc69b09fbb79f407f825c015623
> >
> > LATEST from sources version:
> >
> > $ herd7 -version && herd7 -unroll 0 -conf linux-kernel.cfg
> > ../../Documentation/litmus-tests/bpf-rb/bpf-rb+1p1c+unbound.litmus
> > 7.55+01(dev), Rev: 61e23aaee7bba87ccf4cdf1a620a3a9fa8f9a586
> > Test bpf-rb+1p1c+unbound Allowed
> > States 2
> > 0:rFail=0; 1:rFail=0; cx=0; len1=1; px=1;
> > 0:rFail=0; 1:rFail=0; cx=1; len1=1; px=1;
> > Ok
> > Witnesses
> > Positive: 3 Negative: 0
> > Condition exists (0:rFail=0 /\ 1:rFail=0 /\ px=1 /\ len1=1 /\ (cx=0 \/ cx=1))
> > Observation bpf-rb+1p1c+unbound Always 3 0
> > Time bpf-rb+1p1c+unbound 0.06
> > Hash=20a68cc69b09fbb79f407f825c015623
> >
> > Still 2x difference.
>
> I see opposite tendency on a different set of time consuming
> litmus tests comparing herd7 7.52 and HEAD.
>
>                                                 herd7 7.52     herd7 HEAD
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u      8.44           6.12
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-C           77.19          69.92
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-CE         355.62         287.27
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u-X          157.87         191.50
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u+l-o-o-u              2.36           0.94
> C-SB+l-o-o-u+l-o-o-u-+l-o-o-u-C                   2.32           0.93
> C-SB+l-o-o-u+l-o-o-u-+l-o-o-u-CE                  5.64           3.52
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u-X                    3.18           2.52
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u-XE                  11.81          10.35
> C-SB+l-o-o-u+l-o-o-u+l-o-o-u                      0.25           0.19
> C-SB+l-o-o-u+l-o-o-u-C                            0.15           0.12
> C-SB+l-o-o-u+l-o-o-u-CE                           0.26           0.20
> C-SB+l-o-o-u+l-o-o-u-X                            0.17           0.14
> C-SB+l-o-o-u+l-o-o-u-XE                           0.38           0.30
> C-SB+l-o-o-u+l-o-o-u                              0.04           0.03
>
> NOTE: These were taken on a fairly old PC, with power-saving mode enabled.
>
> Did you use the original 1p1c unbound test?
> I'd like you to compare the updated 1p1c unbound test.

No, that was the updated one. I'll try another kernel a bit later (with
a proper kernel-devel package); currently re-running the 3p1c test to see
how long it takes.

>
>         Thanks, Akira
>
> [...]
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 18:38           ` Andrii Nakryiko
@ 2020-05-28 21:48             ` Joel Fernandes
  2020-05-29  4:38               ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Fernandes @ 2020-05-28 21:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Boqun Feng, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, open list, linux-arch

On Mon, May 25, 2020 at 11:38:23AM -0700, Andrii Nakryiko wrote:
> On Mon, May 25, 2020 at 7:53 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > Hi Andrii,
> >
> > On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > > > [...]
> > > >
> >
> > While we are at it, could you explain a bit on why you use
> > smp_store_release() on consumer_pos? I ask because IIUC, consumer_pos is
> > only updated at consumer side, and there is no other write at consumer
> > side that we want to order with the write to consumer_pos. So I fail
> > to find why smp_store_release() is necessary.
> >
> > I did the following modification on litmus tests, and I didn't see
> > different results (on States) between two versions of litmus tests.
> >
> 
> This is needed to ensure that producer can reliably detect whether it
> needs to trigger poll notification.

Boqun's question is about the consumer side, though. Are you saying
that the loads prior to the smp_store_release() on the consumer side
should have been seen by the consumer?  You are already using
smp_load_acquire(), so that should be satisfied already, because
smp_load_acquire() makes sure the acquired load happens before any
later loads and stores.

> Basically, consumer caught up at
> about same time as producer commits new record, we need to make sure
> that:
>   - either consumer sees updated producer_pos > consumer_pos, and thus
> knows that there is more data to consumer (but producer might not send
> notification of new data in this case);
>   - or producer sees that consumer already caught up (i.e.,
> consumer_pos == producer_pos before currently committed record), and
> in such case will definitely send notifications.

Could you set a variable on the producer side to emulate a notification, and
check that in the conditions at the end?

thanks,

 - Joel

> 
> This is critical for correctness of epoll notifications.
> Unfortunately, litmus tests don't test this notification aspect, as I
> haven't originally figured out the invariant that can be defined to
> validate this. I'll give it another thought, though, maybe this time
> I'll come up with something.
> 
> > Regards,
> > Boqun
> >
> 
> [...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-25 17:21               ` Paul E. McKenney
  2020-05-25 17:45                 ` Paul E. McKenney
@ 2020-05-28 22:00                 ` Joel Fernandes
  2020-05-28 22:16                   ` Peter Zijlstra
  1 sibling, 1 reply; 36+ messages in thread
From: Joel Fernandes @ 2020-05-28 22:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Andrii Nakryiko, Alan Stern, parri.andrea, will,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, linux-kernel, linux-arch, andrii.nakryiko

On Mon, May 25, 2020 at 10:21:54AM -0700, Paul E. McKenney wrote:
> On Mon, May 25, 2020 at 07:02:57PM +0200, Peter Zijlstra wrote:
> > On Mon, May 25, 2020 at 08:47:30AM -0700, Paul E. McKenney wrote:
> > > On Mon, May 25, 2020 at 01:25:21PM +0200, Peter Zijlstra wrote:
> > 
> > > > That is; how can you use a spinlock on the producer side at all?
> > > 
> > > So even trylock is now forbidden in NMI handlers?  If so, why?
> > 
> > The litmus tests don't have trylock.
> 
> Fair point.
> 
> > But you made me look at the actual patch:
> > 
> > +static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
> > +{
> > +	unsigned long cons_pos, prod_pos, new_prod_pos, flags;
> > +	u32 len, pg_off;
> > +	struct bpf_ringbuf_hdr *hdr;
> > +
> > +	if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
> > +		return NULL;
> > +
> > +	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> > +	cons_pos = smp_load_acquire(&rb->consumer_pos);
> > +
> > +	if (in_nmi()) {
> > +		if (!spin_trylock_irqsave(&rb->spinlock, flags))
> > +			return NULL;
> > +	} else {
> > +		spin_lock_irqsave(&rb->spinlock, flags);
> > +	}
> > 
> > And that is of course utter crap. That's like saying you don't care
> > about your NMI data.
> 
> Almost.  It is really saying that -if- there is sufficient lock
> contention, printk()s will be lost.  Just as they always have been if
> there is more printk() volume than can be accommodated.

Any idea why this choice of locking-based ring buffer implementation in BPF?
The ftrace ring buffer can support NMI interruptions as well for writes.

Also, is it possible for BPF to reuse the ftrace ring buffer implementation
or does it not meet the requirements?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-28 22:00                 ` Joel Fernandes
@ 2020-05-28 22:16                   ` Peter Zijlstra
  2020-05-29  5:14                     ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2020-05-28 22:16 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Andrii Nakryiko, Alan Stern, parri.andrea,
	will, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget,
	akiyks, dlustig, linux-kernel, linux-arch, andrii.nakryiko

On Thu, May 28, 2020 at 06:00:47PM -0400, Joel Fernandes wrote:

> Any idea why this choice of locking-based ring buffer implementation in BPF?
> The ftrace ring buffer can support NMI interruptions as well for writes.
> 
> Also, is it possible for BPF to reuse the ftrace ring buffer implementation
> or does it not meet the requirements?

Both perf and ftrace are per-cpu, which, according to the patch
description is too much memory overhead for them. Neither have ever
considered anything else, atomic ops are expensive.

On top of that, they want multi-producer support. Yes, doing that gets
interesting really fast, but using spinlocks gets you a trainwreck like
this.

This thing so readily wanting to drop data on the floor should worry
people, but apparently they've not spent enough time debugging stuff
with partial logs yet. Of course, bpf_prog_active already makes BPF
lossy, so maybe they went with that.

All reasons why I never bother with BPF, aside from it being more
difficult than hacking up a kernel in the first place.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-28 21:48             ` Joel Fernandes
@ 2020-05-29  4:38               ` Andrii Nakryiko
  2020-05-29 17:23                 ` Joel Fernandes
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-29  4:38 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Boqun Feng, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Thu, May 28, 2020 at 2:48 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Mon, May 25, 2020 at 11:38:23AM -0700, Andrii Nakryiko wrote:
> > On Mon, May 25, 2020 at 7:53 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > >
> > > Hi Andrii,
> > >
> > > On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > > > On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> > > > > On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> > > > > > On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > > > > > > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > > > > > > Hello!
> > > > > > > >
> > > > > > > > Just wanted to call your attention to some pretty cool and pretty serious
> > > > > > > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@fb.com/
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > >
> > > > > > > I find:
> > > > > > >
> > > > > > >         smp_wmb()
> > > > > > >         smp_store_release()
> > > > > > >
> > > > > > > a _very_ weird construct. What is that supposed to even do?
> > > > > >
> > > > > > Indeed, it looks like one or the other of those is redundant (depending
> > > > > > on the context).
> > > > >
> > > > > Probably.  Peter instead asked what it was supposed to even do.  ;-)
> > > >
> > > > I agree, I think smp_wmb() is redundant here. Can't remember why I thought
> > > > that it's necessary, this algorithm went through a bunch of iterations,
> > > > starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some
> > > > point, and settling on smp_read_acquire/smp_store_release, eventually. Maybe
> > > > there was some reason, but might be that I was just over-cautious. See reply
> > > > on patch thread as well ([0]).
> > > >
> > > >   [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@mail.gmail.com/
> > > >
> > >
> > > While we are at it, could you explain a bit on why you use
> > > smp_store_release() on consumer_pos? I ask because IIUC, consumer_pos is
> > > only updated at consumer side, and there is no other write at consumer
> > > side that we want to order with the write to consumer_pos. So I fail
> > > to find why smp_store_release() is necessary.
> > >
> > > I did the following modification on litmus tests, and I didn't see
> > > different results (on States) between two versions of litmus tests.
> > >
> >
> > This is needed to ensure that producer can reliably detect whether it
> > needs to trigger poll notification.
>
> Boqun's question is about the consumer side, though. Are you saying
> that the loads prior to the smp_store_release() on the consumer side
> should have been seen by the consumer?  You are already using
> smp_load_acquire(), so that should be satisfied already, because
> smp_load_acquire() makes sure the acquired load happens before any
> later loads and stores.

Consumer is reading two things: producer_pos and each record's length
header, and writes consumer_pos. I re-read this paragraph many times,
but I'm still a bit confused on what exactly you are trying to say.
Can you please specify in each case release()/acquire() of which
variable you are talking about?

>
> > Basically, consumer caught up at
> > about same time as producer commits new record, we need to make sure
> > that:
> >   - either consumer sees updated producer_pos > consumer_pos, and thus
> > knows that there is more data to consumer (but producer might not send
> > notification of new data in this case);
> >   - or producer sees that consumer already caught up (i.e.,
> > consumer_pos == producer_pos before currently committed record), and
> > in such case will definitely send notifications.
>
> Could you set a variable on the producer side to emulate a notification, and
> check that in the conditions at the end?

Setting a notification flag is easy, but determining when it must or
must not happen is the hard part, and that depends on the exact
interleaving of memory operations. I haven't figured out a reliable
way to express this... If you have a good idea, please post a snippet
of code.
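
One rough, unverified shape such a snippet could take (a sketch with
hypothetical names, modeled on the posted tests: px/cx count whole
records rather than byte positions, and notify emulates the wakeup):

C bpf-rb+1p1c+notify-sketch

(*
 * Record 0 is already committed (px=1) and not yet consumed (cx=0).
 * P0 commits record 1 and notifies only if it sees the consumer
 * caught up; P1 finishes record 0 and re-checks px before sleeping.
 *)

{
	px=1;
	cx=0;
	notify=0;
}

P0(int *px, int *cx, int *notify)
{
	int cons;

	smp_store_release(px, 2);	(* commit record 1 *)
	cons = smp_load_acquire(cx);	(* xchg()/smp_mb() in the real code *)
	if (cons == 1) {
		WRITE_ONCE(*notify, 1);	(* consumer caught up: wake it *)
	}
}

P1(int *px, int *cx)
{
	int prod;

	smp_store_release(cx, 1);	(* done consuming record 0 *)
	prod = smp_load_acquire(px);	(* re-check before sleeping *)
}

(* The exists clause hunts for a missed wakeup: the consumer saw px=1
 * and will go to sleep, yet the producer skipped the notification
 * because it saw cx=0. *)
exists (1:prod=1 /\ notify=0)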

>
> thanks,
>
>  - Joel
>
> >
> > This is critical for correctness of epoll notifications.
> > Unfortunately, litmus tests don't test this notification aspect, as I
> > haven't originally figured out the invariant that can be defined to
> > validate this. I'll give it another thought, though, maybe this time
> > I'll come up with something.
> >
> > > Regards,
> > > Boqun
> > >
> >
> > [...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-28 22:16                   ` Peter Zijlstra
@ 2020-05-29  5:14                     ` Andrii Nakryiko
  2020-05-29 12:36                       ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-29  5:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joel Fernandes, Paul E. McKenney, Andrii Nakryiko, Alan Stern,
	parri.andrea, will, Boqun Feng, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Thu, May 28, 2020 at 3:17 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, May 28, 2020 at 06:00:47PM -0400, Joel Fernandes wrote:
>
> > Any idea why this choice of locking-based ring buffer implementation in BPF?
> > The ftrace ring buffer can support NMI interruptions as well for writes.
> >
> > Also, is it possible for BPF to reuse the ftrace ring buffer implementation
> > or does it not meet the requirements?
>
> Both perf and ftrace are per-cpu, which, according to the patch
> description is too much memory overhead for them. Neither have ever
> considered anything else, atomic ops are expensive.

Right, as mentioned in the commit description of [0], the desire to
use a shared ring buffer across multiple CPUs, to save memory and
absorb bigger spikes with lower overall memory use, was one of the
main motivations. Ordered events were another. Both the perf buffer
and the ftrace buffer use strictly per-CPU buffers, which in practice
forces a lot of developers into a hard and non-obvious choice between
using too much memory or losing too many events due to running out of
space in a buffer.

  [0] https://patchwork.ozlabs.org/project/netdev/patch/20200526063255.1675186-2-andriin@fb.com/

>
> On top of that, they want multi-producer support. Yes, doing that gets
> interesting really fast, but using spinlocks gets you a trainwreck like
> this.
>
> This thing so readily wanting to drop data on the floor should worry

It's true that *in NMI context*, if the spinlock is already taken,
reservation will fail, so under high contention there will be a
potentially high number of drops. For such cases perfbuf might be a
better approach, and BPF applications will have to deal with higher
memory usage. In practice, though, most BPF programs do not run in
NMI context, so there won't be any drops due to spinlock usage.
Having both perfbuf and this new BPF ringbuf gives people a choice
and more options to tailor to their needs.

There is another cluster of applications which are unnecessarily more
complicated just for the fact that there is no ordering between
correlated events that happen on different CPUs. Per-CPU buffers are
not well suited for such cases, unfortunately.

> people, but apparently they've not spent enough time debugging stuff
> with partial logs yet. Of course, bpf_prog_active already makes BPF
> lossy, so maybe they went with that.

Not sure which "partial logs" you mean, I'll assume dropped samples?
All production BPF applications are already designed to handle data
loss, because it could and does happen due to running out of buffer
space. Of course, amount of such drops is minimized however possible,
but applications still need to be able to handle that.

Now, though, there will be more options. Now it's not just a question
of whether to allocate a tiny 64KB per-CPU buffer on 80 core server
and use reasonable 5MB for perfbuf overall, but suffer high and
frequent data loss whenever a burst of incoming events happen. Or bump
it up to, say, 256KB (or higher) and use 20MB+ now, which most of the
time will be completely unused, but be able to absorb 4x more events.
Now it might be more than enough to just allocate a single shared 5MB
buffer and be able to absorb much higher spikes (of course, assuming
not all CPUs will be spiking at the same time, in which case nothing
can really help you much w.r.t. data loss).

So many BPF users are acutely aware of data loss and care a lot, but
there are other constraints that they have to take into account.

As for expensiveness of spinlock and atomics, a lot of applications we
are seeing do not require huge throughput that per-CPU data structures
provide, so such overhead is acceptable. Even under high contention,
BPF ringbuf performance is pretty reasonable and will satisfy a lot of
applications, see [1].

  [1] https://patchwork.ozlabs.org/project/netdev/patch/20200526063255.1675186-5-andriin@fb.com/

>
> All reasons why I never bother with BPF, aside from it being more
> difficult than hacking up a kernel in the first place.

It's not my goal to pitch BPF here, but for a lot of real-world use
cases, hacking kernel is not an option at all, for many reasons. One
of them is that kernel upgrades across huge fleet of servers take a
long time, which teams can't afford to wait. In such cases, BPF is a
perfect solution, which can't be beaten, as evidenced by a wide
variety of BPF applications solving real problems.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-29  5:14                     ` Andrii Nakryiko
@ 2020-05-29 12:36                       ` Peter Zijlstra
  2020-05-29 20:01                         ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2020-05-29 12:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joel Fernandes, Paul E. McKenney, Andrii Nakryiko, Alan Stern,
	parri.andrea, will, Boqun Feng, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Thu, May 28, 2020 at 10:14:21PM -0700, Andrii Nakryiko wrote:

> There is another cluster of applications which are unnecessarily more
> complicated just for the fact that there is no ordering between
> correlated events that happen on different CPUs. Per-CPU buffers are
> not well suited for such cases, unfortunately.

And yet, I've been debugging concurrency issues with ftrace for well
over a decade.

In fact, adding a spinlock in the mix will make a certain class of
concurrency problems go-away! because you introduce artificial
serialization between CPUs.

Heck, even the ftrace buffer can sometimes make problems go away just
because the overhead of tracing itself changes the timing.

It is perfectly possible to reconstruct order with per-cpu buffers
insofar as there is order to begin with. Esp. on modern CPUs that have
a synchronized TSC.

Computers are not SC, pretending that events are is just a lie.

> > people, but apparently they've not spent enough time debugging stuff
> > with partial logs yet. Of course, bpf_prog_active already makes BPF
> > lossy, so maybe they went with that.
> 
> Not sure which "partial logs" you mean, I'll assume dropped samples?

Yep. Trying to reconstruct what actually happened from incomplete logs
is one of the most frustrating things in the world.

> All production BPF applications are already designed to handle data
> loss, because it could and does happen due to running out of buffer
> space. Of course, amount of such drops is minimized however possible,
> but applications still need to be able to handle that.

Running out of space is fixable and 'obvious'. Missing random events in
the middle is bloody infuriating. Even more so if you can't tell there
are gaps in the middle.

AFAICT, you don't even mark a reservation fail.... ah, you leave that to
the user :-( And it can't tell if it's spurious or space related.

Same with bpf_prog_active, it's silent 'random' data loss. You can
easily tell where a CPU buffer starts and stops, and thus if the events
are contained within, but not if there's random bits missing from the
middle.

> Now, though, there will be more options. Now it's not just a question
> of whether to allocate a tiny 64KB per-CPU buffer on 80 core server
> and use reasonable 5MB for perfbuf overall, but suffer high and
> frequent data loss whenever a burst of incoming events happen. Or bump
> it up to, say, 256KB (or higher) and use 20MB+ now, which most of the
> time will be completely unused, but be able to absorb 4x more events.
> Now it might be more than enough to just allocate a single shared 5MB
> buffer and be able to absorb much higher spikes (of course, assuming
> not all CPUs will be spiking at the same time, in which case nothing
> can really help you much w.r.t. data loss).

Muwhahaha, a single shared buffer with 80 CPUs! That's bloody murder on
performance.

> So many BPF users are acutely aware of data loss and care a lot, but
> there are other constraints that they have to take into account.
> 
> As for expensiveness of spinlock and atomics, a lot of applications we
> are seeing do not require huge throughput that per-CPU data structures
> provide, so such overhead is acceptable. Even under high contention,
> BPF ringbuf performance is pretty reasonable and will satisfy a lot of
> applications, see [1].

I've done benchmarks on contended atomic ops, and they go from ~20
cycles (uncontended, cache hot) to well over 10k cycles when you jump on
them with say a dozen cores across a few nodes (more numa, more
horrible).

>   [1] https://patchwork.ozlabs.org/project/netdev/patch/20200526063255.1675186-5-andriin@fb.com/

From that: "Ringbuf, multi-producer contention", right? I read that as:
'performance is bloody horrible if you add contention'.

I suppose most of your users have very low event rates, otherwise I
can't see that working.

> > All reasons why I never bother with BPF, aside from it being more
> > difficult than hacking up a kernel in the first place.
> 
> It's not my goal to pitch BPF here, but for a lot of real-world use
> cases, hacking kernel is not an option at all, for many reasons. One
> of them is that kernel upgrades across huge fleet of servers take a
> long time, which teams can't afford to wait. In such cases, BPF is a
> perfect solution, which can't be beaten, as evidenced by a wide
> variety of BPF applications solving real problems.

Yeah; lots of people use it because they really have nothing better for
their situation.

As long as people understand the constraints (and that's a *BIG* if) I
suppose it's usable.

It's just things I don't want to have to worry about.

Anyway, all that said, I like how you did the commits, I should look to
see if I can retro-fit the perf buffer to have some of that. One
question though; why are you using xchg() for the commit? Isn't that
more expensive than it should be?

That is, why isn't that:

  smp_store_release(&hdr->len, new_len);

? Or are you needing the smp_mb() for the store->load ordering for the
->consumer_pos load? That really needs a comment.

I think you can get rid of the smp_load_acquire() there, you're ordering
a load->store and could rely on the branch to do that:

	cons_pos = READ_ONCE(rb->consumer_pos) & rb->mask;
	if ((flags & BPF_RB_FORCE_WAKEUP) || (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP)))
		irq_work_queue(&rb->work);

should be a control dependency.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-29  4:38               ` Andrii Nakryiko
@ 2020-05-29 17:23                 ` Joel Fernandes
  2020-05-29 20:10                   ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Fernandes @ 2020-05-29 17:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Boqun Feng, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Thu, May 28, 2020 at 09:38:35PM -0700, Andrii Nakryiko wrote:
> On Thu, May 28, 2020 at 2:48 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Mon, May 25, 2020 at 11:38:23AM -0700, Andrii Nakryiko wrote:
> > > On Mon, May 25, 2020 at 7:53 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > > >
> > > > Hi Andrii,
> > > >
> > > > On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > > > > > [...]
> > > >
> > > > While we are at it, could you explain a bit on why you use
> > > > smp_store_release() on consumer_pos? I ask because IIUC, consumer_pos is
> > > > only updated at consumer side, and there is no other write at consumer
> > > > side that we want to order with the write to consumer_pos. So I fail
> > > > to find why smp_store_release() is necessary.
> > > >
> > > > I did the following modification on litmus tests, and I didn't see
> > > > different results (on States) between two versions of litmus tests.
> > > >
> > >
> > > This is needed to ensure that producer can reliably detect whether it
> > > needs to trigger poll notification.
> >
> > Boqun's question is about the consumer side, though. Are you saying
> > that the loads prior to the smp_store_release() on the consumer side
> > should have been seen by the consumer?  You are already using
> > smp_load_acquire(), so that should be satisfied already, because
> > smp_load_acquire() makes sure the acquired load happens before any
> > later loads and stores.
> 
> Consumer is reading two things: producer_pos and each record's length
> header, and writes consumer_pos. I re-read this paragraph many times,
> but I'm still a bit confused on what exactly you are trying to say.

This is what I was saying in the other thread. I think you missed that
comment. If you are adding litmus documentation, at least it should be clear
what memory ordering is being verified. Both Boqun and I tried to remove a
memory barrier each and the test still passes. So what exactly are you
verifying from a memory consistency standpoint? I know you have those various
rFail things and conditions - but I am assuming the goal here is to verify
memory consistency as well. Or are we just throwing enough memory barriers at
the problem to make sure the test passes, without understanding exactly what
ordering is needed?

> Can you please specify in each case release()/acquire() of which
> variable you are talking about?

I don't want to speculate and confuse the thread more. I am afraid the
burden of specifying what the various release/acquire orderings are for
falls on the author of the code introducing the memory barriers ;-).
That is, IMHO you should probably add code comments in the test about
why a certain memory barrier is needed.

That said, I need to do more diligence and read the actual BPF ring buffer
code to understand what you're modeling. I will try to make time to do that.

thanks!

 - Joel


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-29 12:36                       ` Peter Zijlstra
@ 2020-05-29 20:01                         ` Andrii Nakryiko
  2020-05-29 20:53                           ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-29 20:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joel Fernandes, Paul E. McKenney, Andrii Nakryiko, Alan Stern,
	parri.andrea, will, Boqun Feng, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Fri, May 29, 2020 at 5:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, May 28, 2020 at 10:14:21PM -0700, Andrii Nakryiko wrote:
>
> > There is another cluster of applications which are unnecessarily more
> > complicated just for the fact that there is no ordering between
> > correlated events that happen on different CPUs. Per-CPU buffers are
> > not well suited for such cases, unfortunately.
>
> And yet, I've been debugging concurrency issues with ftrace for well
> over a decade.
>
> In fact, adding a spinlock in the mix will make a certain class of
> concurrency problems go-away! because you introduce artificial
> serialization between CPUs.
>
> Heck, even the ftrace buffer can sometimes make problems go away just
> because the overhead of tracing itself changes the timing.
>

All true, but it seems like you are looking at this from a purely
tracing perspective, especially with very high-frequency events. And
that's a very challenging domain, for sure. But *a lot* of BPF
applications (actually, all I'm aware of at Facebook and outside of
it) trace and collect much lower-frequency events, so they don't need
to sacrifice so much to get ultimate performance. Performance
profiling, which comes closest to what you are describing, is just
one of many uses.

> It is perfectly possible to reconstruct order with per-cpu buffers
> insofar as there is order to begin with. Esp. on modern CPUs that have
> a synchronized TSC.
>
> Computers are not SC, pretending that events are is just a lie.

Right, with a precise enough clock or some atomic in-kernel counter,
you can re-construct order in user-space. But that has its own
downsides: you need to send this counter over the wire with every
sample (it takes more space, reducing the amount of "useful" payload
you can fit in a ring buffer), plus user-space re-sorting takes
engineering effort and isn't exactly trivial. Now multiply that by
the many teams who need this, and it becomes a problem worth solving.
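
(For concreteness, a sketch of that pattern with invented names: every
sample carries a timestamp purely so user-space can merge-sort the
per-CPU streams.)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} events SEC(".maps");

struct event {
	__u64 ts;	/* extra 8 bytes per sample, used only for re-sorting */
	__u32 cpu;
	__u32 pid;
};

SEC("tracepoint/sched/sched_process_fork")
int handle_fork(void *ctx)
{
	struct event e = {
		.ts  = bpf_ktime_get_ns(),
		.cpu = bpf_get_smp_processor_id(),
		.pid = bpf_get_current_pid_tgid() >> 32,
	};

	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &e, sizeof(e));
	return 0;
}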

A few examples of events that do occur sequentially in an ordered
manner, but due to per-CPU buffers might come back to user-space out
of order: fork/exec/exit events and TCP connection state tracking.
E.g., for a short-lived process, fork can happen on one CPU and exit
on another, within a tiny period of time. In such a case, user-space
might get the exit event from one buffer before getting the fork
event from another one. Similarly for TCP connection state
transitions. For fork/exec/exit, the typical rate of events, even on
80-core machines, is on the order of thousands of events per second
at most (though spikes do happen, unfortunately), so MPSC queue
contention isn't a big deal.

>
> > > people, but apparently they've not spent enough time debugging stuff
> > > with partial logs yet. Of course, bpf_prog_active already makes BPF
> > > lossy, so maybe they went with that.
> >
> > Not sure which "partial logs" you mean, I'll assume dropped samples?
>
> Yep. Trying to reconstruct what actually happened from incomplete logs
> is one of the most frustrating things in the world.
>
> > All production BPF applications are already designed to handle data
> > loss, because it could and does happen due to running out of buffer
> > space. Of course, amount of such drops is minimized however possible,
> > but applications still need to be able to handle that.
>
> Running out of space is fixable and 'obvious'. Missing random events in
> the middle is bloody infuriating. Even more so if you can't tell there
> are gaps in the middle.
>
> AFAICT, you don't even mark a reservation fail.... ah, you leave that to
> the user :-( And it can't tell if it's spurious or space related.

A BPF program cannot ignore a reservation failure: it has to check
that reserve() returned a non-NULL pointer, and the verifier enforces
that. It's true that out-of-space vs. failing to take the lock in NMI
is indistinguishable. We'll see how important it is to distinguish
the two in practice; there are very few applications that do run in
NMI context.

As for internal accounting of dropped samples, I've considered that,
and there are 4 bytes per sample I could use to communicate it back
to user-space, but it requires another atomic increment, which would
just reduce performance further. Modern BPF applications actually
have a simple and straightforward way to count that themselves, using
global variables memory-mapped into user-space. So it's simple and
fast to do. But again, we'll see in practice how that works for users
and will adjust/enhance as necessary.
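
A minimal sketch of that pattern (invented names; libbpf memory-maps
the global data section, so user-space can read the counter directly):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 20);
} rb SEC(".maps");

long dropped = 0;	/* global; read by user-space via the mmap'ed data section */

struct event { int pid; };

SEC("tracepoint/sched/sched_process_exit")
int handle_exit(void *ctx)
{
	struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);

	if (!e) {
		__sync_fetch_and_add(&dropped, 1);	/* count the drop ourselves */
		return 0;
	}
	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_ringbuf_submit(e, 0);
	return 0;
}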

>
> Same with bpf_prog_active, it's silent 'random' data loss. You can
> easily tell where a CPU buffer starts and stops, and thus if the events
> are contained within, but not if there's random bits missing from the
> middle.
>
> > Now, though, there will be more options. Now it's not just a question
> > of whether to allocate a tiny 64KB per-CPU buffer on 80 core server
> > and use reasonable 5MB for perfbuf overall, but suffer high and
> > frequent data loss whenever a burst of incoming events happen. Or bump
> > it up to, say, 256KB (or higher) and use 20MB+ now, which most of the
> > time will be completely unused, but be able to absorb 4x more events.
> > Now it might be more than enough to just allocate a single shared 5MB
> > buffer and be able to absorb much higher spikes (of course, assuming
> > not all CPUs will be spiking at the same time, in which case nothing
> > can really help you much w.r.t. data loss).
>
> Muwhahaha, a single shared buffer with 80 CPUs! That's bloody murder on
> performance.

Loved the laugh :) But see above, a lot of interesting events are
pretty low-frequency even on those giant servers.

>
> > So many BPF users are acutely aware of data loss and care a lot, but
> > there are other constraints that they have to take into account.
> >
> > As for expensiveness of spinlock and atomics, a lot of applications we
> > are seeing do not require huge throughput that per-CPU data structures
> > provide, so such overhead is acceptable. Even under high contention,
> > BPF ringbuf performance is pretty reasonable and will satisfy a lot of
> > applications, see [1].
>
> I've done benchmarks on contended atomic ops, and they go from ~20
> cycles (uncontended, cache hot) to well over 10k cycles when you jump on
> them with say a dozen cores across a few nodes (more numa, more
> horrible).
>
> >   [1] https://patchwork.ozlabs.org/project/netdev/patch/20200526063255.1675186-5-andriin@fb.com/
>
> From that: "Ringbuf, multi-producer contention", right? I read that as:
> 'performance is bloody horrible if you add contention'.
>
> I suppose most of your users have very low event rates, otherwise I
> can't see that working.

Yes and yes :) The good thing is that with the MPSC ringbuf, if
contention is an issue, they can go with a model of one ringbuf per
CPU, similar to perfbuf, or something in between with a few ringbufs
shared by larger groups of CPUs, again trading performance for memory
as necessary. This is easy to do in BPF by combining the ringbuf map
with ARRAY_OF_MAPS or HASH_OF_MAPS.
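
A rough sketch of that shape (hypothetical example, assuming ringbuf
maps are accepted as inner maps of an ARRAY_OF_MAPS; names invented):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct event { int pid; };

struct ringbuf_map {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 20);		/* 1MB per ringbuf */
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 4);			/* e.g. 4 ringbufs for 80 CPUs */
	__type(key, int);
	__array(values, struct ringbuf_map);
} ringbufs SEC(".maps");

SEC("tracepoint/sched/sched_process_exec")
int handle_exec(void *ctx)
{
	int key = bpf_get_smp_processor_id() / 20;	/* 20 CPUs per buffer */
	void *rb = bpf_map_lookup_elem(&ringbufs, &key);
	struct event *e;

	if (!rb)
		return 0;
	e = bpf_ringbuf_reserve(rb, sizeof(*e), 0);
	if (!e)
		return 0;
	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_ringbuf_submit(e, 0);
	return 0;
}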

>
> > > All reasons why I never bother with BPF, aside from it being more
> > > difficult than hacking up a kernel in the first place.
> >
> > It's not my goal to pitch BPF here, but for a lot of real-world use
> > cases, hacking kernel is not an option at all, for many reasons. One
> > of them is that kernel upgrades across huge fleet of servers take a
> > long time, which teams can't afford to wait. In such cases, BPF is a
> > perfect solution, which can't be beaten, as evidenced by a wide
> > variety of BPF applications solving real problems.
>
> Yeah; lots of people use it because they really have nothing better for
> their situation.
>
> As long as people understand the constraints (and that's a *BIG* if) I
> suppose it's usable.
>
> It's just things I don't want to have to worry about.
>
> Anyway, all that said, I like how you did the commits, I should look to
> see if I can retro-fit the perf buffer to have some of that. Once

Thanks, and yeah, it would actually be great to have this
reserve/commit API for perfbuf as well. Internally it does that
already, AFAIU, but all of it is enclosed in a
preempt_disable/preempt_enable region; I'm not sure how practical and
easy it is to split this up and let the BPF verifier enforce that
each reservation is followed by a commit. Having this kind of API
would allow eliminating some unfortunate code patterns that use an
extra per-CPU array just to prepare a record before sending it over
perfbuf with perf_event_output().

> question though; why are you using xchg() for the commit? Isn't that
> more expensive than it should be?
>
> That is, why isn't that:
>
>   smp_store_release(&hdr->len, new_len);
>
> ? Or are you needing the smp_mb() for the store->load ordering for the
> ->consumer_pos load? That really needs a comment.

Yeah, smp_store_release() is not strong enough; the full memory
barrier is necessary. And yeah, I'll follow up with some more
comments; that's what Joel requested as well.
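
For reference, a simplified sketch of the commit path in question
(not the exact patch code):

/* The xchg() implies smp_mb(), ordering the hdr->len store before
 * the rb->consumer_pos load below; smp_store_release() alone would
 * not give that store->load ordering. */
static void ringbuf_commit_sketch(struct bpf_ringbuf *rb,
				  struct bpf_ringbuf_hdr *hdr,
				  u32 new_len, unsigned long rec_pos)
{
	unsigned long cons_pos;

	xchg(&hdr->len, new_len);	/* clears busy bit; full barrier */

	cons_pos = smp_load_acquire(&rb->consumer_pos) & rb->mask;
	if (cons_pos == rec_pos)	/* consumer had caught up: wake it */
		irq_work_queue(&rb->work);
}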

>
> I think you can get rid of the smp_load_acquire() there, you're ordering
> a load->store and could rely on the branch to do that:
>
>         cons_pos = READ_ONCE(rb->consumer_pos) & rb->mask;
>         if ((flags & BPF_RB_FORCE_WAKEUP) || (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP)))
>                 irq_work_queue(&rb->work);
>
> should be a control dependency.

Could be. I tried to keep consistent
smp_load_acquire/smp_store_release usage to keep it simpler. It might
not be the absolutely minimal amount of ordering that would still be
correct. We might be able to tweak and tune this without changing
correctness.

Anyway, thanks for taking a look and for the feedback. I hope some of
my examples explain why this might be a fine approach for a lot of
applications, even if not the most performant one.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-29 17:23                 ` Joel Fernandes
@ 2020-05-29 20:10                   ` Andrii Nakryiko
  0 siblings, 0 replies; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-29 20:10 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Boqun Feng, Andrii Nakryiko, Paul E . McKenney, Alan Stern,
	Peter Zijlstra, parri.andrea, will, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Fri, May 29, 2020 at 10:23 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, May 28, 2020 at 09:38:35PM -0700, Andrii Nakryiko wrote:
> > On Thu, May 28, 2020 at 2:48 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Mon, May 25, 2020 at 11:38:23AM -0700, Andrii Nakryiko wrote:
> > > > On Mon, May 25, 2020 at 7:53 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > > > >
> > > > > Hi Andrii,
> > > > >
> > > > > On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > > > > > [...]
> > > > >
> > > > > While we are at it, could you explain a bit on why you use
> > > > > smp_store_release() on consumer_pos? I ask because IIUC, consumer_pos is
> > > > > only updated at consumer side, and there is no other write at consumer
> > > > > side that we want to order with the write to consumer_pos. So I fail
> > > > > to find why smp_store_release() is necessary.
> > > > >
> > > > > I did the following modification on litmus tests, and I didn't see
> > > > > different results (on States) between two versions of litmus tests.
> > > > >
> > > >
> > > > This is needed to ensure that producer can reliably detect whether it
> > > > needs to trigger poll notification.
> > >
> > > Boqun's question is about the consumer side, though. Are you saying
> > > that the loads prior to the smp_store_release() on the consumer side
> > > should have been seen by the consumer?  You are already using
> > > smp_load_acquire(), so that should be satisfied already, because
> > > smp_load_acquire() makes sure the acquired load happens before any
> > > later loads and stores.
> >
> > Consumer is reading two things: producer_pos and each record's length
> > header, and writes consumer_pos. I re-read this paragraph many times,
> > but I'm still a bit confused on what exactly you are trying to say.
>
> This is what I was saying in the other thread. I think you missed that
> comment. If you are adding litmus documentation, at least it should be clear
> what memory ordering is being verified. Both Boqun and I tried to remove a
> memory barrier each and the test still passes. So what exactly are you
> verifying from a memory consistency standpoint? I know you have those various
> rFail things and conditions - but I am assuming the goal here is to verify
> memory consistency as well. Or are we just throwing enough memory barriers at
> the problem to make sure the test passes, without understanding exactly what
> ordering is needed?

The high-level goal was to verify that producers and the consumer
don't see intermediate states they are not supposed to, and that the
overall flow of records is correct. It wasn't an explicit goal for me
to find the absolute minimal/weakest memory ordering that makes this
work. I did my best to write the invariants in such a way as to
capture violations, but I'm sure they won't catch 100% of possible
problems, unfortunately. E.g., if the busy-bit (the len = -1 part)
ordering is buggy, I didn't find a perfect way to differentiate
between the consumer being stuck because a record is "busy" and the
consumer (which is in no way serialized with producers) having "run
sooner" and simply not seeing the record committed yet. But on the
other hand, it did capture a few subtle issues, which made writing
these litmus tests worthwhile nevertheless :)

I'm sure litmus tests can be improved and expanded, but I tried to
strike a balance between practicality and perfection.

>
> > Can you please specify in each case release()/acquire() of which
> > variable you are talking about?
>
> I don't want to speculate and confuse the thread more. I am afraid the
> burden of specifying what the various release/acquire orderings are for
> falls on the author of the code introducing the memory barriers ;-).
> That is, IMHO you should probably add code comments in the test about
> why a certain memory barrier is needed.

Sure, I'll follow up with more comments clarifying this. I was
genuinely trying to understand all those ordering implications you
were trying to describe; it's a tricky business, unfortunately.

>
> That said, I need to do more diligence and read the actual BPF ring buffer
> code to understand what you're modeling. I will try to make time to do that.

Great, thanks!

>
> thanks!
>
>  - Joel
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Some -serious- BPF-related litmus tests
  2020-05-29 20:01                         ` Andrii Nakryiko
@ 2020-05-29 20:53                           ` Peter Zijlstra
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2020-05-29 20:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joel Fernandes, Paul E. McKenney, Andrii Nakryiko, Alan Stern,
	parri.andrea, will, Boqun Feng, npiggin, dhowells, j.alglave,
	luc.maranget, Akira Yokosawa, dlustig, open list, linux-arch

On Fri, May 29, 2020 at 01:01:51PM -0700, Andrii Nakryiko wrote:

> > question though; why are you using xchg() for the commit? Isn't that
> > more expensive than it should be?
> >
> > That is, why isn't that:
> >
> >   smp_store_release(&hdr->len, new_len);
> >
> > ? Or are you needing the smp_mb() for the store->load ordering for the
> > ->consumer_pos load? That really needs a comment.
> 
> Yeah, smp_store_release() is not strong enough; the full memory
> barrier is necessary. And yeah, I'll follow up with some more
> comments; that's what Joel requested as well.

Ok, great.

> > I think you can get rid of the smp_load_acquire() there, you're ordering
> > a load->store and could rely on the branch to do that:
> >
> >         cons_pos = READ_ONCE(rb->consumer_pos) & rb->mask;
> >         if ((flags & BPF_RB_FORCE_WAKEUP) || (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP)))
> >                 irq_work_queue(&rb->work);
> >
> > should be a control dependency.
> 
> Could be. I tried to keep consistent
> smp_load_acquire/smp_store_release usage to keep it simpler. It might
> not be the absolutely minimal amount of ordering that would still be
> correct. We might be able to tweak and tune this without changing
> correctness.

We can even rely on the irq_work_queue() being an atomic, but sure, get
it all working and correct first before you wreck it ;-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread

Thread overview: 36+ messages
2020-05-22  0:38 Some -serious- BPF-related litmus tests Paul E. McKenney
2020-05-22  9:44 ` Peter Zijlstra
2020-05-22 10:56   ` Paul E. McKenney
2020-05-22 14:36     ` Alan Stern
2020-05-22 17:45       ` Paul E. McKenney
2020-05-22 14:32   ` Alan Stern
2020-05-22 17:43     ` Paul E. McKenney
2020-05-22 19:38       ` Andrii Nakryiko
2020-05-24 12:09         ` Akira Yokosawa
2020-05-25 18:31           ` Andrii Nakryiko
2020-05-25 22:01             ` Akira Yokosawa
2020-05-25 23:31               ` Andrii Nakryiko
2020-05-26 10:50                 ` Akira Yokosawa
2020-05-26 14:02                   ` Akira Yokosawa
2020-05-26 20:19                     ` Andrii Nakryiko
2020-05-26 23:00                       ` Akira Yokosawa
2020-05-27  0:09                         ` Andrii Nakryiko
2020-05-26 20:15                   ` Andrii Nakryiko
2020-05-26 22:23                     ` Akira Yokosawa
2020-05-25 11:25         ` Peter Zijlstra
2020-05-25 15:47           ` Paul E. McKenney
2020-05-25 17:02             ` Peter Zijlstra
2020-05-25 17:21               ` Paul E. McKenney
2020-05-25 17:45                 ` Paul E. McKenney
2020-05-28 22:00                 ` Joel Fernandes
2020-05-28 22:16                   ` Peter Zijlstra
2020-05-29  5:14                     ` Andrii Nakryiko
2020-05-29 12:36                       ` Peter Zijlstra
2020-05-29 20:01                         ` Andrii Nakryiko
2020-05-29 20:53                           ` Peter Zijlstra
2020-05-25 14:53         ` Boqun Feng
2020-05-25 18:38           ` Andrii Nakryiko
2020-05-28 21:48             ` Joel Fernandes
2020-05-29  4:38               ` Andrii Nakryiko
2020-05-29 17:23                 ` Joel Fernandes
2020-05-29 20:10                   ` Andrii Nakryiko
