linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: "Björn Töpel" <bjorn.topel@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	stern@rowland.harvard.edu, parri.andrea@gmail.com,
	"Will Deacon" <will@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	boqun.feng@gmail.com, npiggin@gmail.com, dhowells@redhat.com,
	j.alglave@ucl.ac.uk, luc.maranget@inria.fr, akiyks@gmail.com,
	dlustig@nvidia.com, joel@joelfernandes.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>
Subject: Re: XDP socket rings, and LKMM litmus tests
Date: Tue, 2 Mar 2021 12:04:41 -0800	[thread overview]
Message-ID: <20210302200441.GA3729@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20210302195758.GQ2696@paulmck-ThinkPad-P72>

On Tue, Mar 02, 2021 at 11:57:58AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> > Hi!
> > 
> > Firstly; The long Cc-list is to reach the LKMM-folks.
> > 
> > Some background; the XDP sockets use a ring-buffer to communicate
> > between the kernel and userland. It's a
> > single-consumer/single-producer ring, and described in
> > net/xdp/xsk_queue.h.
> > 
> > --8<---
> > /* The structure of the shared state of the rings are the same as the
> >  * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> >  * ring, the kernel is the producer and user space is the consumer. For
> >  * the Tx and fill rings, the kernel is the consumer and user space is
> >  * the producer.
> >  *
> >  * producer                         consumer
> >  *
> >  * if (LOAD ->consumer) {           LOAD ->producer
> >  *                    (A)           smp_rmb()       (C)
> >  *    STORE $data                   LOAD $data
> >  *    smp_wmb()       (B)           smp_mb()        (D)
> >  *    STORE ->producer              STORE ->consumer
> >  * }
> >  *
> >  * (A) pairs with (D), and (B) pairs with (C).
> > ...
> > -->8---
> > 
> > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
> > semantics [1], without breaking existing userspace applications.
> > 
> > So, I figured I'd use herd7 and the LKMM model to build a litmus test
> > for the barrier version, then for the acquire-release version, and
> > finally permutations of both.
> > 
> > The idea is to use a one element ring, with a state machine outlined
> > in the litmus test.
> > 
> > The basic test for the existing smp_{r,w,}mb() barriers looks like:
> > 
> > $ cat spsc-rb+1p1c.litmus
> > C spsc-rb+1p1c
> > 
> > // Stupid one entry ring:
> > // prod cons     allowed action       prod cons
> > //    0    0 =>       prod          =>   1    0
> > //    0    1 =>       cons          =>   0    0
> > //    1    0 =>       cons          =>   1    1
> > //    1    1 =>       prod          =>   0    1
> > 
> > { prod = 1; }
> > 
> > // Here, we start at prod==1,cons==0, data==0, i.e. producer has
> > // written data=0, so from here only the consumer can start, and should
> > // consume data==0. Afterwards, producer can continue and write 1 to
> > // data. Can we enter state prod==0, cons==1, but consumer observerd
> > // the write of 1?
> > 
> > P0(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int c;
> >     int cond = 0;
> > 
> >     p = READ_ONCE(*prod);
> >     c = READ_ONCE(*cons);
> >     if (p == 0)
> >         if (c == 0)
> >             cond = 1;
> >     if (p == 1)
> >         if (c == 1)
> >             cond = 1;
> > 
> >     if (cond) {
> >         smp_mb();
> >         WRITE_ONCE(*data, 1);
> >         smp_wmb();
> >         WRITE_ONCE(*prod, p ^ 1);
> >     }
> > }
> > 
> > P1(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int c;
> >     int d = -1;
> >     int cond = 0;
> > 
> >     p = READ_ONCE(*prod);
> >     c = READ_ONCE(*cons);
> >     if (p == 1)
> >         if (c == 0)
> >             cond = 1;
> >     if (p == 0)
> >         if (c == 1)
> >             cond = 1;
> > 
> >     if (cond == 1) {
> >         smp_rmb();
> >         d = READ_ONCE(*data);
> >         smp_mb();
> >         WRITE_ONCE(*cons, c ^ 1);
> >     }
> > }
> 
> Before digging in too deeply, does the following simplification
> still capture your intent?
> 
> P0(int *prod, int *cons, int *data)
> {
>     int p;
>     int cond = 0;
> 
>     p = READ_ONCE(*prod);
>     if (p == READ_ONCE(*cons))
>             cond = 1;
>     if (cond) {
> 	smp_mb();
>         WRITE_ONCE(*data, 1);
>         smp_wmb();
>         WRITE_ONCE(*prod, p ^ 1);
>     }
> }
> 
> P1(int *prod, int *cons, int *data)
> {
>     int c;
>     int d = -1;
>     int cond = 0;
> 
>     c = READ_ONCE(*cons);
>     if (READ_ONCE(*prod) == c)
>             cond = 1;
> 
>     if (cond == 1) {
>         smp_rmb();
>         d = READ_ONCE(*data);
>         smp_mb();
>         WRITE_ONCE(*cons, c ^ 1);
>     }
> }

And if the answer is "yes", how about this one?

P0(int *prod, int *cons, int *data)
{
    int p;

    p = READ_ONCE(*prod);
    if (p == READ_ONCE(*cons)) {
        WRITE_ONCE(*data, 1);
        smp_wmb();
        WRITE_ONCE(*prod, p ^ 1);
    }
}

P1(int *prod, int *cons, int *data)
{
    int c;
    int d = -1;

    c = READ_ONCE(*cons);
    if (READ_ONCE(*prod) == c) {
        smp_rmb();
        d = READ_ONCE(*data);
        smp_mb();
        WRITE_ONCE(*cons, c ^ 1);
    }
}

							Thanx, Paul

> > exists( 1:d=1 /\ prod=0 /\ cons=1 );
> > 
> > --
> > 
> > The weird state changing if-statements is because that I didn't get
> > '&&' and '||' to work with herd.
> > 
> > When this is run:
> > 
> > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> > Test spsc-rb+1p1c Allowed
> > States 2
> > 1:d=0; cons=1; prod=0;
> > 1:d=0; cons=1; prod=1;
> > No
> > Witnesses
> > Positive: 0 Negative: 2
> > Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> > Observation spsc-rb+1p1c Never 0 2
> > Time spsc-rb+1p1c 0.04
> > Hash=b399756d6a1301ca5bda042f32130791
> > 
> > Now to my question; In P0 there's an smp_mb(). Without that, the d==1
> > can be observed from P1 (consumer):
> > 
> > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> > Test spsc-rb+1p1c Allowed
> > States 3
> > 1:d=0; cons=1; prod=0;
> > 1:d=0; cons=1; prod=1;
> > 1:d=1; cons=1; prod=0;
> > Ok
> > Witnesses
> > Positive: 1 Negative: 2
> > Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> > Observation spsc-rb+1p1c Sometimes 1 2
> > Time spsc-rb+1p1c 0.04
> > Hash=0047fc21fa77da9a9aee15e35ec367ef
> > 
> > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
> > on control dependencies") removes the corresponding smp_mb(), and also
> > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> > ("documentation: Update circular buffer for
> > load-acquire/store-release")) is missing the smp_mb() at the
> > producer-side.
> > 
> > I'm trying to wrap my head around why it's OK to remove the smp_mb()
> > in the cases above? I'm worried that the current XDP socket ring
> > implementation (which is missing smp_mb()) might be broken.
> > 
> > 
> > If you read this far, thanks! :-)
> > Björn
> > 
> > 
> > [1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/

  reply	other threads:[~2021-03-02 22:37 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel
2021-03-02 19:57 ` Paul E. McKenney
2021-03-02 20:04   ` Paul E. McKenney [this message]
2021-03-02 20:37     ` Björn Töpel
2021-03-02 20:24   ` Björn Töpel
2021-03-02 20:41     ` Paul E. McKenney
2021-03-02 20:51       ` Björn Töpel
2021-03-02 21:14 ` Alan Stern
2021-03-02 23:50   ` Paul E. McKenney
2021-03-03  6:37     ` maranget
2021-03-03 16:54       ` Paul E. McKenney
2021-03-03 17:12     ` Alan Stern
2021-03-03 17:37       ` maranget
2021-03-03 17:39         ` maranget
2021-03-03 21:56           ` Paul E. McKenney
2021-03-03 19:40         ` Alan Stern
2021-03-03 17:40       ` Paul E. McKenney
2021-03-03 20:22         ` Alan Stern
2021-03-03 22:03           ` Paul E. McKenney
2021-03-04  3:21             ` Alan Stern
2021-03-04  5:04               ` Paul E. McKenney
2021-03-04 15:35                 ` Alan Stern
2021-03-04 19:05                   ` Paul E. McKenney
2021-03-04 21:27                     ` Alan Stern
2021-03-04 22:05                       ` Paul E. McKenney
2021-03-04  1:26           ` Boqun Feng
2021-03-04  3:13             ` Alan Stern
2021-03-04  6:33               ` Boqun Feng
2021-03-04 16:11                 ` Alan Stern
2021-03-05  1:12                   ` Boqun Feng
2021-03-05 16:15                     ` Alan Stern
2021-03-04 15:44           ` maranget
2021-03-04 19:07             ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210302200441.GA3729@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=akiyks@gmail.com \
    --cc=bjorn.topel@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=dhowells@redhat.com \
    --cc=dlustig@nvidia.com \
    --cc=j.alglave@ucl.ac.uk \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luc.maranget@inria.fr \
    --cc=magnus.karlsson@intel.com \
    --cc=npiggin@gmail.com \
    --cc=parri.andrea@gmail.com \
    --cc=peterz@infradead.org \
    --cc=stern@rowland.harvard.edu \
    --cc=toke@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).