All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Björn Töpel" <bjorn.topel@gmail.com>
To: bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>
Cc: stern@rowland.harvard.edu, parri.andrea@gmail.com,
	"Will Deacon" <will@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	boqun.feng@gmail.com, npiggin@gmail.com, dhowells@redhat.com,
	j.alglave@ucl.ac.uk, luc.maranget@inria.fr, paulmck@kernel.org,
	akiyks@gmail.com, dlustig@nvidia.com, joel@joelfernandes.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>
Subject: XDP socket rings, and LKMM litmus tests
Date: Tue, 2 Mar 2021 19:46:27 +0100	[thread overview]
Message-ID: <CAJ+HfNhxWFeKnn1aZw-YJmzpBuCaoeGkXXKn058GhY-6ZBDtZA@mail.gmail.com> (raw)

Hi!

Firstly; The long Cc-list is to reach the LKMM-folks.

Some background; the XDP sockets use a ring-buffer to communicate
between the kernel and userland. It's a
single-consumer/single-producer ring, and described in
net/xdp/xsk_queue.h.

--8<---
/* The structure of the shared state of the rings are the same as the
 * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
 * ring, the kernel is the producer and user space is the consumer. For
 * the Tx and fill rings, the kernel is the consumer and user space is
 * the producer.
 *
 * producer                         consumer
 *
 * if (LOAD ->consumer) {           LOAD ->producer
 *                    (A)           smp_rmb()       (C)
 *    STORE $data                   LOAD $data
 *    smp_wmb()       (B)           smp_mb()        (D)
 *    STORE ->producer              STORE ->consumer
 * }
 *
 * (A) pairs with (D), and (B) pairs with (C).
...
-->8---

I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
semantics [1], without breaking existing userspace applications.

So, I figured I'd use herd7 and the LKMM model to build a litmus test
for the barrier version, then for the acquire-release version, and
finally permutations of both.

The idea is to use a one element ring, with a state machine outlined
in the litmus test.

The basic test for the existing smp_{r,w,}mb() barriers looks like:

$ cat spsc-rb+1p1c.litmus
C spsc-rb+1p1c

// Stupid one entry ring:
// prod cons     allowed action       prod cons
//    0    0 =>       prod          =>   1    0
//    0    1 =>       cons          =>   0    0
//    1    0 =>       cons          =>   1    1
//    1    1 =>       prod          =>   0    1

{ prod = 1; }

// Here, we start at prod==1,cons==0, data==0, i.e. producer has
// written data=0, so from here only the consumer can start, and should
// consume data==0. Afterwards, producer can continue and write 1 to
// data. Can we enter state prod==0, cons==1, but consumer observerd
// the write of 1?

P0(int *prod, int *cons, int *data)
{
    int p;
    int c;
    int cond = 0;

    p = READ_ONCE(*prod);
    c = READ_ONCE(*cons);
    if (p == 0)
        if (c == 0)
            cond = 1;
    if (p == 1)
        if (c == 1)
            cond = 1;

    if (cond) {
        smp_mb();
        WRITE_ONCE(*data, 1);
        smp_wmb();
        WRITE_ONCE(*prod, p ^ 1);
    }
}

P1(int *prod, int *cons, int *data)
{
    int p;
    int c;
    int d = -1;
    int cond = 0;

    p = READ_ONCE(*prod);
    c = READ_ONCE(*cons);
    if (p == 1)
        if (c == 0)
            cond = 1;
    if (p == 0)
        if (c == 1)
            cond = 1;

    if (cond == 1) {
        smp_rmb();
        d = READ_ONCE(*data);
        smp_mb();
        WRITE_ONCE(*cons, c ^ 1);
    }
}

exists( 1:d=1 /\ prod=0 /\ cons=1 );

--

The weird state changing if-statements is because that I didn't get
'&&' and '||' to work with herd.

When this is run:

$ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
Test spsc-rb+1p1c Allowed
States 2
1:d=0; cons=1; prod=0;
1:d=0; cons=1; prod=1;
No
Witnesses
Positive: 0 Negative: 2
Condition exists (1:d=1 /\ prod=0 /\ cons=1)
Observation spsc-rb+1p1c Never 0 2
Time spsc-rb+1p1c 0.04
Hash=b399756d6a1301ca5bda042f32130791

Now to my question; In P0 there's an smp_mb(). Without that, the d==1
can be observed from P1 (consumer):

$ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
Test spsc-rb+1p1c Allowed
States 3
1:d=0; cons=1; prod=0;
1:d=0; cons=1; prod=1;
1:d=1; cons=1; prod=0;
Ok
Witnesses
Positive: 1 Negative: 2
Condition exists (1:d=1 /\ prod=0 /\ cons=1)
Observation spsc-rb+1p1c Sometimes 1 2
Time spsc-rb+1p1c 0.04
Hash=0047fc21fa77da9a9aee15e35ec367ef

In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
on control dependencies") removes the corresponding smp_mb(), and also
the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
("documentation: Update circular buffer for
load-acquire/store-release")) is missing the smp_mb() at the
producer-side.

I'm trying to wrap my head around why it's OK to remove the smp_mb()
in the cases above? I'm worried that the current XDP socket ring
implementation (which is missing smp_mb()) might be broken.


If you read this far, thanks! :-)
Björn


[1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/

             reply	other threads:[~2021-03-02 21:48 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-02 18:46 Björn Töpel [this message]
2021-03-02 19:57 ` XDP socket rings, and LKMM litmus tests Paul E. McKenney
2021-03-02 20:04   ` Paul E. McKenney
2021-03-02 20:37     ` Björn Töpel
2021-03-02 20:24   ` Björn Töpel
2021-03-02 20:41     ` Paul E. McKenney
2021-03-02 20:51       ` Björn Töpel
2021-03-02 21:14 ` Alan Stern
2021-03-02 23:50   ` Paul E. McKenney
2021-03-03  6:37     ` maranget
2021-03-03 16:54       ` Paul E. McKenney
2021-03-03 17:12     ` Alan Stern
2021-03-03 17:37       ` maranget
2021-03-03 17:39         ` maranget
2021-03-03 21:56           ` Paul E. McKenney
2021-03-03 19:40         ` Alan Stern
2021-03-03 17:40       ` Paul E. McKenney
2021-03-03 20:22         ` Alan Stern
2021-03-03 22:03           ` Paul E. McKenney
2021-03-04  3:21             ` Alan Stern
2021-03-04  5:04               ` Paul E. McKenney
2021-03-04 15:35                 ` Alan Stern
2021-03-04 19:05                   ` Paul E. McKenney
2021-03-04 21:27                     ` Alan Stern
2021-03-04 22:05                       ` Paul E. McKenney
2021-03-04  1:26           ` Boqun Feng
2021-03-04  3:13             ` Alan Stern
2021-03-04  6:33               ` Boqun Feng
2021-03-04 16:11                 ` Alan Stern
2021-03-05  1:12                   ` Boqun Feng
2021-03-05 16:15                     ` Alan Stern
2021-03-04 15:44           ` maranget
2021-03-04 19:07             ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJ+HfNhxWFeKnn1aZw-YJmzpBuCaoeGkXXKn058GhY-6ZBDtZA@mail.gmail.com \
    --to=bjorn.topel@gmail.com \
    --cc=akiyks@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=dhowells@redhat.com \
    --cc=dlustig@nvidia.com \
    --cc=j.alglave@ucl.ac.uk \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luc.maranget@inria.fr \
    --cc=magnus.karlsson@intel.com \
    --cc=npiggin@gmail.com \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stern@rowland.harvard.edu \
    --cc=toke@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.