Re: Can the Kernel Concurrency Sanitizer Own Rust Code?

From: Wedson Almeida Filho <wedsonaf@google.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>,
	Gary Guo <gary@garyguo.net>, Marco Elver <elver@google.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rust-for-linux <rust-for-linux@vger.kernel.org>
Subject: Re: Can the Kernel Concurrency Sanitizer Own Rust Code?
Date: Wed, 13 Oct 2021 18:50:24 +0100	[thread overview]
Message-ID: <YWccYPLUOH7t9JtB@google.com> (raw)
In-Reply-To: <20211013160707.GR880162@paulmck-ThinkPad-P17-Gen-1>

On Wed, Oct 13, 2021 at 09:07:07AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 13, 2021 at 01:48:13PM +0200, Miguel Ojeda wrote:
> > On Mon, Oct 11, 2021 at 9:01 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > The main issue I was calling out was not justifying Rust, but rather
> > > making sure that the exact same build could be reproduced a decade later.
> > 
> > Yes, but that is quite trivial compared to other issues I was
> > mentioning like adapting and requalifying a testing tool. For
> > instance, if you already had a team maintaining the configuration
> > management (i.e. the versions etc.), adding one more tool is not a big
> > deal.
> 
> OK, close enough to fair enough.  ;-)
> 
> > > There are things that concurrent software would like to do that are
> > > made quite inconvenient due to large numbers of existing optimizations
> > > in the various compiler backends.  Yes, we have workarounds.  But I
> > > do not see how Rust is going to help with these inconveniences.
> > 
> > Sure, but C UB is unrelated to Rust UB. Thus, if you think it would be
> > valuable to be able to express particular algorithms in unsafe Rust,
> > then I would contact the Rust teams to let them know your needs --
> > perhaps we end up with something way better than C for that use case!
> 
> Sequence locks and RCU do seem to be posing some challenges.  I suppose
> this should not be too much of a surprise, given that there are people who
> have been in the Rust community for a long time who do understand both.
> If it were easy, they would have already come up with a solution.

(Hey Paul, I tried posting on your blog series, but I'm having difficulty so I
thought I'd reply here given that we mention seqlocks and RCU here.)

I spent a bit of time thinking about sequence locks and I think I have something
that is workable. (I remind you that we use the C implementation for the
synchronisation primitives). Suppose we had some struct like so:

struct X {
    a: AtomicU32,
    b: AtomicU32,
}

And suppose we have it protected by a sequence lock. If we wanted to return the
sum of the two fields, the code would look like this:

    let v = y.access(|x| {
        let a = x.a.load(Ordering::Relaxed);
	let b = x.b.load(Ordering::Relaxed);
	a + b
    });

It would be expanded to the following machine code in aarch64 (when LTO is
enabled):

  403fd4:       14000002        b       403fdc
  403fd8:       d503203f        yield
  403fdc:       b9400808        ldr     w8, [x0, #8]
  403fe0:       3707ffc8        tbnz    w8, #0, 403fd8
  403fe4:       d50339bf        dmb     ishld
  403fe8:       b9400c09        ldr     w9, [x0, #12]
  403fec:       b940100a        ldr     w10, [x0, #16]
  403ff0:       d50339bf        dmb     ishld
  403ff4:       b940080b        ldr     w11, [x0, #8]
  403ff8:       6b08017f        cmp     w11, w8
  403ffc:       54ffff01        b.ne    403fdc
  404000:       0b090148        add     w8, w10, w9

It is as efficient as the C version, though not as ergonomic. The
.load(Ordering::Relaxed) can of course be improved to something shorter like
.load_relaxed() or even new atomic types  with .load() being relaxed and
.load_ordered(Ordering) for other ordering.

I also have guard- and iterator-based methods for the read path that would look
like this (these can all co-exist if we so choose):

    let v = loop {
        let guard = y.read();
        let a = guard.a.load(Ordering::Relaxed);
        let b = guard.b.load(Ordering::Relaxed);
        if !guard.need_retry() {
            break a + b;
        }
    };

and

    let mut v = 0;
    for x in y {
        let a = x.a.load(Ordering::Relaxed);
	let b = x.b.load(Ordering::Relaxed);
	v = a + b;
    }

The former generates the exact same machine code as above though the latter
generates slightly worse code (it has instructions sequences like "mov w10,
#0x1; tbnz w10, #0, 403ffc" and , "mov w10, wzr; tbnz w10, #0, 403ffc", which
could be optimised but for some reason isn't).

Anyway, on to the write path. We need another primitive to ensure that only one
writer at a time attempts to acquire the sequence lock in write mode. We do this
by taking a guard for this other lock, for example, suppose we want to increment
each of the fields:

    let other_guard = other_lock.lock();
    let guard = y.write(&other_guard);
    guard.a.store(guard.a.load(Ordering::Relaxed) + 1, Ordering::Relaxed);
    guard.b.store(guard.b.load(Ordering::Relaxed) + 1, Ordering::Relaxed);

The part the relates to the sequence lock is compiled to the following:

  404058:       f9400009        ldr     x9, [x0]
  40405c:       eb08013f        cmp     x9, x8
  404060:       54000281        b.ne    4040b0

  404064:       b9400808        ldr     w8, [x0, #8]
  404068:       11000508        add     w8, w8, #0x1
  40406c:       b9000808        str     w8, [x0, #8]
  404070:       d5033abf        dmb     ishst
  404074:       b9400c08        ldr     w8, [x0, #12]
  404078:       11000508        add     w8, w8, #0x1
  40407c:       b9000c08        str     w8, [x0, #12]
  404080:       b9401008        ldr     w8, [x0, #16]
  404084:       11000508        add     w8, w8, #0x1
  404088:       b9001008        str     w8, [x0, #16]
  40408c:       d5033abf        dmb     ishst
  404090:       b9400808        ldr     w8, [x0, #8]
  404094:       11000508        add     w8, w8, #0x1
  404098:       b9000808        str     w8, [x0, #8]

If we ignore the first three instructions momentarily, the rest is as efficient
as C. The reason we need the first three instructions is to ensure that guard
that was passed into the `write` function is a guard to the correct lock. The
lock type already eliminates the vast majority of issues, but a developer could
accidentally lock the wrong lock and use it in the sequence lock, which would be
problematic. So we need this check in Rust that we don't need in C (although the
same mistake could happen in C).

We can provide an 'unsafe' version that doesn't perform this check, then the
onus is on the callers to convince themselves that they have acquired the
correct lock (and they'd be required to use an unsafe block). Then the
performance would be the same as the C version.

Now that I've presented how my proposal looks like from the PoV of a user,
here's its rationale: given that we only want one copy of the data and that
mutable references are always unique in the safe fragment of Rust, we can't (and
don't) return a mutable reference to what's protected by the sequence lock, we
always only allow shared access, even when the sequence lock is acquired in
write mode.

Then how does one change the fields? Interior mutability. In the examples above,
the fields are all atomic, so they can be changed with the `store` method. Any
type that provides interior mutability is suitable here.

If we need to use types with interior mutability, what's the point of the
sequence lock? The point is to allow a consistent view of the fields. In our
example, even though `a` and `b` are atomic, the sequence lock guarantees that
readers will get a consistent view of the values even though writers modify one
at a time.

Lastly, the fact we use a generic `Guard` as proof that a lock is held (for the
write path) means that we don't need to manually implement this for each
different lock we care about; any that implements the `Lock` trait can be used.
This is unlike the C code that uses fragile macros to generate code for
different types of locks (though the scenario is slightly different in that the
C code embeds a lock, which is also something we could do in Rust) -- the Rust
version uses generics, so it is type-checked by the compiler.

RCU pointers can be implemented with a similar technique in that read access is
protected by a 'global' RCU reader lock (and evidence of it being locked is
required to get read access), and writers require another lock to be held. The
only piece that I haven't thought through yet is how to ensure that pointers
that were exposed with RCU 'protection' cannot be freed before the grace period
has elapsed. But this is a discussion for another time.

I'll send out the patches for what I describe above in the next couple of days.

Does any of the above help answer the questions you have about seqlocks in Rust?

Thanks,
-Wedson

> So the trick is to stage things so as to allow people time to work on
> these sorts of issues.
> 
> > In any case, Rust does not necessarily need to help there. What is
> > important is whether Rust helps writing the majority of the kernel
> > code. If we need to call into C or use inline assembly for certain
> > bits -- so be it.
> > 
> > > But to be fair, much again depends on exactly where Rust is to be applied
> > > in the kernel.  If a given Linux-kernel feature is not used where Rust
> > > needs to be applied, then there is no need to solve the corresponding
> > > issues.
> > 
> > Exactly.
> 
> Thank you for bearing with me.
> 
> I will respond to your other email later,.  but the focus on memory
> safety in particular instead of undefined behavior in general does help
> me quite a bit.
> 
> My next step is to create a "TL;DR: Memory-Model Recommendations" post
> that is more specific, with both short-term ("do what is easy") and
> long-term suggestions.
> 
> 							Thanx, Paul