linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	carlos <carlos@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	paulmck <paulmck@linux.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
	linux-api <linux-api@vger.kernel.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	Neel Natu <neelnatu@google.com>
Subject: Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID
Date: Wed, 8 Jul 2020 13:34:48 -0400 (EDT)	[thread overview]
Message-ID: <1565638541.5051.1594229688015.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200708162247.txdleelcalxkrfjy@wittgenstein>

----- On Jul 8, 2020, at 12:22 PM, Christian Brauner christian.brauner@ubuntu.com wrote:
[...]
> I've been following this a little bit. The kernel version itself doesn't
> really mean anything and the kernel version is imho not at all
> interesting to userspace applications. Especially for cross-distro
> programs. We can't go around and ask Red Hat, SUSE, Ubuntu, Archlinux,
> openSUSE and god knows who what other distro what their fixed kernel
> version is. That's not feasible at all and not how must programs do it.
> Sure, a lot of programs name a minimal kernel version they require but
> realistically we can't keep bumping it all the time. So the best
> strategy for userspace imho has been to introduce a re-versioned flag or
> enum that indicates the fixed behavior.
> 
> So I would suggest to just introduce
> RSEQ_FLAG_REGISTER_2                      = (1 << 2),
> that's how these things are usually done (Netlink etc.). So not
> introducing a fix bit or whatever but simply reversion your flag/enum.
> We already deal with this today.

Because rseq is effectively a per-thread resource shared across application
and libraries, it is not practical to merge the notion of version with the
registration. Typically __rseq_abi is registered by libc, and can be used
by the application and by many libraries. Early adopter libraries and
applications (e.g. librseq, tcmalloc) can also choose to handle registration
if it's not already done by libc.

For instance, it is acceptable for glibc to register rseq for all threads,
even in the presence of the cpu_id field inaccuracy, for use by the
sched_getcpu(3) implementation. However, users of rseq which need to
implement critical sections performing per-cpu data updates may want
to know whether the cpu_id field is reliable to ensure they do not crash
the process due to per-cpu data corruption.

This led me to consider exposing a feature-specific flag which can be
queried by specific users to know whether rseq has specific set of correct
behaviors implemented.

> (Also, as a side-note. I see that you're passing struct rseq *rseq with
> a length argument but you are not versioning by size. Is that
> intentional? That basically somewhat locks you to the current struct
> rseq layout and means users might run into problems when you extend
> struct rseq in the future as they can't pass the new struct down to
> older kernels. The way we deal with this is now - rseq might preceed
> this - is copy_struct_from_user() (for example in sched_{get,set}attr(),
> openat2(), bpf(), clone3(), etc.). Maybe you want to switch to that to
> keep rseq extensible? Users can detect the new rseq version by just
> passing a larger struct down to the kernel with the extra bytes set to 0
> and if rseq doesn't complain they know they're dealing with an rseq that
> knows larger struct sizes. Might be worth it if you have any reason to
> belive that struct rseq might need to grow.)

In the initial iterations of the rseq patch set, we initially had the rseq_len
argument hoping we would eventually be able to extend struct rseq. However,
it was finally decided against making it extensible, so the rseq ABI merged
into the Linux kernel with a fixed-size.

One of the key reasons for this is explained in
commit 83b0b15bcb0f ("rseq: Remove superfluous rseq_len from task_struct")

    The rseq system call, when invoked with flags of "0" or
    "RSEQ_FLAG_UNREGISTER" values, expects the rseq_len parameter to
    be equal to sizeof(struct rseq), which is fixed-size and fixed-layout,
    specified in uapi linux/rseq.h.
    
    Expecting a fixed size for rseq_len is a design choice that ensures
    multiple libraries and application defining __rseq_abi in the same
    process agree on its exact size.

The issue here is caused by the fact that the __rseq_abi variable is
shared across application/libraries for a given thread. So it's not
enough to agree between kernel and user-space on the extensibility
scheme, but we'd also have to find a way for all users within a process
to agree on the layout.

The "rseq_len" parameter could eventually become useful when combined
with additional flags, but currently its size is fixed.

There are indeed clear use-cases for extending struct rseq (or add a new
similar TLS structure), for instance exposing the current numa node id,
or to implement a fast signal-disabling scheme. But the fact that struct rseq
is shared across application/libraries makes it tricky to "just extend" struct
rseq.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2020-07-08 17:34 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-06 20:49 [RFC PATCH for 5.8 0/4] rseq cpu_id ABI fix Mathieu Desnoyers
2020-07-06 20:49 ` [RFC PATCH for 5.8 1/4] sched: Fix unreliable rseq cpu_id for new tasks Mathieu Desnoyers
2020-07-07  7:30   ` Florian Weimer
2020-07-07 10:51     ` Mathieu Desnoyers
2020-07-06 20:49 ` [RFC PATCH for 5.8 2/4] rseq: Introduce RSEQ_FLAG_REGISTER Mathieu Desnoyers
2020-07-06 20:49 ` [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID Mathieu Desnoyers
2020-07-07  7:29   ` Florian Weimer
2020-07-07 10:48     ` Mathieu Desnoyers
2020-07-07 11:32       ` Florian Weimer
2020-07-07 12:06         ` Mathieu Desnoyers
2020-07-07 18:53           ` Carlos O'Donell
2020-07-07 18:59             ` Mathieu Desnoyers
2020-07-08  8:31               ` Florian Weimer
2020-07-07 19:55             ` Florian Weimer
2020-07-08 15:33               ` Mathieu Desnoyers
2020-07-08 16:22                 ` Christian Brauner
2020-07-08 16:36                   ` Florian Weimer
2020-07-08 17:34                   ` Mathieu Desnoyers [this message]
2020-07-09 12:49                     ` Christian Brauner
2020-07-09 15:15                       ` Mathieu Desnoyers
2020-07-11 15:54                         ` Christian Brauner
2020-07-13 18:40                           ` Mathieu Desnoyers
2020-07-06 20:49 ` [RFC PATCH for 5.8 4/4] rseq: selftests: Expect reliable cpu_id field Mathieu Desnoyers
2020-07-07  6:26 ` [RFC PATCH for 5.8 0/4] rseq cpu_id ABI fix Florian Weimer
2020-07-07 14:54 ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1565638541.5051.1594229688015.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=boqun.feng@gmail.com \
    --cc=carlos@redhat.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=dvyukov@google.com \
    --cc=fw@deneb.enyo.de \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neelnatu@google.com \
    --cc=paulmck@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).