linux-api.vger.kernel.org archive mirror
From: Chris Kennelly <ckennelly@google.com>
To: Peter Oskolkov <posk@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Oskolkov <posk@posk.io>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	paulmck <paulmck@linux.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
	linux-api <linux-api@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Florian Weimer <fw@deneb.enyo.de>, carlos <carlos@redhat.com>
Subject: Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq
Date: Tue, 14 Jul 2020 22:34:38 -0400
Message-ID: <CAEE+ybmt4BredezuTPdh-vf=FkKtu0yAhWuf+0daUe89AnbmPg@mail.gmail.com>
In-Reply-To: <CAPNVh5fiCCJpyeLj_ciWzFrO4fasVXZNhpfKXJhJWJirXdJOjQ@mail.gmail.com>

On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov <posk@google.com> wrote:
>
> On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@posk.io wrote:
> >
> > > At Google, we actually extended struct rseq (I will post the patches
> > > here once they are fully deployed and we have specific
> > > benefits/improvements to report). We did this by adding several fields
> > > below __u32 flags (the last field currently), and correspondingly
> > > increasing rseq_len in rseq() syscall. If the kernel does not know of
> > > this extension, it will return -EINVAL due to an unexpected rseq_len;
> > > then the application can either fall-back to the standard/upstream
> > > rseq, or bail. If the kernel does know of this extension, it accepts
> > > it. If the application passes the old rseq_len (32), the kernel knows
> > > that this is an old application and treats it as such.
> > >
> > > I looked through the archives, but I did not find specifically why the
> > > pretty standard approach described above is considered inferior to the
> > > one taken in this patch (freeze rseq_len at 32, add additional length
> > > fields to struct rseq). Can these be summarized?
> >
> > I think you don't face the issues I'm facing with libc rseq integration
> > because you control the entire user-space software ecosystem at Google.
> >
> > The main issue we face is that the library responsible for registering
> > rseq (either glibc 2.32+, an early-adopter librseq library, or the
> > application) may very well not be the same library defining the __rseq_abi
> > symbol used in the global symbol table. Interposition with ld preload or
> > by defining the __rseq_abi in the program's executable are good examples
> > of this kind of scenario, and those use-cases are supported.

Does this work if/when we run out of bytes in the current sizeof(__rseq_abi)?

Which library provides the TLS symbol (and N bytes of storage) seems
sensitive to the choices the linker makes for us, once the symbol
sizes diverge.

> > So the size of the __rseq_abi structure may be larger than the struct
> > rseq known by glibc (and eventually smaller, if future glibc versions
> > extend their __rseq_abi size but is loaded with an older program/library
> > doing __rseq_abi interposition).

When glibc provides registration, is the anticipated use case that a
library would unregister and reregister each thread to "upgrade" it to
the most recent version of the interface that the library knows about
and the kernel provides?

> > So we need some way to allow code defining the __rseq_abi to let the kernel
> > know how much room is available, without necessarily requiring the code
> > responsible for rseq registration to be aware of that extended layout.
> > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field
> > __rseq_abi.user_size.
> >
> > And we need some way to allow the kernel to let user-space rseq critical
> > sections (user code) know how much of those fields are actually populated
> > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE
> > with __rseq_abi.kernel_size.
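
A minimal sketch of the check a critical section would need under this
scheme, as I read it. The struct layout, the flag value, and the names
rseq_sketch, new_field, and rseq_field_available() are assumptions for
illustration only; just flags, RSEQ_FLAG_TLS_SIZE, user_size, and
kernel_size come from the patch description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value; the real flag value is whatever the patch defines. */
#define RSEQ_FLAG_TLS_SIZE_SKETCH (1U << 3)

/* Hypothetical extended layout, not the actual uapi struct rseq. */
struct rseq_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint16_t user_size;   /* bytes allocated by whoever defines __rseq_abi */
	uint16_t kernel_size; /* bytes the kernel actually populates */
	uint32_t new_field;   /* hypothetical extension field */
};

/*
 * A field is usable only if the size fields are valid (flag set) and
 * the kernel-populated prefix covers the whole field.
 */
static bool rseq_field_available(const struct rseq_sketch *r,
				 size_t offset, size_t size)
{
	assert(r != NULL);
	if (!(r->flags & RSEQ_FLAG_TLS_SIZE_SKETCH))
		return false;
	return (size_t)r->kernel_size >= offset + size;
}
```

A caller would pass offsetof(struct rseq_sketch, new_field) and
sizeof(uint32_t), and fall back to a slower path when the check fails.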

I authored the userspace component
(https://github.com/google/tcmalloc/commit/ad136d45f75a273b934446699cef8b278c34ec6e)
that consumes the extensions Peter mentions and found that minimizing
the performance impact of their potential absence was a bit of a
challenge.

There, I could assume all-or-nothing registration of the new feature:
every thread is in the same state, and availability depends only on
the kernel. Inconsistencies across early-adopter libraries would
instead mean that each thread has to examine its own TLS to determine
whether a feature is available.
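
To make the difference concrete, here is a rough sketch of the two
kinds of check. This is not the TCMalloc code; thread_rseq_state,
process_has_feature, and both helpers are hypothetical names, and the
per-thread record stands in for whatever the thread's rseq area would
report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread record of how much the kernel populates. */
struct thread_rseq_state {
	uint16_t kernel_size;
};

static _Thread_local struct thread_rseq_state tls_state;

/* Homogeneous case: decided once at startup, identical for all threads. */
static bool process_has_feature;

static inline bool feature_homogeneous(void)
{
	return process_has_feature;
}

/* Heterogeneous case: every fast path must consult this thread's own TLS. */
static inline bool feature_per_thread(uint16_t needed_size)
{
	return tls_state.kernel_size >= needed_size;
}
```

The first check can be hoisted out of hot paths or resolved once at
startup; the second one has to stay on the fast path of every thread.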

Chris
