All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: carlos <carlos@redhat.com>, Florian Weimer <fweimer@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Dave Watson <davejwatson@fb.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Hunter <ahh@google.com>, Andi Kleen <andi@firstfloor.org>,
	Chris Lameter <cl@linux.com>, Ben Maurer <bmaurer@fb.com>,
	rostedt <rostedt@goodmis.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Joel Fernandes <joelaf@google.com>
Subject: Re: [RFC PATCH for 4.18 00/16] Restartable Sequences
Date: Mon, 30 Jul 2018 15:34:18 -0400 (EDT)	[thread overview]
Message-ID: <1259134501.7268.1532979258894.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20180730190715.GA29926@amd>

----- On Jul 30, 2018, at 3:07 PM, Pavel Machek pavel@ucw.cz wrote:

> Hi!
> 
>> > Thanks for pointer.
>> > 
>> > +Restartable sequences are atomic with respect to preemption (making
>> > it
>> > +atomic with respect to other threads running on the same CPU), as
>> > well
>> > +as signal delivery (user-space execution contexts nested over the
>> > same
>> > +thread).
>> > 
>> > So the threads are protected against sigkill when running the
>> > restartable sequence?
>> 
>> In that scenario, SIGKILL _will_ be delivered, hence execution of the
>> rseq critical section will never reach the commit instruction. This
>> follows the guarantee provided that the rseq c.s. either executes
>> completely "atomically" wrt preemption/signal delivery, *or* gets
>> aborted. In this case, sigkill will reap the entire process, so
> 
> The text above does not mention abort -- so I was just making
> sure. Maybe mentioning it would be good idea?

How about this ?

Restartable sequences are atomic with respect to preemption (making it
atomic with respect to other threads running on the same CPU), as well
as signal delivery (user-space execution contexts nested over the same
thread). They either complete atomically with respect to preemption on
the current CPU and signal delivery, or they are aborted.

[...]

> 
>> > +Optimistic cache of the CPU number on which the current thread is
>> > +running. Its value is guaranteed to always be a possible CPU number,
>> > +even when rseq is not initialized. The value it contains should
>> > always
>> > +be confirmed by reading the cpu_id field.
>> > 
>> > I'm not sure what "optimistic cache" is...
>> 
>> Perhaps we can find a better wording.
>> 
>> It's "optimistic" in the sense that it's always guaranteed to hold a
>> valid CPU number within the range [ 0 .. nr_possible_cpus - 1 ]. It can
>> therefore be loaded by user-space and then used as an offset, without
>> having to check whether it is within valid bounds compared to the number
>> of possible CPUs in the system.
>> 
>> This works even if the kernel on which the application runs on does not
>> support rseq at all: the __rseq_abi->cpu_id_start field stays initialized at
>> 0, which is indeed a valid CPU number. It's therefore valid to use it as an
>> offset in per-cpu data structures, and only validate whether it's actually the
>> current CPU number by comparing it with the __rseq_abi->cpu_id field
>> within the rseq critical section. If rseq is not available in the kernel,
>> that cpu_id field stays initialized at -1, so the comparison always fails,
>> as intended.
>> 
>> It's then up to user-space to use a fall-back mechanism, considering that
>> rseq is not available.
>> 
>> Advice on improved wording would be welcome.
> 
> Ok, that makes sense, but I'd not understand it from the man
> page. Perhaps your text should be put there?

How about this ?

.TP
.in +4n
.I cpu_id_start
Optimistic cache of the CPU number on which the current thread is
running. Its value is guaranteed to always be a possible CPU number,
even when rseq is not initialized. The value it contains should always
be confirmed by reading the cpu_id field.

This field is an optimistic cache in the sense that it is always
guaranteed to hold a valid CPU number in the range [ 0 ..
nr_possible_cpus - 1 ]. It can therefore be loaded by user-space and
used as an offset in per-cpu data structures without having to
check whether its value is within the valid bounds compared to the
number of possible CPUs in the system.

For user-space applications executed on a kernel without rseq support,
the cpu_id_start field stays initialized at 0, which is indeed a valid
CPU number. It is therefore valid to use it as an offset in per-cpu data
structures, and only validate whether it's actually the current CPU
number by comparing it with the cpu_id field within the rseq critical
section. If the kernel does not provide rseq support, that cpu_id field
stays initialized at -1, so the comparison always fails, as intended.

It is then up to user-space to use a fall-back mechanism, considering
that rseq is not available.

[...]

> 
>> > (Will not
>> > this need to be bigger on machines with bigger cache sizes?)
>> > 
>> > above it says:
>> > 
>> > +.B Structure size
>> > +This structure is extensible. Its size is passed as parameter to the
>> > +rseq system call.
>> > 
>> > I'm reading source, so maybe it refers to different structure.
>> 
>> It can be aligned on a larger multiple. This requirement of 32 bytes
>> is a minimum. Therefore, if we ever extend struct rseq, or if an
>> architecture shows benefit from aligning struct rseq on larger boundaries,
>> it is free to do so. It will still respect the requirement of alignment on
>> 32 bytes boundaries.
> 
> Well, elsewhere it said "This structure has a fixed size of 32 bytes."
> Now it says structure size is passed with every syscalls. Now I'm
> confused (but maybe that's caused by reading source, not formatted
> document).

This is the layout for struct rseq_cs version 0.

The variable-sized structure is struct rseq.

struct rseq is typically in a TLS, and contains a "rseq_cs" field
which is a pointer to the struct rseq_cs descriptor describing the
currently active rseq critical section.

Hoping this clears up the confusion.

Thanks for the review!

Mathieu


> 
> Best regards,
>									Pavel
> 
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

WARNING: multiple messages have this Message-ID (diff)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: carlos <carlos@redhat.com>, Florian Weimer <fweimer@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Dave Watson <davejwatson@fb.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Hunter <ahh@google.com>, Andi Kleen <andi@firstfloor.org>,
	Chris Lameter <cl@linux.com>, Ben Maurer <bmaurer@fb.com>,
	rostedt <rostedt@goodmis.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Linu
Subject: Re: [RFC PATCH for 4.18 00/16] Restartable Sequences
Date: Mon, 30 Jul 2018 15:34:18 -0400 (EDT)	[thread overview]
Message-ID: <1259134501.7268.1532979258894.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20180730190715.GA29926@amd>

----- On Jul 30, 2018, at 3:07 PM, Pavel Machek pavel@ucw.cz wrote:

> Hi!
> 
>> > Thanks for pointer.
>> > 
>> > +Restartable sequences are atomic with respect to preemption (making
>> > it
>> > +atomic with respect to other threads running on the same CPU), as
>> > well
>> > +as signal delivery (user-space execution contexts nested over the
>> > same
>> > +thread).
>> > 
>> > So the threads are protected against sigkill when running the
>> > restartable sequence?
>> 
>> In that scenario, SIGKILL _will_ be delivered, hence execution of the
>> rseq critical section will never reach the commit instruction. This
>> follows the guarantee provided that the rseq c.s. either executes
>> completely "atomically" wrt preemption/signal delivery, *or* gets
>> aborted. In this case, sigkill will reap the entire process, so
> 
> The text above does not mention abort -- so I was just making
> sure. Maybe mentioning it would be good idea?

How about this ?

Restartable sequences are atomic with respect to preemption (making it
atomic with respect to other threads running on the same CPU), as well
as signal delivery (user-space execution contexts nested over the same
thread). They either complete atomically with respect to preemption on
the current CPU and signal delivery, or they are aborted.

[...]

> 
>> > +Optimistic cache of the CPU number on which the current thread is
>> > +running. Its value is guaranteed to always be a possible CPU number,
>> > +even when rseq is not initialized. The value it contains should
>> > always
>> > +be confirmed by reading the cpu_id field.
>> > 
>> > I'm not sure what "optimistic cache" is...
>> 
>> Perhaps we can find a better wording.
>> 
>> It's "optimistic" in the sense that it's always guaranteed to hold a
>> valid CPU number within the range [ 0 .. nr_possible_cpus - 1 ]. It can
>> therefore be loaded by user-space and then used as an offset, without
>> having to check whether it is within valid bounds compared to the number
>> of possible CPUs in the system.
>> 
>> This works even if the kernel on which the application runs on does not
>> support rseq at all: the __rseq_abi->cpu_id_start field stays initialized at
>> 0, which is indeed a valid CPU number. It's therefore valid to use it as an
>> offset in per-cpu data structures, and only validate whether it's actually the
>> current CPU number by comparing it with the __rseq_abi->cpu_id field
>> within the rseq critical section. If rseq is not available in the kernel,
>> that cpu_id field stays initialized at -1, so the comparison always fails,
>> as intended.
>> 
>> It's then up to user-space to use a fall-back mechanism, considering that
>> rseq is not available.
>> 
>> Advice on improved wording would be welcome.
> 
> Ok, that makes sense, but I'd not understand it from the man
> page. Perhaps your text should be put there?

How about this ?

.TP
.in +4n
.I cpu_id_start
Optimistic cache of the CPU number on which the current thread is
running. Its value is guaranteed to always be a possible CPU number,
even when rseq is not initialized. The value it contains should always
be confirmed by reading the cpu_id field.

This field is an optimistic cache in the sense that it is always
guaranteed to hold a valid CPU number in the range [ 0 ..
nr_possible_cpus - 1 ]. It can therefore be loaded by user-space and
used as an offset in per-cpu data structures without having to
check whether its value is within the valid bounds compared to the
number of possible CPUs in the system.

For user-space applications executed on a kernel without rseq support,
the cpu_id_start field stays initialized at 0, which is indeed a valid
CPU number. It is therefore valid to use it as an offset in per-cpu data
structures, and only validate whether it's actually the current CPU
number by comparing it with the cpu_id field within the rseq critical
section. If the kernel does not provide rseq support, that cpu_id field
stays initialized at -1, so the comparison always fails, as intended.

It is then up to user-space to use a fall-back mechanism, considering
that rseq is not available.

[...]

> 
>> > (Will not
>> > this need to be bigger on machines with bigger cache sizes?)
>> > 
>> > above it says:
>> > 
>> > +.B Structure size
>> > +This structure is extensible. Its size is passed as parameter to the
>> > +rseq system call.
>> > 
>> > I'm reading source, so maybe it refers to different structure.
>> 
>> It can be aligned on a larger multiple. This requirement of 32 bytes
>> is a minimum. Therefore, if we ever extend struct rseq, or if an
>> architecture shows benefit from aligning struct rseq on larger boundaries,
>> it is free to do so. It will still respect the requirement of alignment on
>> 32 bytes boundaries.
> 
> Well, elsewhere it said "This structure has a fixed size of 32 bytes."
> Now it says structure size is passed with every syscalls. Now I'm
> confused (but maybe that's caused by reading source, not formatted
> document).

This is the layout for struct rseq_cs version 0.

The variable-sized structure is struct rseq.

struct rseq is typically in a TLS, and contains a "rseq_cs" field
which is a pointer to the struct rseq_cs descriptor describing the
currently active rseq critical section.

Hoping this clears up the confusion.

Thanks for the review!

Mathieu


> 
> Best regards,
>									Pavel
> 
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2018-07-30 19:34 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-02 12:43 [RFC PATCH for 4.18 00/16] Restartable Sequences Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 01/16] uapi headers: Provide types_32_64.h (v2) Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 02/16] rseq: Introduce restartable sequences system call (v13) Mathieu Desnoyers
2018-06-02 12:43   ` Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 03/16] arm: Add restartable sequences support Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 04/16] arm: Add syscall detection for restartable sequences Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 05/16] arm: Wire up restartable sequences system call Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 06/16] x86: Add support for restartable sequences (v2) Mathieu Desnoyers
2018-06-02 12:43 ` [RFC PATCH for 4.18 07/16] x86: Wire up restartable sequence system call Mathieu Desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 08/16] powerpc: Add support for restartable sequences Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 09/16] powerpc: Add syscall detection " Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-05  5:21   ` Michael Ellerman
2018-06-05  5:21     ` Michael Ellerman
2018-06-05 12:50     ` Mathieu Desnoyers
2018-06-05 12:50       ` Mathieu Desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 10/16] powerpc: Wire up restartable sequences system call Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-05  5:18   ` Michael Ellerman
2018-06-05  5:18     ` Michael Ellerman
2018-06-05 12:51     ` Mathieu Desnoyers
2018-06-05 12:51       ` Mathieu Desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 11/16] selftests: lib.mk: Introduce OVERRIDE_TARGETS Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` mathieu.desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 12/16] rseq: selftests: Provide rseq library (v5) Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` mathieu.desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 13/16] rseq: selftests: Provide basic test Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` mathieu.desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 14/16] rseq: selftests: Provide basic percpu ops test (v2) Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` mathieu.desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 15/16] rseq: selftests: Provide parametrized tests (v2) Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` mathieu.desnoyers
2018-06-02 12:44 ` [RFC PATCH for 4.18 16/16] rseq: selftests: Provide Makefile, scripts, gitignore (v2) Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` Mathieu Desnoyers
2018-06-02 12:44   ` mathieu.desnoyers
2018-07-27 22:01 ` [RFC PATCH for 4.18 00/16] Restartable Sequences Pavel Machek
2018-07-27 22:01   ` Pavel Machek
2018-07-28 13:49   ` Mathieu Desnoyers
2018-07-28 13:49     ` Mathieu Desnoyers
2018-07-28 14:13     ` Pavel Machek
2018-07-28 14:13       ` Pavel Machek
2018-07-30 18:42       ` Mathieu Desnoyers
2018-07-30 18:42         ` Mathieu Desnoyers
2018-07-30 19:07         ` Pavel Machek
2018-07-30 19:07           ` Pavel Machek
2018-07-30 19:34           ` Mathieu Desnoyers [this message]
2018-07-30 19:34             ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1259134501.7268.1532979258894.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=ahh@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=bmaurer@fb.com \
    --cc=boqun.feng@gmail.com \
    --cc=carlos@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=davejwatson@fb.com \
    --cc=fweimer@redhat.com \
    --cc=hpa@zytor.com \
    --cc=joelaf@google.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.