All of lore.kernel.org
 help / color / mirror / Atom feed
* Section 9.5: Nobody expects the Spanish Acquisition!
@ 2021-12-21 15:20 Elad Lahav
  2021-12-22  8:21 ` Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Elad Lahav @ 2021-12-21 15:20 UTC (permalink / raw)
  To: perfbook

Hi Paul,

As promised, I have a potentially more interesting comment regarding
Section 9.5. Throughout this section, from the very first example, the
writer uses release semantics, but the readers are not obligated to
use acquire semantics, at least on sensible architectures (with
apologies to employees of HP nee Compaq nee Digital).

If I understand correctly, these relaxed semantics are the result of
an address dependency, with the data protected by the RCU critical
section residing in a structure whose address is stored by the pointer
dereferenced by a reader. The reader cannot consider any data outside
of this structure as protected by the critical section. This is a
critical point without which the examples won't work reliably.

Am I missing something? If not, I think that this point should be
emphasized early on, with a link to Section 15.2.3.

--Elad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-21 15:20 Section 9.5: Nobody expects the Spanish Acquisition! Elad Lahav
@ 2021-12-22  8:21 ` Akira Yokosawa
  2021-12-22 12:15   ` Elad Lahav
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2021-12-22  8:21 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook, Paul E. McKenney

(+explicit Cc: Paul)

Hi Elad,

On Tue, 21 Dec 2021 10:20:52 -0500, Elad Lahav wrote:
> Hi Paul,
> 
> As promised, I have a potentially more interesting comment regarding
> Section 9.5. Throughout this section, from the very first example, the
> writer uses release semantics, but the readers are not obligated to
> use acquire semantics, at least on sensible architectures (with
> apologies to employees of HP nee Compaq nee Digital).

  ;-)

> 
> If I understand correctly, these relaxed semantics are the result of
> an address dependency, with the data protected by the RCU critical
> section residing in a structure whose address is stored by the pointer
> dereferenced by a reader. The reader cannot consider any data outside
> of this structure as protected by the critical section. This is a
> critical point without which the examples won't work reliably.

I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
Mechanism" and Figure 9.10 "Publication/Subscription Constraints".

> 
> Am I missing something? If not, I think that this point should be
> emphasized early on, with a link to Section 15.2.3.

There is such a reference in Section 9.5.2.1 just below Quick Quiz
9.28.  You mean you find this too late?

        Thanks, Akira

BTW, I couldn't figure out what you meant by "the Spanish
Acquisition"...

> 
> --Elad
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-22  8:21 ` Akira Yokosawa
@ 2021-12-22 12:15   ` Elad Lahav
  2021-12-22 14:35     ` Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Elad Lahav @ 2021-12-22 12:15 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook, Paul E. McKenney

Hi Akira,

> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".

I don't believe it is. I think you can infer that from reading sections 
9.5 and chapter 15, but it is never made explicit.

Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a 
pointer, but it doesn't go very deep into how that's implemented. You 
would think that the writer's use of store-release semantics would 
necessitate the reader's use of load-acquire semantics, but the 
discussion here, the previous examples, and a quick inspection of how 
rcu_dereference() is implemented, suggest that a READ_ONCE() is 
sufficient (or some less Linux-kernel-specific equivalent).

If I am a naive reader (and a lazy one, who haven't read chapter 15 and 
made the necessary connection), I could write something like:

     struct foo_s {
         int a;
     } foo;
     int b;
     struct foo_s *g_foop = &foo;

     // Reader
     rcu_read_lock();
     struct foo *l_foop = rcu_dereference(g_foop);
     use(l_foop->a);
     use(b);
     rcu_read_unlock();

     // Writer
     struct foo *l_foop = malloc(sizeof(struct foo));
     l_foop->a = 1;
     b = 2;
     // Release semantics ensure previous stores are observed
     rcu_assign_pointer(g_foop, l_foop);

But since b has no address dependency to g_foop and since neither 
rcu_read_lock() nor rcu_dereference() impose acquire semantics, then the 
reader may not observe the new value of b.

Again, I may be missing something, but this seems to be a major point 
that needs to be explained and emphasized.

> BTW, I couldn't figure out what you meant by "the Spanish
> Acquisition"...
It's a reference to an old Monty Python skit. Apologies for the silly pun...

--Elad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-22 12:15   ` Elad Lahav
@ 2021-12-22 14:35     ` Akira Yokosawa
  2021-12-22 18:19       ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2021-12-22 14:35 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook, Paul E. McKenney

Hi,

On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> Hi Akira,
> 
>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> 
> I don't believe it is. I think you can infer that from reading sections 9.5
> and chapter 15, but it is never made explicit.

I agree it is not explicit.

> 
> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> pointer, but it doesn't go very deep into how that's implemented. You would
> think that the writer's use of store-release semantics would necessitate
> the reader's use of load-acquire semantics, but the discussion here, the
> previous examples, and a quick inspection of how rcu_dereference() is
> implemented, suggest that a READ_ONCE() is sufficient (or some less
> Linux-kernel-specific equivalent).
> 
> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> the necessary connection), I could write something like:
> 
>     struct foo_s {
>         int a;
>     } foo;
>     int b;
>     struct foo_s *g_foop = &foo;
> 
>     // Reader
>     rcu_read_lock();
>     struct foo *l_foop = rcu_dereference(g_foop);
>     use(l_foop->a);
>     use(b);
>     rcu_read_unlock();
> 
>     // Writer
>     struct foo *l_foop = malloc(sizeof(struct foo));
>     l_foop->a = 1;
>     b = 2;
>     // Release semantics ensure previous stores are observed
>     rcu_assign_pointer(g_foop, l_foop);
> 
> But since b has no address dependency to g_foop and since neither
> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> the reader may not observe the new value of b.

No, it may not.
So what you want is the mention of "address dependency" somewhere in
Section 9.5.2.1?

> 
> Again, I may be missing something, but this seems to be a major point
> that needs to be explained and emphasized.

I guess Paul is intentionally avoiding discussions on memory ordering
here in Section 9.5.

Such a discussion would easily frighten naive readers...

Anyway, I guess Paul would have some good compromise to satisfy you. 

> 
>> BTW, I couldn't figure out what you meant by "the Spanish
>> Acquisition"...
> It's a reference to an old Monty Python skit. Apologies for the silly pun...

Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)

        Thanks, Akira

> 
> --Elad


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-22 14:35     ` Akira Yokosawa
@ 2021-12-22 18:19       ` Paul E. McKenney
       [not found]         ` <CAJbg=FUHcqEXE+MgXif0n=e09xYFoGFfmhjvY1=pnC6QCCRh2w@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-12-22 18:19 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Elad Lahav, perfbook

On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
> Hi,
> 
> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> > Hi Akira,
> > 
> >> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> >> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> > 
> > I don't believe it is. I think you can infer that from reading sections 9.5
> > and chapter 15, but it is never made explicit.
> 
> I agree it is not explicit.
> 
> > 
> > Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> > pointer, but it doesn't go very deep into how that's implemented. You would
> > think that the writer's use of store-release semantics would necessitate
> > the reader's use of load-acquire semantics, but the discussion here, the
> > previous examples, and a quick inspection of how rcu_dereference() is
> > implemented, suggest that a READ_ONCE() is sufficient (or some less
> > Linux-kernel-specific equivalent).
> > 
> > If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> > the necessary connection), I could write something like:
> > 
> >     struct foo_s {
> >         int a;
> >     } foo;
> >     int b;
> >     struct foo_s *g_foop = &foo;
> > 
> >     // Reader
> >     rcu_read_lock();
> >     struct foo *l_foop = rcu_dereference(g_foop);
> >     use(l_foop->a);
> >     use(b);
> >     rcu_read_unlock();
> > 
> >     // Writer
> >     struct foo *l_foop = malloc(sizeof(struct foo));
> >     l_foop->a = 1;
> >     b = 2;
> >     // Release semantics ensure previous stores are observed
> >     rcu_assign_pointer(g_foop, l_foop);
> > 
> > But since b has no address dependency to g_foop and since neither
> > rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> > the reader may not observe the new value of b.
> 
> No, it may not.
> So what you want is the mention of "address dependency" somewhere in
> Section 9.5.2.1?
> 
> > 
> > Again, I may be missing something, but this seems to be a major point
> > that needs to be explained and emphasized.
> 
> I guess Paul is intentionally avoiding discussions on memory ordering
> here in Section 9.5.
> 
> Such a discussion would easily frighten naive readers...
> 
> Anyway, I guess Paul would have some good compromise to satisfy you. 

Actually, I am going to ask Elad to propose the location and wording of an
addition to Section 9.5 covering this, so that we can discuss and refine
it.  This addition might be a new paragraph, a footnote, a quick quiz,
a citation, or what have you.  And the refining might switch back and
forth among these options a few times.  But we do have to start somewhere.

I am thinking in terms of this going in after the release that I was
naively planning to do yesterday.  (The objective universe had other
ideas.)

> >> BTW, I couldn't figure out what you meant by "the Spanish
> >> Acquisition"...
> > It's a reference to an old Monty Python skit. Apologies for the silly pun...
> 
> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)

And here I was hoping that Elad was purchasing a house in Barcelona
or some such.  ;-)

(Sorry, couldn't resist!)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
       [not found]         ` <CAJbg=FUHcqEXE+MgXif0n=e09xYFoGFfmhjvY1=pnC6QCCRh2w@mail.gmail.com>
@ 2021-12-23  2:29           ` Paul E. McKenney
  2021-12-23 12:22             ` Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2021-12-23  2:29 UTC (permalink / raw)
  To: Elad Lahav; +Cc: Akira Yokosawa, perfbook

Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
is to ensure that any RCU-protected linked data objects referenced by
the corresponding critical section remain in existence for the full
duration of that critical section.

And even that overstates things a bit, as this describes not the
underlying RCU API itself, but rather some of that API's use cases.
(Though to be fair, these are the most popular of RCU's use cases.)
So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
does is to make any synchronize_rcu() or call_rcu() invocations starting
after a given rcu_read_lock() wait (synchronously or asynchronously,
respectively) until the corresponding rcu_read_unlock() is reached.

Returning back to the first paragraph, additional protections can be
arranged, depending on the RCU use case.  For example, if the non-pointer
fields of objects added to an RCU-protected linked data structure remain
unchanged while that object is accessible to readers, then the protection
includes not just existence, but also value.  And there are other use
cases where those values might change while accessible to readers.

So the most accurate answer to your question is "That is a design
choice, based on the RCU use case in question."

My guess is that you are thinking in terms of designs where the
non-pointer fields of an object are constant while that object is
accessible to readers.  Is my guess correct?

							Thanx, Paul

On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
> I will try to come up with something, but first I wanted to get your
> opinion as to whether this design pattern, where you need to
> dereference a pointer to a data structure containing all of the data
> considered to be protected by the critical section, is inherent to
> RCU, or is it just an idiosyncrasy of the implementation used in the
> Linux kernel?
> 
> --Elad
> 
> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
> > > Hi,
> > >
> > > On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> > > > Hi Akira,
> > > >
> > > >> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> > > >> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> > > >
> > > > I don't believe it is. I think you can infer that from reading sections 9.5
> > > > and chapter 15, but it is never made explicit.
> > >
> > > I agree it is not explicit.
> > >
> > > >
> > > > Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> > > > pointer, but it doesn't go very deep into how that's implemented. You would
> > > > think that the writer's use of store-release semantics would necessitate
> > > > the reader's use of load-acquire semantics, but the discussion here, the
> > > > previous examples, and a quick inspection of how rcu_dereference() is
> > > > implemented, suggest that a READ_ONCE() is sufficient (or some less
> > > > Linux-kernel-specific equivalent).
> > > >
> > > > If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> > > > the necessary connection), I could write something like:
> > > >
> > > >     struct foo_s {
> > > >         int a;
> > > >     } foo;
> > > >     int b;
> > > >     struct foo_s *g_foop = &foo;
> > > >
> > > >     // Reader
> > > >     rcu_read_lock();
> > > >     struct foo *l_foop = rcu_dereference(g_foop);
> > > >     use(l_foop->a);
> > > >     use(b);
> > > >     rcu_read_unlock();
> > > >
> > > >     // Writer
> > > >     struct foo *l_foop = malloc(sizeof(struct foo));
> > > >     l_foop->a = 1;
> > > >     b = 2;
> > > >     // Release semantics ensure previous stores are observed
> > > >     rcu_assign_pointer(g_foop, l_foop);
> > > >
> > > > But since b has no address dependency to g_foop and since neither
> > > > rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> > > > the reader may not observe the new value of b.
> > >
> > > No, it may not.
> > > So what you want is the mention of "address dependency" somewhere in
> > > Section 9.5.2.1?
> > >
> > > >
> > > > Again, I may be missing something, but this seems to be a major point
> > > > that needs to be explained and emphasized.
> > >
> > > I guess Paul is intentionally avoiding discussions on memory ordering
> > > here in Section 9.5.
> > >
> > > Such a discussion would easily frighten naive readers...
> > >
> > > Anyway, I guess Paul would have some good compromise to satisfy you.
> >
> > Actually, I am going to ask Elad to propose the location and wording of an
> > addition to Section 9.5 covering this, so that we can discuss and refine
> > it.  This addition might be a new paragraph, a footnote, a quick quiz,
> > a citation, or what have you.  And the refining might switch back and
> > forth among these options a few times.  But we do have to start somewhere.
> >
> > I am thinking in terms of this going in after the release that I was
> > naively planning to do yesterday.  (The objective universe had other
> > ideas.)
> >
> > > >> BTW, I couldn't figure out what you meant by "the Spanish
> > > >> Acquisition"...
> > > > It's a reference to an old Monty Python skit. Apologies for the silly pun...
> > >
> > > Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
> >
> > And here I was hoping that Elad was purchasing a house in Barcelona
> > or some such.  ;-)
> >
> > (Sorry, couldn't resist!)
> >
> >                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-23  2:29           ` Paul E. McKenney
@ 2021-12-23 12:22             ` Akira Yokosawa
  2021-12-23 12:46               ` Elad Lahav
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2021-12-23 12:22 UTC (permalink / raw)
  To: Paul E. McKenney, Elad Lahav; +Cc: perfbook, Akira Yokosawa

Hi,

It sounds to me like Paul is missing the main point of Elad's question.
So I'm chiming in again.  See inline comment below Elad's question.

On Wed, 22 Dec 2021 18:29:57 -0800, Paul E. McKenney wrote:
> Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
> is to ensure that any RCU-protected linked data objects referenced by
> the corresponding critical section remain in existence for the full
> duration of that critical section.
> 
> And even that overstates things a bit, as this describes not the
> underlying RCU API itself, but rather some of that API's use cases.
> (Though to be fair, these are the most popular of RCU's use cases.)
> So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
> does is to make any synchronize_rcu() or call_rcu() invocations starting
> after a given rcu_read_lock() wait (synchronously or asynchronously,
> respectively) until the corresponding rcu_read_unlock() is reached.
> 
> Returning back to the first paragraph, additional protections can be
> arranged, depending on the RCU use case.  For example, if the non-pointer
> fields of objects added to an RCU-protected linked data structure remain
> unchanged while that object is accessible to readers, then the protection
> includes not just existence, but also value.  And there are other use
> cases where those values might change while accessible to readers.
> 
> So the most accurate answer to your question is "That is a design
> choice, based on the RCU use case in question."
> 
> My guess is that you are thinking in terms of designs where the
> non-pointer fields of an object are constant while that object is
> accessible to readers.  Is my guess correct?
> 
> 							Thanx, Paul
> 
> On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
>> I will try to come up with something, but first I wanted to get your
>> opinion as to whether this design pattern, where you need to
>> dereference a pointer to a data structure containing all of the data
>> considered to be protected by the critical section, is inherent to
>> RCU, or is it just an idiosyncrasy of the implementation used in the
>> Linux kernel?
>>
>> --Elad

I guess Elad is asking more of the design choice of rcu_assign_pointer()
and rcu_dereference().

A naive reader might easily miss the asymmetry of rcu_assign_pointer()
with store-release and rcu_defeference() *without* load-acquire.

IIUC, this asymmetry comes from the use cases where RCU performs best,
that is read-mostly.  Therefore, rcu_dereference() need to be
light-weight as possible and doesn't imply load-acquire.
Thankfully quite a lot of use cases can be covered by the pattern
where address-dependency is sufficient for read-side.

So one approach for Elad's concern would be to add a Quick Quiz on
this asymmetry, I suppose.

Elad, am I guessing right?

Sidenote: RCU and memory accesses are orthogonal.  You can use whatever
memory access primitives for your needed memory ordering. (Which would
be quite likely a tricky thing to do for a naive reader, though.)

        Thanks, Akira

>>
>> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
>>>> Hi,
>>>>
>>>> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
>>>>> Hi Akira,
>>>>>
>>>>>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
>>>>>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
>>>>>
>>>>> I don't believe it is. I think you can infer that from reading sections 9.5
>>>>> and chapter 15, but it is never made explicit.
>>>>
>>>> I agree it is not explicit.
>>>>
>>>>>
>>>>> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
>>>>> pointer, but it doesn't go very deep into how that's implemented. You would
>>>>> think that the writer's use of store-release semantics would necessitate
>>>>> the reader's use of load-acquire semantics, but the discussion here, the
>>>>> previous examples, and a quick inspection of how rcu_dereference() is
>>>>> implemented, suggest that a READ_ONCE() is sufficient (or some less
>>>>> Linux-kernel-specific equivalent).
>>>>>
>>>>> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
>>>>> the necessary connection), I could write something like:
>>>>>
>>>>>     struct foo_s {
>>>>>         int a;
>>>>>     } foo;
>>>>>     int b;
>>>>>     struct foo_s *g_foop = &foo;
>>>>>
>>>>>     // Reader
>>>>>     rcu_read_lock();
>>>>>     struct foo *l_foop = rcu_dereference(g_foop);
>>>>>     use(l_foop->a);
>>>>>     use(b);
>>>>>     rcu_read_unlock();
>>>>>
>>>>>     // Writer
>>>>>     struct foo *l_foop = malloc(sizeof(struct foo));
>>>>>     l_foop->a = 1;
>>>>>     b = 2;
>>>>>     // Release semantics ensure previous stores are observed
>>>>>     rcu_assign_pointer(g_foop, l_foop);
>>>>>
>>>>> But since b has no address dependency to g_foop and since neither
>>>>> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
>>>>> the reader may not observe the new value of b.
>>>>
>>>> No, it may not.
>>>> So what you want is the mention of "address dependency" somewhere in
>>>> Section 9.5.2.1?
>>>>
>>>>>
>>>>> Again, I may be missing something, but this seems to be a major point
>>>>> that needs to be explained and emphasized.
>>>>
>>>> I guess Paul is intentionally avoiding discussions on memory ordering
>>>> here in Section 9.5.
>>>>
>>>> Such a discussion would easily frighten naive readers...
>>>>
>>>> Anyway, I guess Paul would have some good compromise to satisfy you.
>>>
>>> Actually, I am going to ask Elad to propose the location and wording of an
>>> addition to Section 9.5 covering this, so that we can discuss and refine
>>> it.  This addition might be a new paragraph, a footnote, a quick quiz,
>>> a citation, or what have you.  And the refining might switch back and
>>> forth among these options a few times.  But we do have to start somewhere.
>>>
>>> I am thinking in terms of this going in after the release that I was
>>> naively planning to do yesterday.  (The objective universe had other
>>> ideas.)
>>>
>>>>>> BTW, I couldn't figure out what you meant by "the Spanish
>>>>>> Acquisition"...
>>>>> It's a reference to an old Monty Python skit. Apologies for the silly pun...
>>>>
>>>> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
>>>
>>> And here I was hoping that Elad was purchasing a house in Barcelona
>>> or some such.  ;-)
>>>
>>> (Sorry, couldn't resist!)
>>>
>>>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-23 12:22             ` Akira Yokosawa
@ 2021-12-23 12:46               ` Elad Lahav
  2021-12-23 12:59                 ` Akira Yokosawa
  2021-12-23 17:26                 ` Paul E. McKenney
  0 siblings, 2 replies; 16+ messages in thread
From: Elad Lahav @ 2021-12-23 12:46 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Paul E. McKenney, perfbook

Yes, Akira, that's where I was going. Nevertheless, my question is a
bit more profound than that: is the asymmetry a result of the
implementation of RCU, as presented here (and the way it is used in
the Linux kernel), or is it inherent to RCU as a data access pattern?
I believe it is the former, and have an idea on how to phrase this
notion more formally. I will write something up and you can then
decide whether:

1. I'm wrong
2. I'm right, but it's not worth mentioning
3. I'm right, and the point should be added to the book
4. None of the above ;-)

--Elad

On Thu, 23 Dec 2021 at 07:22, Akira Yokosawa <akiyks@gmail.com> wrote:
>
> Hi,
>
> It sounds to me like Paul is missing the main point of Elad's question.
> So I'm chiming in again.  See inline comment below Elad's question.
>
> On Wed, 22 Dec 2021 18:29:57 -0800, Paul E. McKenney wrote:
> > Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
> > is to ensure that any RCU-protected linked data objects referenced by
> > the corresponding critical section remain in existence for the full
> > duration of that critical section.
> >
> > And even that overstates things a bit, as this describes not the
> > underlying RCU API itself, but rather some of that API's use cases.
> > (Though to be fair, these are the most popular of RCU's use cases.)
> > So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
> > does is to make any synchronize_rcu() or call_rcu() invocations starting
> > after a given rcu_read_lock() wait (synchronously or asynchronously,
> > respectively) until the corresponding rcu_read_unlock() is reached.
> >
> > Returning back to the first paragraph, additional protections can be
> > arranged, depending on the RCU use case.  For example, if the non-pointer
> > fields of objects added to an RCU-protected linked data structure remain
> > unchanged while that object is accessible to readers, then the protection
> > includes not just existence, but also value.  And there are other use
> > cases where those values might change while accessible to readers.
> >
> > So the most accurate answer to your question is "That is a design
> > choice, based on the RCU use case in question."
> >
> > My guess is that you are thinking in terms of designs where the
> > non-pointer fields of an object are constant while that object is
> > accessible to readers.  Is my guess correct?
> >
> >                                                       Thanx, Paul
> >
> > On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
> >> I will try to come up with something, but first I wanted to get your
> >> opinion as to whether this design pattern, where you need to
> >> dereference a pointer to a data structure containing all of the data
> >> considered to be protected by the critical section, is inherent to
> >> RCU, or is it just an idiosyncrasy of the implementation used in the
> >> Linux kernel?
> >>
> >> --Elad
>
> I guess Elad is asking more of the design choice of rcu_assign_pointer()
> and rcu_dereference().
>
> A naive reader might easily miss the asymmetry of rcu_assign_pointer()
> with store-release and rcu_defeference() *without* load-acquire.
>
> IIUC, this asymmetry comes from the use cases where RCU performs best,
> that is read-mostly.  Therefore, rcu_dereference() need to be
> light-weight as possible and doesn't imply load-acquire.
> Thankfully quite a lot of use cases can be covered by the pattern
> where address-dependency is sufficient for read-side.
>
> So one approach for Elad's concern would be to add a Quick Quiz on
> this asymmetry, I suppose.
>
> Elad, am I guessing right?
>
> Sidenote: RCU and memory accesses are orthogonal.  You can use whatever
> memory access primitives for your needed memory ordering. (Which would
> be quite likely a tricky thing to do for a naive reader, though.)
>
>         Thanks, Akira
>
> >>
> >> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>
> >>> On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
> >>>> Hi,
> >>>>
> >>>> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> >>>>> Hi Akira,
> >>>>>
> >>>>>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> >>>>>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> >>>>>
> >>>>> I don't believe it is. I think you can infer that from reading sections 9.5
> >>>>> and chapter 15, but it is never made explicit.
> >>>>
> >>>> I agree it is not explicit.
> >>>>
> >>>>>
> >>>>> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> >>>>> pointer, but it doesn't go very deep into how that's implemented. You would
> >>>>> think that the writer's use of store-release semantics would necessitate
> >>>>> the reader's use of load-acquire semantics, but the discussion here, the
> >>>>> previous examples, and a quick inspection of how rcu_dereference() is
> >>>>> implemented, suggest that a READ_ONCE() is sufficient (or some less
> >>>>> Linux-kernel-specific equivalent).
> >>>>>
> >>>>> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> >>>>> the necessary connection), I could write something like:
> >>>>>
> >>>>>     struct foo_s {
> >>>>>         int a;
> >>>>>     } foo;
> >>>>>     int b;
> >>>>>     struct foo_s *g_foop = &foo;
> >>>>>
> >>>>>     // Reader
> >>>>>     rcu_read_lock();
> >>>>>     struct foo *l_foop = rcu_dereference(g_foop);
> >>>>>     use(l_foop->a);
> >>>>>     use(b);
> >>>>>     rcu_read_unlock();
> >>>>>
> >>>>>     // Writer
> >>>>>     struct foo *l_foop = malloc(sizeof(struct foo));
> >>>>>     l_foop->a = 1;
> >>>>>     b = 2;
> >>>>>     // Release semantics ensure previous stores are observed
> >>>>>     rcu_assign_pointer(g_foop, l_foop);
> >>>>>
> >>>>> But since b has no address dependency to g_foop and since neither
> >>>>> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> >>>>> the reader may not observe the new value of b.
> >>>>
> >>>> No, it may not.
> >>>> So what you want is the mention of "address dependency" somewhere in
> >>>> Section 9.5.2.1?
> >>>>
> >>>>>
> >>>>> Again, I may be missing something, but this seems to be a major point
> >>>>> that needs to be explained and emphasized.
> >>>>
> >>>> I guess Paul is intentionally avoiding discussions on memory ordering
> >>>> here in Section 9.5.
> >>>>
> >>>> Such a discussion would easily frighten naive readers...
> >>>>
> >>>> Anyway, I guess Paul would have some good compromise to satisfy you.
> >>>
> >>> Actually, I am going to ask Elad to propose the location and wording of an
> >>> addition to Section 9.5 covering this, so that we can discuss and refine
> >>> it.  This addition might be a new paragraph, a footnote, a quick quiz,
> >>> a citation, or what have you.  And the refining might switch back and
> >>> forth among these options a few times.  But we do have to start somewhere.
> >>>
> >>> I am thinking in terms of this going in after the release that I was
> >>> naively planning to do yesterday.  (The objective universe had other
> >>> ideas.)
> >>>
> >>>>>> BTW, I couldn't figure out what you meant by "the Spanish
> >>>>>> Acquisition"...
> >>>>> It's a reference to an old Monty Python skit. Apologies for the silly pun...
> >>>>
> >>>> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
> >>>
> >>> And here I was hoping that Elad was purchasing a house in Barcelona
> >>> or some such.  ;-)
> >>>
> >>> (Sorry, couldn't resist!)
> >>>
> >>>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-23 12:46               ` Elad Lahav
@ 2021-12-23 12:59                 ` Akira Yokosawa
  2021-12-23 14:26                   ` Elad Lahav
  2021-12-23 17:26                 ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2021-12-23 12:59 UTC (permalink / raw)
  To: Elad Lahav; +Cc: Paul E. McKenney, perfbook, Akira Yokosawa

On Thu, 23 Dec 2021 07:46:02 -0500, Elad Lahav wrote:
> Yes, Akira, that's where I was going. Nevertheless, my question is a
> bit more profound than that: is the asymmetry a result of the
> implementation of RCU, as presented here (and the way it is used in
> the Linux kernel), or is it inherent to RCU as a data access pattern?
> I believe it is the former, and have an idea on how to phrase this
> notion more formally. I will write something up and you can then
> decide whether:
> 
> 1. I'm wrong
> 2. I'm right, but it's not worth mentioning
> 3. I'm right, and the point should be added to the book
> 4. None of the above ;-)

Nice!
Looking forward to seeing your write-up.

        Thanks, Akira

> 
> --Elad
> 
> On Thu, 23 Dec 2021 at 07:22, Akira Yokosawa <akiyks@gmail.com> wrote:
>>
>> Hi,
>>
>> It sounds to me like Paul is missing the main point of Elad's question.
>> So I'm chiming in again.  See inline comment below Elad's question.
>>
>> On Wed, 22 Dec 2021 18:29:57 -0800, Paul E. McKenney wrote:
>>> Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
>>> is to ensure that any RCU-protected linked data objects referenced by
>>> the corresponding critical section remain in existence for the full
>>> duration of that critical section.
>>>
>>> And even that overstates things a bit, as this describes not the
>>> underlying RCU API itself, but rather some of that API's use cases.
>>> (Though to be fair, these are the most popular of RCU's use cases.)
>>> So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
>>> does is to make any synchronize_rcu() or call_rcu() invocations starting
>>> after a given rcu_read_lock() wait (synchronously or asynchronously,
>>> respectively) until the corresponding rcu_read_unlock() is reached.
>>>
>>> Returning back to the first paragraph, additional protections can be
>>> arranged, depending on the RCU use case.  For example, if the non-pointer
>>> fields of objects added to an RCU-protected linked data structure remain
>>> unchanged while that object is accessible to readers, then the protection
>>> includes not just existence, but also value.  And there are other use
>>> cases where those values might change while accessible to readers.
>>>
>>> So the most accurate answer to your question is "That is a design
>>> choice, based on the RCU use case in question."
>>>
>>> My guess is that you are thinking in terms of designs where the
>>> non-pointer fields of an object are constant while that object is
>>> accessible to readers.  Is my guess correct?
>>>
>>>                                                       Thanx, Paul
>>>
>>> On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
>>>> I will try to come up with something, but first I wanted to get your
>>>> opinion as to whether this design pattern, where you need to
>>>> dereference a pointer to a data structure containing all of the data
>>>> considered to be protected by the critical section, is inherent to
>>>> RCU, or is it just an idiosyncrasy of the implementation used in the
>>>> Linux kernel?
>>>>
>>>> --Elad
>>
>> I guess Elad is asking more of the design choice of rcu_assign_pointer()
>> and rcu_dereference().
>>
>> A naive reader might easily miss the asymmetry of rcu_assign_pointer()
>> with store-release and rcu_defeference() *without* load-acquire.
>>
>> IIUC, this asymmetry comes from the use cases where RCU performs best,
>> that is read-mostly.  Therefore, rcu_dereference() need to be
>> light-weight as possible and doesn't imply load-acquire.
>> Thankfully quite a lot of use cases can be covered by the pattern
>> where address-dependency is sufficient for read-side.
>>
>> So one approach for Elad's concern would be to add a Quick Quiz on
>> this asymmetry, I suppose.
>>
>> Elad, am I guessing right?
>>
>> Sidenote: RCU and memory accesses are orthogonal.  You can use whatever
>> memory access primitives for your needed memory ordering. (Which would
>> be quite likely a tricky thing to do for a naive reader, though.)
>>
>>         Thanks, Akira
>>
>>>>
>>>> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>
>>>>> On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
>>>>>>> Hi Akira,
>>>>>>>
>>>>>>>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
>>>>>>>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
>>>>>>>
>>>>>>> I don't believe it is. I think you can infer that from reading sections 9.5
>>>>>>> and chapter 15, but it is never made explicit.
>>>>>>
>>>>>> I agree it is not explicit.
>>>>>>
>>>>>>>
>>>>>>> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
>>>>>>> pointer, but it doesn't go very deep into how that's implemented. You would
>>>>>>> think that the writer's use of store-release semantics would necessitate
>>>>>>> the reader's use of load-acquire semantics, but the discussion here, the
>>>>>>> previous examples, and a quick inspection of how rcu_dereference() is
>>>>>>> implemented, suggest that a READ_ONCE() is sufficient (or some less
>>>>>>> Linux-kernel-specific equivalent).
>>>>>>>
>>>>>>> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
>>>>>>> the necessary connection), I could write something like:
>>>>>>>
>>>>>>>     struct foo_s {
>>>>>>>         int a;
>>>>>>>     } foo;
>>>>>>>     int b;
>>>>>>>     struct foo_s *g_foop = &foo;
>>>>>>>
>>>>>>>     // Reader
>>>>>>>     rcu_read_lock();
>>>>>>>     struct foo *l_foop = rcu_dereference(g_foop);
>>>>>>>     use(l_foop->a);
>>>>>>>     use(b);
>>>>>>>     rcu_read_unlock();
>>>>>>>
>>>>>>>     // Writer
>>>>>>>     struct foo *l_foop = malloc(sizeof(struct foo));
>>>>>>>     l_foop->a = 1;
>>>>>>>     b = 2;
>>>>>>>     // Release semantics ensure previous stores are observed
>>>>>>>     rcu_assign_pointer(g_foop, l_foop);
>>>>>>>
>>>>>>> But since b has no address dependency to g_foop and since neither
>>>>>>> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
>>>>>>> the reader may not observe the new value of b.
>>>>>>
>>>>>> No, it may not.
>>>>>> So what you want is the mention of "address dependency" somewhere in
>>>>>> Section 9.5.2.1?
>>>>>>
>>>>>>>
>>>>>>> Again, I may be missing something, but this seems to be a major point
>>>>>>> that needs to be explained and emphasized.
>>>>>>
>>>>>> I guess Paul is intentionally avoiding discussions on memory ordering
>>>>>> here in Section 9.5.
>>>>>>
>>>>>> Such a discussion would easily frighten naive readers...
>>>>>>
>>>>>> Anyway, I guess Paul would have some good compromise to satisfy you.
>>>>>
>>>>> Actually, I am going to ask Elad to propose the location and wording of an
>>>>> addition to Section 9.5 covering this, so that we can discuss and refine
>>>>> it.  This addition might be a new paragraph, a footnote, a quick quiz,
>>>>> a citation, or what have you.  And the refining might switch back and
>>>>> forth among these options a few times.  But we do have to start somewhere.
>>>>>
>>>>> I am thinking in terms of this going in after the release that I was
>>>>> naively planning to do yesterday.  (The objective universe had other
>>>>> ideas.)
>>>>>
>>>>>>>> BTW, I couldn't figure out what you meant by "the Spanish
>>>>>>>> Acquisition"...
>>>>>>> It's a reference to an old Monty Python skit. Apologies for the silly pun...
>>>>>>
>>>>>> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
>>>>>
>>>>> And here I was hoping that Elad was purchasing a house in Barcelona
>>>>> or some such.  ;-)
>>>>>
>>>>> (Sorry, couldn't resist!)
>>>>>
>>>>>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-23 12:59                 ` Akira Yokosawa
@ 2021-12-23 14:26                   ` Elad Lahav
  2022-01-04  0:05                     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Elad Lahav @ 2021-12-23 14:26 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Paul E. McKenney, perfbook

[-- Attachment #1: Type: text/plain, Size: 9519 bytes --]

OK, here goes.
The attached code corresponds to my understanding of the generic form
of RCU. Note the complete absence of any pointers or a single data
structure to hold the variables protected by the RCU critical section.
Section 9.5 deals (for the most part) with one possible
implementation, albeit one that seems to be the easiest and probably
most efficient: the version identifier is an address of a data
structure allocated on a heap and the tags correspond to fields within
this data structure. Nevertheless, it is possible to come up with a
different implementation, e.g., a database-like set of records
consisting of <version, tag, value> tuples.
It is clear that rcu_set_latest_version() needs to impose memory order
to prevent the new version being observed by a reader before the tags
have been given the new values. It should also be clear that the read
side requires ordering to prevent the values being read before the
latest version is established. That said, even in my generic
implementation this order is already present because of the data
dependency between the version number and the rcu_read() calls. Unless
there is a way to get rid of that dependency without losing the
semantics of RCU it appears as though there really is no need for
acquire semantics in general.

Thoughts?

--Elad

On Thu, 23 Dec 2021 at 07:59, Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On Thu, 23 Dec 2021 07:46:02 -0500, Elad Lahav wrote:
> > Yes, Akira, that's where I was going. Nevertheless, my question is a
> > bit more profound than that: is the asymmetry a result of the
> > implementation of RCU, as presented here (and the way it is used in
> > the Linux kernel), or is it inherent to RCU as a data access pattern?
> > I believe it is the former, and have an idea on how to phrase this
> > notion more formally. I will write something up and you can then
> > decide whether:
> >
> > 1. I'm wrong
> > 2. I'm right, but it's not worth mentioning
> > 3. I'm right, and the point should be added to the book
> > 4. None of the above ;-)
>
> Nice!
> Looking forward to seeing your write-up.
>
>         Thanks, Akira
>
> >
> > --Elad
> >
> > On Thu, 23 Dec 2021 at 07:22, Akira Yokosawa <akiyks@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> It sounds to me like Paul is missing the main point of Elad's question.
> >> So I'm chiming in again.  See inline comment below Elad's question.
> >>
> >> On Wed, 22 Dec 2021 18:29:57 -0800, Paul E. McKenney wrote:
> >>> Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
> >>> is to ensure that any RCU-protected linked data objects referenced by
> >>> the corresponding critical section remain in existence for the full
> >>> duration of that critical section.
> >>>
> >>> And even that overstates things a bit, as this describes not the
> >>> underlying RCU API itself, but rather some of that API's use cases.
> >>> (Though to be fair, these are the most popular of RCU's use cases.)
> >>> So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
> >>> does is to make any synchronize_rcu() or call_rcu() invocations starting
> >>> after a given rcu_read_lock() wait (synchronously or asynchronously,
> >>> respectively) until the corresponding rcu_read_unlock() is reached.
> >>>
> >>> Returning back to the first paragraph, additional protections can be
> >>> arranged, depending on the RCU use case.  For example, if the non-pointer
> >>> fields of objects added to an RCU-protected linked data structure remain
> >>> unchanged while that object is accessible to readers, then the protection
> >>> includes not just existence, but also value.  And there are other use
> >>> cases where those values might change while accessible to readers.
> >>>
> >>> So the most accurate answer to your question is "That is a design
> >>> choice, based on the RCU use case in question."
> >>>
> >>> My guess is that you are thinking in terms of designs where the
> >>> non-pointer fields of an object are constant while that object is
> >>> accessible to readers.  Is my guess correct?
> >>>
> >>>                                                       Thanx, Paul
> >>>
> >>> On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
> >>>> I will try to come up with something, but first I wanted to get your
> >>>> opinion as to whether this design pattern, where you need to
> >>>> dereference a pointer to a data structure containing all of the data
> >>>> considered to be protected by the critical section, is inherent to
> >>>> RCU, or is it just an idiosyncrasy of the implementation used in the
> >>>> Linux kernel?
> >>>>
> >>>> --Elad
> >>
> >> I guess Elad is asking more of the design choice of rcu_assign_pointer()
> >> and rcu_dereference().
> >>
> >> A naive reader might easily miss the asymmetry of rcu_assign_pointer()
> >> with store-release and rcu_defeference() *without* load-acquire.
> >>
> >> IIUC, this asymmetry comes from the use cases where RCU performs best,
> >> that is read-mostly.  Therefore, rcu_dereference() need to be
> >> light-weight as possible and doesn't imply load-acquire.
> >> Thankfully quite a lot of use cases can be covered by the pattern
> >> where address-dependency is sufficient for read-side.
> >>
> >> So one approach for Elad's concern would be to add a Quick Quiz on
> >> this asymmetry, I suppose.
> >>
> >> Elad, am I guessing right?
> >>
> >> Sidenote: RCU and memory accesses are orthogonal.  You can use whatever
> >> memory access primitives for your needed memory ordering. (Which would
> >> be quite likely a tricky thing to do for a naive reader, though.)
> >>
> >>         Thanks, Akira
> >>
> >>>>
> >>>> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>>>
> >>>>> On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> >>>>>>> Hi Akira,
> >>>>>>>
> >>>>>>>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> >>>>>>>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> >>>>>>>
> >>>>>>> I don't believe it is. I think you can infer that from reading sections 9.5
> >>>>>>> and chapter 15, but it is never made explicit.
> >>>>>>
> >>>>>> I agree it is not explicit.
> >>>>>>
> >>>>>>>
> >>>>>>> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> >>>>>>> pointer, but it doesn't go very deep into how that's implemented. You would
> >>>>>>> think that the writer's use of store-release semantics would necessitate
> >>>>>>> the reader's use of load-acquire semantics, but the discussion here, the
> >>>>>>> previous examples, and a quick inspection of how rcu_dereference() is
> >>>>>>> implemented, suggest that a READ_ONCE() is sufficient (or some less
> >>>>>>> Linux-kernel-specific equivalent).
> >>>>>>>
> >>>>>>> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> >>>>>>> the necessary connection), I could write something like:
> >>>>>>>
> >>>>>>>     struct foo_s {
> >>>>>>>         int a;
> >>>>>>>     } foo;
> >>>>>>>     int b;
> >>>>>>>     struct foo_s *g_foop = &foo;
> >>>>>>>
> >>>>>>>     // Reader
> >>>>>>>     rcu_read_lock();
> >>>>>>>     struct foo *l_foop = rcu_dereference(g_foop);
> >>>>>>>     use(l_foop->a);
> >>>>>>>     use(b);
> >>>>>>>     rcu_read_unlock();
> >>>>>>>
> >>>>>>>     // Writer
> >>>>>>>     struct foo *l_foop = malloc(sizeof(struct foo));
> >>>>>>>     l_foop->a = 1;
> >>>>>>>     b = 2;
> >>>>>>>     // Release semantics ensure previous stores are observed
> >>>>>>>     rcu_assign_pointer(g_foop, l_foop);
> >>>>>>>
> >>>>>>> But since b has no address dependency to g_foop and since neither
> >>>>>>> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> >>>>>>> the reader may not observe the new value of b.
> >>>>>>
> >>>>>> No, it may not.
> >>>>>> So what you want is the mention of "address dependency" somewhere in
> >>>>>> Section 9.5.2.1?
> >>>>>>
> >>>>>>>
> >>>>>>> Again, I may be missing something, but this seems to be a major point
> >>>>>>> that needs to be explained and emphasized.
> >>>>>>
> >>>>>> I guess Paul is intentionally avoiding discussions on memory ordering
> >>>>>> here in Section 9.5.
> >>>>>>
> >>>>>> Such a discussion would easily frighten naive readers...
> >>>>>>
> >>>>>> Anyway, I guess Paul would have some good compromise to satisfy you.
> >>>>>
> >>>>> Actually, I am going to ask Elad to propose the location and wording of an
> >>>>> addition to Section 9.5 covering this, so that we can discuss and refine
> >>>>> it.  This addition might be a new paragraph, a footnote, a quick quiz,
> >>>>> a citation, or what have you.  And the refining might switch back and
> >>>>> forth among these options a few times.  But we do have to start somewhere.
> >>>>>
> >>>>> I am thinking in terms of this going in after the release that I was
> >>>>> naively planning to do yesterday.  (The objective universe had other
> >>>>> ideas.)
> >>>>>
> >>>>>>>> BTW, I couldn't figure out what you meant by "the Spanish
> >>>>>>>> Acquisition"...
> >>>>>>> It's a reference to an old Monty Python skit. Apologies for the silly pun...
> >>>>>>
> >>>>>> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
> >>>>>
> >>>>> And here I was hoping that Elad was purchasing a house in Barcelona
> >>>>> or some such.  ;-)
> >>>>>
> >>>>> (Sorry, couldn't resist!)
> >>>>>
> >>>>>                                                         Thanx, Paul

[-- Attachment #2: rcu_generic.c --]
[-- Type: text/x-csrc, Size: 1502 bytes --]

#include <rcu.h>

// These tags represent variables protected by the RCU critical section.
// The variables are versioned: every version associates each variable with a
// value.
rcu_data_tag_t tag1;
rcu_data_tag_t tag2;

void
reader()
{
    // Enter the critical section.
    rcu_reader_enter();

    // Get the values corresponding to the latest version this reader observes.
    rcu_version_t latest = rcu_get_latest_version();
    rcu_value_t value1 = rcu_read(tag1, latest);
    rcu_value_t value2 = rcu_read(tag1, latest);

    // Use value1 and value2

    // Let any writer know that this reader no longer requires this version of
    // the values.
    rcu_release_version(latest);

    // Exit the critical section.
    rcu_reader_exit();
}

void
writer()
{
    // Enter the critical section.
    rcu_writer_enter();

    // Create a new version identifier.
    rcu_version_t old_version = rcu_get_latest_version();
    rcu_version_t new_version = rcu_create_version();

    // Provide new values for this version.
    rcu_value_t new_value1 = SOME_VALUE_1;
    rcu_value_t new_value2 = SOME_VALUE_2;
    rcu_write(tag1, new_value1, new_version);
    rcu_write(tag2, new_value2, new_version);

    // Publish the new version.
    rcu_set_latest_version(new_version);

    // Optional:
    // Wait for readers to stop using the old version and then discard it.
    rcu_wait_readers(old_version);
    rcu_remove_version(old_version);

    // Exit the critical section.
    rcu_writer_exit();
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-23 12:46               ` Elad Lahav
  2021-12-23 12:59                 ` Akira Yokosawa
@ 2021-12-23 17:26                 ` Paul E. McKenney
  1 sibling, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2021-12-23 17:26 UTC (permalink / raw)
  To: Elad Lahav; +Cc: Akira Yokosawa, perfbook

On Thu, Dec 23, 2021 at 07:46:02AM -0500, Elad Lahav wrote:
> Yes, Akira, that's where I was going. Nevertheless, my question is a
> bit more profound than that: is the asymmetry a result of the
> implementation of RCU, as presented here (and the way it is used in
> the Linux kernel), or is it inherent to RCU as a data access pattern?
> I believe it is the former, and have an idea on how to phrase this
> notion more formally. I will write something up and you can then
> decide whether:
> 
> 1. I'm wrong
> 2. I'm right, but it's not worth mentioning
> 3. I'm right, and the point should be added to the book
> 4. None of the above ;-)

5?  ;-)

The quick answer is that the asymmetry is due to properties of
weakly ordered hardware, for example, ARM, PowerPC, and Itanium.
This hardware provides limited but quite useful ordering for address
and data dependencies, as in if you load a pointer and then dereference
that pointer, the hardware will maintain limited ordering between the
load and the dereference.

Sadly, the C/C++ standards make no such guarantees.  But careful coding
restrictions make things work with all actual compilers, as can be seen
from Documentation/RCU/rcu_dereference.txt.

If you hate asymmetry and don't mind a bit of performance loss on
weakly ordered hardware, you could use an acquire load instead of
rcu_dereference().  You will get yelled at within the context of the
Linux kernel (ARM folks don't like things that slow them down), but from
a functional viewpoint, it would work.

I will take a look at your proposed text later today, Pacific Time.
Now to go on a hike to Silver Falls State Park with my son.  ;-)

							Thanx, Paul

> --Elad
> 
> On Thu, 23 Dec 2021 at 07:22, Akira Yokosawa <akiyks@gmail.com> wrote:
> >
> > Hi,
> >
> > It sounds to me like Paul is missing the main point of Elad's question.
> > So I'm chiming in again.  See inline comment below Elad's question.
> >
> > On Wed, 22 Dec 2021 18:29:57 -0800, Paul E. McKenney wrote:
> > > Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
> > > is to ensure that any RCU-protected linked data objects referenced by
> > > the corresponding critical section remain in existence for the full
> > > duration of that critical section.
> > >
> > > And even that overstates things a bit, as this describes not the
> > > underlying RCU API itself, but rather some of that API's use cases.
> > > (Though to be fair, these are the most popular of RCU's use cases.)
> > > So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
> > > does is to make any synchronize_rcu() or call_rcu() invocations starting
> > > after a given rcu_read_lock() wait (synchronously or asynchronously,
> > > respectively) until the corresponding rcu_read_unlock() is reached.
> > >
> > > Returning back to the first paragraph, additional protections can be
> > > arranged, depending on the RCU use case.  For example, if the non-pointer
> > > fields of objects added to an RCU-protected linked data structure remain
> > > unchanged while that object is accessible to readers, then the protection
> > > includes not just existence, but also value.  And there are other use
> > > cases where those values might change while accessible to readers.
> > >
> > > So the most accurate answer to your question is "That is a design
> > > choice, based on the RCU use case in question."
> > >
> > > My guess is that you are thinking in terms of designs where the
> > > non-pointer fields of an object are constant while that object is
> > > accessible to readers.  Is my guess correct?
> > >
> > >                                                       Thanx, Paul
> > >
> > > On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
> > >> I will try to come up with something, but first I wanted to get your
> > >> opinion as to whether this design pattern, where you need to
> > >> dereference a pointer to a data structure containing all of the data
> > >> considered to be protected by the critical section, is inherent to
> > >> RCU, or is it just an idiosyncrasy of the implementation used in the
> > >> Linux kernel?
> > >>
> > >> --Elad
> >
> > I guess Elad is asking more of the design choice of rcu_assign_pointer()
> > and rcu_dereference().
> >
> > A naive reader might easily miss the asymmetry of rcu_assign_pointer()
> > with store-release and rcu_defeference() *without* load-acquire.
> >
> > IIUC, this asymmetry comes from the use cases where RCU performs best,
> > that is read-mostly.  Therefore, rcu_dereference() need to be
> > light-weight as possible and doesn't imply load-acquire.
> > Thankfully quite a lot of use cases can be covered by the pattern
> > where address-dependency is sufficient for read-side.
> >
> > So one approach for Elad's concern would be to add a Quick Quiz on
> > this asymmetry, I suppose.
> >
> > Elad, am I guessing right?
> >
> > Sidenote: RCU and memory accesses are orthogonal.  You can use whatever
> > memory access primitives for your needed memory ordering. (Which would
> > be quite likely a tricky thing to do for a naive reader, though.)
> >
> >         Thanks, Akira
> >
> > >>
> > >> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >>>
> > >>> On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
> > >>>> Hi,
> > >>>>
> > >>>> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> > >>>>> Hi Akira,
> > >>>>>
> > >>>>>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> > >>>>>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> > >>>>>
> > >>>>> I don't believe it is. I think you can infer that from reading sections 9.5
> > >>>>> and chapter 15, but it is never made explicit.
> > >>>>
> > >>>> I agree it is not explicit.
> > >>>>
> > >>>>>
> > >>>>> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> > >>>>> pointer, but it doesn't go very deep into how that's implemented. You would
> > >>>>> think that the writer's use of store-release semantics would necessitate
> > >>>>> the reader's use of load-acquire semantics, but the discussion here, the
> > >>>>> previous examples, and a quick inspection of how rcu_dereference() is
> > >>>>> implemented, suggest that a READ_ONCE() is sufficient (or some less
> > >>>>> Linux-kernel-specific equivalent).
> > >>>>>
> > >>>>> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> > >>>>> the necessary connection), I could write something like:
> > >>>>>
> > >>>>>     struct foo_s {
> > >>>>>         int a;
> > >>>>>     } foo;
> > >>>>>     int b;
> > >>>>>     struct foo_s *g_foop = &foo;
> > >>>>>
> > >>>>>     // Reader
> > >>>>>     rcu_read_lock();
> > >>>>>     struct foo *l_foop = rcu_dereference(g_foop);
> > >>>>>     use(l_foop->a);
> > >>>>>     use(b);
> > >>>>>     rcu_read_unlock();
> > >>>>>
> > >>>>>     // Writer
> > >>>>>     struct foo *l_foop = malloc(sizeof(struct foo));
> > >>>>>     l_foop->a = 1;
> > >>>>>     b = 2;
> > >>>>>     // Release semantics ensure previous stores are observed
> > >>>>>     rcu_assign_pointer(g_foop, l_foop);
> > >>>>>
> > >>>>> But since b has no address dependency to g_foop and since neither
> > >>>>> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> > >>>>> the reader may not observe the new value of b.
> > >>>>
> > >>>> No, it may not.
> > >>>> So what you want is the mention of "address dependency" somewhere in
> > >>>> Section 9.5.2.1?
> > >>>>
> > >>>>>
> > >>>>> Again, I may be missing something, but this seems to be a major point
> > >>>>> that needs to be explained and emphasized.
> > >>>>
> > >>>> I guess Paul is intentionally avoiding discussions on memory ordering
> > >>>> here in Section 9.5.
> > >>>>
> > >>>> Such a discussion would easily frighten naive readers...
> > >>>>
> > >>>> Anyway, I guess Paul would have some good compromise to satisfy you.
> > >>>
> > >>> Actually, I am going to ask Elad to propose the location and wording of an
> > >>> addition to Section 9.5 covering this, so that we can discuss and refine
> > >>> it.  This addition might be a new paragraph, a footnote, a quick quiz,
> > >>> a citation, or what have you.  And the refining might switch back and
> > >>> forth among these options a few times.  But we do have to start somewhere.
> > >>>
> > >>> I am thinking in terms of this going in after the release that I was
> > >>> naively planning to do yesterday.  (The objective universe had other
> > >>> ideas.)
> > >>>
> > >>>>>> BTW, I couldn't figure out what you meant by "the Spanish
> > >>>>>> Acquisition"...
> > >>>>> It's a reference to an old Monty Python skit. Apologies for the silly pun...
> > >>>>
> > >>>> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
> > >>>
> > >>> And here I was hoping that Elad was purchasing a house in Barcelona
> > >>> or some such.  ;-)
> > >>>
> > >>> (Sorry, couldn't resist!)
> > >>>
> > >>>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2021-12-23 14:26                   ` Elad Lahav
@ 2022-01-04  0:05                     ` Paul E. McKenney
  2022-01-04  0:48                       ` Elad Lahav
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2022-01-04  0:05 UTC (permalink / raw)
  To: Elad Lahav; +Cc: Akira Yokosawa, perfbook

First, apologies for the delay.  And happy new year!

On Thu, Dec 23, 2021 at 09:26:23AM -0500, Elad Lahav wrote:
> OK, here goes.
> The attached code corresponds to my understanding of the generic form
> of RCU. Note the complete absence of any pointers or a single data
> structure to hold the variables protected by the RCU critical section.
> Section 9.5 deals (for the most part) with one possible
> implementation, albeit one that seems to be the easiest and probably
> most efficient: the version identifier is an address of a data
> structure allocated on a heap and the tags correspond to fields within
> this data structure. Nevertheless, it is possible to come up with a
> different implementation, e.g., a database-like set of records
> consisting of <version, tag, value> tuples.

You lost me here.  How is the address of a data structure different
than a pointer?  Conceptually, from a high-level-language viewpoint,
sure, I can see it (not that I always like it, pointer zap being a prime
offender), but at the machine level I do not.

All that aside, there have been tuple-like data structures built on RCU,
for example:

@inproceedings{Mao:2012:CCF:2168836.2168855,
 author = {Mao, Yandong and Kohler, Eddie and Morris, Robert Tappan},
 title = {Cache Craftiness for Fast Multicore Key-value Storage},
 booktitle = {Proceedings of the 7th ACM European Conference on Computer Systems},
 series = {EuroSys '12},
 year = {2012},
 isbn = {978-1-4503-1223-3},
 location = {Bern, Switzerland},
 pages = {183--196},
 numpages = {14},
 url = {http://doi.acm.org/10.1145/2168836.2168855},
 doi = {10.1145/2168836.2168855},
 acmid = {2168855},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {in-memory, key-value, multicore, persistent},
}

> It is clear that rcu_set_latest_version() needs to impose memory order
> to prevent the new version being observed by a reader before the tags
> have been given the new values. It should also be clear that the read
> side requires ordering to prevent the values being read before the
> latest version is established. That said, even in my generic
> implementation this order is already present because of the data
> dependency between the version number and the rcu_read() calls. Unless
> there is a way to get rid of that dependency without losing the
> semantics of RCU it appears as though there really is no need for
> acquire semantics in general.

I agree that there is no need for acquire semantics in the common
case.  But care really is required.

First, compiler optimizations can sometimes break the dependency,
first by value-substitution optimizations:

	struct foo *gfp; // Assume non-NULL after initialization
	struct foo default_foo;

	int do_a_foo(struct foo *fp)
	{
		return munge_it(fp->a);
	}

The compiler (presumably in conjunction with feedback from a profiled
run) might convert this to:

	int do_a_foo(struct foo *fp)
	{
		if (fp == &default_foo)
			return munge_it(default_foo.a);
		else
			return munge_it(fp->a);
	}

This would break the dependency because control dependencies do not
order loads.  However, I would not expect compilers to do this in the
absence of feedback-directed optimization.

Second, and more concerning, things can get even more dicey when one
is trying to carry dependencies through integers:

	struct foo foo_array[N_FOOS];

	int do_a_foo(int i)
	{
		return munge_it(fp[i].a);
	}

This actually works well, at least until someone builds with N_FOOS=1,
which causes foo_array[] to reference a single element.  At that point,
the compiler is within its rights to transform to this:

	int do_a_foo(int i)
	{
		return munge_it(fp[0].a);
	}

This again breaks the dependency by substituting a constant.  (Note that
any non-zero index invokes undefined behavior, legalizing the otherwise
inexplicable substitution of the constant zero.)

In the Linux kernel, we deal with this sort of thing by knowing our
compilers (and their command-line flags) and through use of coding rules:
https://www.kernel.org/doc/Documentation/RCU/rcu_dereference.rst

> Thoughts?

And a question below on your sample code.

> --Elad
> 
> On Thu, 23 Dec 2021 at 07:59, Akira Yokosawa <akiyks@gmail.com> wrote:
> >
> > On Thu, 23 Dec 2021 07:46:02 -0500, Elad Lahav wrote:
> > > Yes, Akira, that's where I was going. Nevertheless, my question is a
> > > bit more profound than that: is the asymmetry a result of the
> > > implementation of RCU, as presented here (and the way it is used in
> > > the Linux kernel), or is it inherent to RCU as a data access pattern?
> > > I believe it is the former, and have an idea on how to phrase this
> > > notion more formally. I will write something up and you can then
> > > decide whether:
> > >
> > > 1. I'm wrong
> > > 2. I'm right, but it's not worth mentioning
> > > 3. I'm right, and the point should be added to the book
> > > 4. None of the above ;-)
> >
> > Nice!
> > Looking forward to seeing your write-up.
> >
> >         Thanks, Akira
> >
> > >
> > > --Elad
> > >
> > > On Thu, 23 Dec 2021 at 07:22, Akira Yokosawa <akiyks@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> It sounds to me like Paul is missing the main point of Elad's question.
> > >> So I'm chiming in again.  See inline comment below Elad's question.
> > >>
> > >> On Wed, 22 Dec 2021 18:29:57 -0800, Paul E. McKenney wrote:
> > >>> Strictly speaking, all that the rcu_read_lock()/rcu_read_unlock() does
> > >>> is to ensure that any RCU-protected linked data objects referenced by
> > >>> the corresponding critical section remain in existence for the full
> > >>> duration of that critical section.
> > >>>
> > >>> And even that overstates things a bit, as this describes not the
> > >>> underlying RCU API itself, but rather some of that API's use cases.
> > >>> (Though to be fair, these are the most popular of RCU's use cases.)
> > >>> So even more strictly speaking, all the rcu_read_lock()/rcu_read_unlock()
> > >>> does is to make any synchronize_rcu() or call_rcu() invocations starting
> > >>> after a given rcu_read_lock() wait (synchronously or asynchronously,
> > >>> respectively) until the corresponding rcu_read_unlock() is reached.
> > >>>
> > >>> Returning back to the first paragraph, additional protections can be
> > >>> arranged, depending on the RCU use case.  For example, if the non-pointer
> > >>> fields of objects added to an RCU-protected linked data structure remain
> > >>> unchanged while that object is accessible to readers, then the protection
> > >>> includes not just existence, but also value.  And there are other use
> > >>> cases where those values might change while accessible to readers.
> > >>>
> > >>> So the most accurate answer to your question is "That is a design
> > >>> choice, based on the RCU use case in question."
> > >>>
> > >>> My guess is that you are thinking in terms of designs where the
> > >>> non-pointer fields of an object are constant while that object is
> > >>> accessible to readers.  Is my guess correct?
> > >>>
> > >>>                                                       Thanx, Paul
> > >>>
> > >>> On Wed, Dec 22, 2021 at 04:35:24PM -0500, Elad Lahav wrote:
> > >>>> I will try to come up with something, but first I wanted to get your
> > >>>> opinion as to whether this design pattern, where you need to
> > >>>> dereference a pointer to a data structure containing all of the data
> > >>>> considered to be protected by the critical section, is inherent to
> > >>>> RCU, or is it just an idiosyncrasy of the implementation used in the
> > >>>> Linux kernel?
> > >>>>
> > >>>> --Elad
> > >>
> > >> I guess Elad is asking more of the design choice of rcu_assign_pointer()
> > >> and rcu_dereference().
> > >>
> > >> A naive reader might easily miss the asymmetry of rcu_assign_pointer()
> > >> with store-release and rcu_defeference() *without* load-acquire.
> > >>
> > >> IIUC, this asymmetry comes from the use cases where RCU performs best,
> > >> that is read-mostly.  Therefore, rcu_dereference() need to be
> > >> light-weight as possible and doesn't imply load-acquire.
> > >> Thankfully quite a lot of use cases can be covered by the pattern
> > >> where address-dependency is sufficient for read-side.
> > >>
> > >> So one approach for Elad's concern would be to add a Quick Quiz on
> > >> this asymmetry, I suppose.
> > >>
> > >> Elad, am I guessing right?
> > >>
> > >> Sidenote: RCU and memory accesses are orthogonal.  You can use whatever
> > >> memory access primitives for your needed memory ordering. (Which would
> > >> be quite likely a tricky thing to do for a naive reader, though.)
> > >>
> > >>         Thanks, Akira
> > >>
> > >>>>
> > >>>> On Wed, 22 Dec 2021 at 13:19, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >>>>>
> > >>>>> On Wed, Dec 22, 2021 at 11:35:02PM +0900, Akira Yokosawa wrote:
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> On Wed, 22 Dec 2021 07:15:54 -0500, Elad Lahav wrote:
> > >>>>>>> Hi Akira,
> > >>>>>>>
> > >>>>>>>> I think this is mostly covered in Section 9.5.2.1 "Publish-Subscribe
> > >>>>>>>> Mechanism" and Figure 9.10 "Publication/Subscription Constraints".
> > >>>>>>>
> > >>>>>>> I don't believe it is. I think you can infer that from reading sections 9.5
> > >>>>>>> and chapter 15, but it is never made explicit.
> > >>>>>>
> > >>>>>> I agree it is not explicit.
> > >>>>>>
> > >>>>>>>
> > >>>>>>> Sub-section 9.5.2.1 talks about the use of rcu_dereference() to obtain a
> > >>>>>>> pointer, but it doesn't go very deep into how that's implemented. You would
> > >>>>>>> think that the writer's use of store-release semantics would necessitate
> > >>>>>>> the reader's use of load-acquire semantics, but the discussion here, the
> > >>>>>>> previous examples, and a quick inspection of how rcu_dereference() is
> > >>>>>>> implemented, suggest that a READ_ONCE() is sufficient (or some less
> > >>>>>>> Linux-kernel-specific equivalent).
> > >>>>>>>
> > >>>>>>> If I am a naive reader (and a lazy one, who haven't read chapter 15 and made
> > >>>>>>> the necessary connection), I could write something like:
> > >>>>>>>
> > >>>>>>>     struct foo_s {
> > >>>>>>>         int a;
> > >>>>>>>     } foo;
> > >>>>>>>     int b;
> > >>>>>>>     struct foo_s *g_foop = &foo;
> > >>>>>>>
> > >>>>>>>     // Reader
> > >>>>>>>     rcu_read_lock();
> > >>>>>>>     struct foo *l_foop = rcu_dereference(g_foop);
> > >>>>>>>     use(l_foop->a);
> > >>>>>>>     use(b);
> > >>>>>>>     rcu_read_unlock();
> > >>>>>>>
> > >>>>>>>     // Writer
> > >>>>>>>     struct foo *l_foop = malloc(sizeof(struct foo));
> > >>>>>>>     l_foop->a = 1;
> > >>>>>>>     b = 2;
> > >>>>>>>     // Release semantics ensure previous stores are observed
> > >>>>>>>     rcu_assign_pointer(g_foop, l_foop);
> > >>>>>>>
> > >>>>>>> But since b has no address dependency to g_foop and since neither
> > >>>>>>> rcu_read_lock() nor rcu_dereference() impose acquire semantics, then
> > >>>>>>> the reader may not observe the new value of b.
> > >>>>>>
> > >>>>>> No, it may not.
> > >>>>>> So what you want is the mention of "address dependency" somewhere in
> > >>>>>> Section 9.5.2.1?
> > >>>>>>
> > >>>>>>>
> > >>>>>>> Again, I may be missing something, but this seems to be a major point
> > >>>>>>> that needs to be explained and emphasized.
> > >>>>>>
> > >>>>>> I guess Paul is intentionally avoiding discussions on memory ordering
> > >>>>>> here in Section 9.5.
> > >>>>>>
> > >>>>>> Such a discussion would easily frighten naive readers...
> > >>>>>>
> > >>>>>> Anyway, I guess Paul would have some good compromise to satisfy you.
> > >>>>>
> > >>>>> Actually, I am going to ask Elad to propose the location and wording of an
> > >>>>> addition to Section 9.5 covering this, so that we can discuss and refine
> > >>>>> it.  This addition might be a new paragraph, a footnote, a quick quiz,
> > >>>>> a citation, or what have you.  And the refining might switch back and
> > >>>>> forth among these options a few times.  But we do have to start somewhere.
> > >>>>>
> > >>>>> I am thinking in terms of this going in after the release that I was
> > >>>>> naively planning to do yesterday.  (The objective universe had other
> > >>>>> ideas.)
> > >>>>>
> > >>>>>>>> BTW, I couldn't figure out what you meant by "the Spanish
> > >>>>>>>> Acquisition"...
> > >>>>>>> It's a reference to an old Monty Python skit. Apologies for the silly pun...
> > >>>>>>
> > >>>>>> Ah, now I guess I see the pun of "acquire" and "acquisition".  ;-)
> > >>>>>
> > >>>>> And here I was hoping that Elad was purchasing a house in Barcelona
> > >>>>> or some such.  ;-)
> > >>>>>
> > >>>>> (Sorry, couldn't resist!)
> > >>>>>
> > >>>>>                                                         Thanx, Paul

> #include <rcu.h>
> 
> // These tags represent variables protected by the RCU critical section.
> // The variables are versioned: every version associates each variable with a
> // value.
> rcu_data_tag_t tag1;
> rcu_data_tag_t tag2;
> 
> void
> reader()
> {
>     // Enter the critical section.
>     rcu_reader_enter();
> 
>     // Get the values corresponding to the latest version this reader observes.
>     rcu_version_t latest = rcu_get_latest_version();
>     rcu_value_t value1 = rcu_read(tag1, latest);
>     rcu_value_t value2 = rcu_read(tag1, latest);
> 
>     // Use value1 and value2
> 
>     // Let any writer know that this reader no longer requires this version of
>     // the values.
>     rcu_release_version(latest);

Why is this needed?  What is provided by this that is not covered by
rcu_reader_exit(), AKA rcu_read_unlock()?

If rcu_get_latest_version() is allocating memory or some such, then
it can be passed to call_rcu() just after allocation to cause it to be
freed when it is safe to do so.  This approach relieves the developer
from the need to figure out where to put all of the required invocations
of rcu_release_version().  And in this simple example, it is easy to
see where to put it, but in large examples spanning functions (to say
nothing of translation units), it is not always so easy.

Or are you instead thinking in terms of reference-counted structures?

							Thanx, Paul

>     // Exit the critical section.
>     rcu_reader_exit();
> }
> 
> void
> writer()
> {
>     // Enter the critical section.
>     rcu_writer_enter();
> 
>     // Create a new version identifier.
>     rcu_version_t old_version = rcu_get_latest_version();
>     rcu_version_t new_version = rcu_create_version();
> 
>     // Provide new values for this version.
>     rcu_value_t new_value1 = SOME_VALUE_1;
>     rcu_value_t new_value2 = SOME_VALUE_2;
>     rcu_write(tag1, new_value1, new_version);
>     rcu_write(tag2, new_value2, new_version);
> 
>     // Publish the new version.
>     rcu_set_latest_version(new_version);
> 
>     // Optional:
>     // Wait for readers to stop using the old version and then discard it.
>     rcu_wait_readers(old_version);
>     rcu_remove_version(old_version);
> 
>     // Exit the critical section.
>     rcu_writer_exit();
> }


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2022-01-04  0:05                     ` Paul E. McKenney
@ 2022-01-04  0:48                       ` Elad Lahav
  2022-01-04 11:47                         ` Elad Lahav
  0 siblings, 1 reply; 16+ messages in thread
From: Elad Lahav @ 2022-01-04  0:48 UTC (permalink / raw)
  To: paulmck; +Cc: Akira Yokosawa, perfbook

Hi Paul,

On 2022-01-03 7:05 p.m., Paul E. McKenney wrote:
> First, apologies for the delay.  And happy new year!
No problem! I did not expect you to spend time on this during the 
holidays. Hope you had a good hike on Christmas eve. I would have liked 
to go hiking as well, but the combination of -20C and a raging pandemic 
restrict me to the treadmill.

Your response makes me think that I need to explain better what I am 
trying to do here. Reading section 9.5 it is not always clear to me what 
constitutes a description of RCU and what a description of "RCU in the 
Linux kernel". I decided to try and come up with a more abstract 
description using the code I attached. It may be just a straw man, but 
at least it gives us something to point at while discussing (which I 
guess is a redundant definition of a "straw man"...).

> You lost me here.  How is the address of a data structure different
> than a pointer?  Conceptually, from a high-level-language viewpoint,
> sure, I can see it (not that I always like it, pointer zap being a prime
> offender), but at the machine level I do not.
I'm just trying to correlate the "tag" nomenclature with the way the 
Linux kernel RCU implementation works. I believe that this 
implementation fits within the abstract code I wrote, relying on the 
heap to provide versioned data by doing its job, i.e., never return a 
version (address) that has not been released (freed). The tag is the 
address, the pointer is just a variable that holds that tag.

> I agree that there is no need for acquire semantics in the common
> case.  But care really is required.
> 
> First, compiler optimizations can sometimes break the dependency,
> first by value-substitution optimizations:
> 
> 	struct foo *gfp; // Assume non-NULL after initialization
> 	struct foo default_foo;
> 
> 	int do_a_foo(struct foo *fp)
> 	{
> 		return munge_it(fp->a);
> 	}
> 
> The compiler (presumably in conjunction with feedback from a profiled
> run) might convert this to:
> 
> 	int do_a_foo(struct foo *fp)
> 	{
> 		if (fp == &default_foo)
> 			return munge_it(default_foo.a);
> 		else
> 			return munge_it(fp->a);
> 	}
> 
> This would break the dependency because control dependencies do not
> order loads.  However, I would not expect compilers to do this in the
> absence of feedback-directed optimization.
> 
> Second, and more concerning, things can get even more dicey when one
> is trying to carry dependencies through integers:
> 
> 	struct foo foo_array[N_FOOS];
> 
> 	int do_a_foo(int i)
> 	{
> 		return munge_it(fp[i].a);
> 	}
> 
> This actually works well, at least until someone builds with N_FOOS=1,
> which causes foo_array[] to reference a single element.  At that point,
> the compiler is within its rights to transform to this:
> 
> 	int do_a_foo(int i)
> 	{
> 		return munge_it(fp[0].a);
> 	}
> 
> This again breaks the dependency by substituting a constant.  (Note that
> any non-zero index invokes undefined behavior, legalizing the otherwise
> inexplicable substitution of the constant zero.)

Excellent, that's what I was looking for! If I understand you correctly, 
in principle acquire semantics *are* required for the reader. It just so 
happens that most implementations can get away without explicit acquire 
semantics due to data or address dependencies, but these need to be 
justified.

> Why is this needed?  What is provided by this that is not covered by
> rcu_reader_exit(), AKA rcu_read_unlock()?

Just for verbosity's sake. I first wrote it as 
"rcu_reader_exit(latest)", but I felt it wasn't clear what are the 
semantics of such a call. I guess something like 
"rcu_reader_release_and_exit(latest)" could work.

--Elad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2022-01-04  0:48                       ` Elad Lahav
@ 2022-01-04 11:47                         ` Elad Lahav
  2022-01-05  2:04                           ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Elad Lahav @ 2022-01-04 11:47 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Akira Yokosawa, perfbook

> The tag is the address, the pointer is just a variable that holds that tag.
Sorry, I re-read what I wrote and it's not accurate. The tag *is* the
pointer variable, the address it holds is the version.

--Elad

On Mon, 3 Jan 2022 at 19:49, Elad Lahav <e2lahav@gmail.com> wrote:
>
> Hi Paul,
>
> On 2022-01-03 7:05 p.m., Paul E. McKenney wrote:
> > First, apologies for the delay.  And happy new year!
> No problem! I did not expect you to spend time on this during the
> holidays. Hope you had a good hike on Christmas eve. I would have liked
> to go hiking as well, but the combination of -20C and a raging pandemic
> restrict me to the treadmill.
>
> Your response makes me think that I need to explain better what I am
> trying to do here. Reading section 9.5 it is not always clear to me what
> constitutes a description of RCU and what a description of "RCU in the
> Linux kernel". I decided to try and come up with a more abstract
> description using the code I attached. It may be just a straw man, but
> at least it gives us something to point at while discussing (which I
> guess is a redundant definition of a "straw man"...).
>
> > You lost me here.  How is the address of a data structure different
> > than a pointer?  Conceptually, from a high-level-language viewpoint,
> > sure, I can see it (not that I always like it, pointer zap being a prime
> > offender), but at the machine level I do not.
> I'm just trying to correlate the "tag" nomenclature with the way the
> Linux kernel RCU implementation works. I believe that this
> implementation fits within the abstract code I wrote, relying on the
> heap to provide versioned data by doing its job, i.e., never return a
> version (address) that has not been released (freed). The tag is the
> address, the pointer is just a variable that holds that tag.
>
> > I agree that there is no need for acquire semantics in the common
> > case.  But care really is required.
> >
> > First, compiler optimizations can sometimes break the dependency,
> > first by value-substitution optimizations:
> >
> >       struct foo *gfp; // Assume non-NULL after initialization
> >       struct foo default_foo;
> >
> >       int do_a_foo(struct foo *fp)
> >       {
> >               return munge_it(fp->a);
> >       }
> >
> > The compiler (presumably in conjunction with feedback from a profiled
> > run) might convert this to:
> >
> >       int do_a_foo(struct foo *fp)
> >       {
> >               if (fp == &default_foo)
> >                       return munge_it(default_foo.a);
> >               else
> >                       return munge_it(fp->a);
> >       }
> >
> > This would break the dependency because control dependencies do not
> > order loads.  However, I would not expect compilers to do this in the
> > absence of feedback-directed optimization.
> >
> > Second, and more concerning, things can get even more dicey when one
> > is trying to carry dependencies through integers:
> >
> >       struct foo foo_array[N_FOOS];
> >
> >       int do_a_foo(int i)
> >       {
> >               return munge_it(fp[i].a);
> >       }
> >
> > This actually works well, at least until someone builds with N_FOOS=1,
> > which causes foo_array[] to reference a single element.  At that point,
> > the compiler is within its rights to transform to this:
> >
> >       int do_a_foo(int i)
> >       {
> >               return munge_it(fp[0].a);
> >       }
> >
> > This again breaks the dependency by substituting a constant.  (Note that
> > any non-zero index invokes undefined behavior, legalizing the otherwise
> > inexplicable substitution of the constant zero.)
>
> Excellent, that's what I was looking for! If I understand you correctly,
> in principle acquire semantics *are* required for the reader. It just so
> happens that most implementations can get away without explicit acquire
> semantics due to data or address dependencies, but these need to be
> justified.
>
> > Why is this needed?  What is provided by this that is not covered by
> > rcu_reader_exit(), AKA rcu_read_unlock()?
>
> Just for verbosity's sake. I first wrote it as
> "rcu_reader_exit(latest)", but I felt it wasn't clear what are the
> semantics of such a call. I guess something like
> "rcu_reader_release_and_exit(latest)" could work.
>
> --Elad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2022-01-04 11:47                         ` Elad Lahav
@ 2022-01-05  2:04                           ` Paul E. McKenney
  2022-01-06 21:15                             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2022-01-05  2:04 UTC (permalink / raw)
  To: Elad Lahav; +Cc: Akira Yokosawa, perfbook

On Tue, Jan 04, 2022 at 06:47:42AM -0500, Elad Lahav wrote:
> > The tag is the address, the pointer is just a variable that holds that tag.
> Sorry, I re-read what I wrote and it's not accurate. The tag *is* the
> pointer variable, the address it holds is the version.

Huh.

Oh, are you looking to cover the Linux-kernel SRCU API as well as the
various non-SRCU RCU APIs?  Where srcu_read_lock() returns a value that
must be passed to the corresponding srcu_read_unlock()?

> --Elad
> 
> On Mon, 3 Jan 2022 at 19:49, Elad Lahav <e2lahav@gmail.com> wrote:
> >
> > Hi Paul,
> >
> > On 2022-01-03 7:05 p.m., Paul E. McKenney wrote:
> > > First, apologies for the delay.  And happy new year!
> > No problem! I did not expect you to spend time on this during the
> > holidays. Hope you had a good hike on Christmas eve. I would have liked
> > to go hiking as well, but the combination of -20C and a raging pandemic
> > restrict me to the treadmill.

The hike was good!  We did the medium loop through Silver Falls State
Park near Salem, Oregon.  This got us to seven of the ten easily reached
waterfalls, and to three of the ones that you can walk behind, so that
you are looking through the waterfall down the canyon.  The vertical
drop of the waterfalls we visted ranges up to about 170 feet.

Yeah, I grew up near there, so I am no doubt prejudiced.  ;-)

> > Your response makes me think that I need to explain better what I am
> > trying to do here. Reading section 9.5 it is not always clear to me what
> > constitutes a description of RCU and what a description of "RCU in the
> > Linux kernel". I decided to try and come up with a more abstract
> > description using the code I attached. It may be just a straw man, but
> > at least it gives us something to point at while discussing (which I
> > guess is a redundant definition of a "straw man"...).

Hmmm...  Maybe I should add a section following 9.5.1.1 ("Minimal
Insertion and Deletion") describing the core API:

	rcu_read_lock(): Start an RCU read-side critical section.
	rcu_read_unlock(): End an RCU read-side critical section.
	rcu_dereference(): Safely load an RCU-protected pointer.
	synchronize_rcu(): Wait for all pre-existing RCU read-side
		critical sections to complete.
	call_rcu(): Invoke the specified function with the specified
		rcu_head pointer argument after all pre-existing
		RCU read-side critical sections complete.
	rcu_assign_pointer(): Safely update an RCU-protected pointer.
	rcu_dereference_protected(): Safely load an RCU-protected
		pointer within an update.  (Just debugging, as you
		can safely do a C-language load.  So I will probably
		omit this one.)

Does that help?

> > > You lost me here.  How is the address of a data structure different
> > > than a pointer?  Conceptually, from a high-level-language viewpoint,
> > > sure, I can see it (not that I always like it, pointer zap being a prime
> > > offender), but at the machine level I do not.
> > I'm just trying to correlate the "tag" nomenclature with the way the
> > Linux kernel RCU implementation works. I believe that this
> > implementation fits within the abstract code I wrote, relying on the
> > heap to provide versioned data by doing its job, i.e., never return a
> > version (address) that has not been released (freed). The tag is the
> > address, the pointer is just a variable that holds that tag.

I was thinking in terms of SRCU above, but SRCU's tag does not involve
any pointers.

So why is rcu_get_latest_version() returning anything, and what is
rcu_release_version() is doing?

> > > I agree that there is no need for acquire semantics in the common
> > > case.  But care really is required.
> > >
> > > First, compiler optimizations can sometimes break the dependency,
> > > first by value-substitution optimizations:
> > >
> > >       struct foo *gfp; // Assume non-NULL after initialization
> > >       struct foo default_foo;
> > >
> > >       int do_a_foo(struct foo *fp)
> > >       {
> > >               return munge_it(fp->a);
> > >       }
> > >
> > > The compiler (presumably in conjunction with feedback from a profiled
> > > run) might convert this to:
> > >
> > >       int do_a_foo(struct foo *fp)
> > >       {
> > >               if (fp == &default_foo)
> > >                       return munge_it(default_foo.a);
> > >               else
> > >                       return munge_it(fp->a);
> > >       }
> > >
> > > This would break the dependency because control dependencies do not
> > > order loads.  However, I would not expect compilers to do this in the
> > > absence of feedback-directed optimization.
> > >
> > > Second, and more concerning, things can get even more dicey when one
> > > is trying to carry dependencies through integers:
> > >
> > >       struct foo foo_array[N_FOOS];
> > >
> > >       int do_a_foo(int i)
> > >       {
> > >               return munge_it(fp[i].a);
> > >       }
> > >
> > > This actually works well, at least until someone builds with N_FOOS=1,
> > > which causes foo_array[] to reference a single element.  At that point,
> > > the compiler is within its rights to transform to this:
> > >
> > >       int do_a_foo(int i)
> > >       {
> > >               return munge_it(fp[0].a);
> > >       }
> > >
> > > This again breaks the dependency by substituting a constant.  (Note that
> > > any non-zero index invokes undefined behavior, legalizing the otherwise
> > > inexplicable substitution of the constant zero.)
> >
> > Excellent, that's what I was looking for! If I understand you correctly,
> > in principle acquire semantics *are* required for the reader. It just so
> > happens that most implementations can get away without explicit acquire
> > semantics due to data or address dependencies, but these need to be
> > justified.

There are more examples in Section 4 of https://wg21.link/p0098r1.
And the aforementioned rcu_dereference.rst contains more recent cautionary
tales.

> > > Why is this needed?  What is provided by this that is not covered by
> > > rcu_reader_exit(), AKA rcu_read_unlock()?
> >
> > Just for verbosity's sake. I first wrote it as
> > "rcu_reader_exit(latest)", but I felt it wasn't clear what are the
> > semantics of such a call. I guess something like
> > "rcu_reader_release_and_exit(latest)" could work.

OK, so rcu_reader_release_and_exit(latest) would replace both
rcu_release_version(latest) and rcu_reader_exit()?

If "latest" is a scalar quantity, that looks like SRCU.  If it is a
pointer, then what is its purpose?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Section 9.5: Nobody expects the Spanish Acquisition!
  2022-01-05  2:04                           ` Paul E. McKenney
@ 2022-01-06 21:15                             ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2022-01-06 21:15 UTC (permalink / raw)
  To: Elad Lahav; +Cc: Akira Yokosawa, perfbook

On Tue, Jan 04, 2022 at 06:04:14PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 04, 2022 at 06:47:42AM -0500, Elad Lahav wrote:

[ . . . ]

> > > Your response makes me think that I need to explain better what I am
> > > trying to do here. Reading section 9.5 it is not always clear to me what
> > > constitutes a description of RCU and what a description of "RCU in the
> > > Linux kernel". I decided to try and come up with a more abstract
> > > description using the code I attached. It may be just a straw man, but
> > > at least it gives us something to point at while discussing (which I
> > > guess is a redundant definition of a "straw man"...).
> 
> Hmmm...  Maybe I should add a section following 9.5.1.1 ("Minimal
> Insertion and Deletion") describing the core API:
> 
> 	rcu_read_lock(): Start an RCU read-side critical section.
> 	rcu_read_unlock(): End an RCU read-side critical section.
> 	rcu_dereference(): Safely load an RCU-protected pointer.
> 	synchronize_rcu(): Wait for all pre-existing RCU read-side
> 		critical sections to complete.
> 	call_rcu(): Invoke the specified function with the specified
> 		rcu_head pointer argument after all pre-existing
> 		RCU read-side critical sections complete.
> 	rcu_assign_pointer(): Safely update an RCU-protected pointer.
> 	rcu_dereference_protected(): Safely load an RCU-protected
> 		pointer within an update.  (Just debugging, as you
> 		can safely do a C-language load.  So I will probably
> 		omit this one.)
> 
> Does that help?

Hearing no objections, I added this section and propagated these APIs
into the following sections.  I hope that is helpful.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-01-06 21:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-21 15:20 Section 9.5: Nobody expects the Spanish Acquisition! Elad Lahav
2021-12-22  8:21 ` Akira Yokosawa
2021-12-22 12:15   ` Elad Lahav
2021-12-22 14:35     ` Akira Yokosawa
2021-12-22 18:19       ` Paul E. McKenney
     [not found]         ` <CAJbg=FUHcqEXE+MgXif0n=e09xYFoGFfmhjvY1=pnC6QCCRh2w@mail.gmail.com>
2021-12-23  2:29           ` Paul E. McKenney
2021-12-23 12:22             ` Akira Yokosawa
2021-12-23 12:46               ` Elad Lahav
2021-12-23 12:59                 ` Akira Yokosawa
2021-12-23 14:26                   ` Elad Lahav
2022-01-04  0:05                     ` Paul E. McKenney
2022-01-04  0:48                       ` Elad Lahav
2022-01-04 11:47                         ` Elad Lahav
2022-01-05  2:04                           ` Paul E. McKenney
2022-01-06 21:15                             ` Paul E. McKenney
2021-12-23 17:26                 ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.