All of lore.kernel.org
 help / color / mirror / Atom feed
* Comments on chapter 5
@ 2022-08-11  1:22 Elad Lahav
  2022-08-12  1:08 ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Elad Lahav @ 2022-08-11  1:22 UTC (permalink / raw)
  To: perfbook

Hi Paul,

I have a few comments on chapter 5. If you agree with any of them I
can try to provide a patch.

5.2.2: Not really related to counters. Is it really possible for the
compiler to use any location for temporary storage, even global
variables? That seems a bit excessive on the compiler's part. I have
definitely seen GCC reuse stack storage, but even then only when it
thought that the previous value there was out of scope (erroneously in
my case, as the function behaved like setjmp()).

5.2.3: Perhaps the code should be updated to use ISO C instead of GCC?
_Thread_local and inline are part of the language.

Listing 5.5: There is a mix of thread_id_t from CodeSamples and
pthread_create() from POSIX. One of those should be changed.

5.2.4: The wording suggests that the counting threads are not impacted
by the reader. But doesn't a cache line changing from Modified to
Shared incur a cost on the counter when it next comes to update the
value?

5.3.2: "references only per-thread variables, and should not incur any
cache misses" - except that the thread can migrate to other cores and
can thus incur cache misses.

5.3.3: I think that some clarification, or a simple example, is due
for explaining how a failure can occur when the count is nowhere near
the global max.

5.4.2: "it is worthwhile looking for algorithms with better read-side
performance" - should it not be "write-side performance"?

--Elad

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Comments on chapter 5
  2022-08-11  1:22 Comments on chapter 5 Elad Lahav
@ 2022-08-12  1:08 ` Paul E. McKenney
  2022-08-13 13:47   ` Elad Lahav
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2022-08-12  1:08 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

On Wed, Aug 10, 2022 at 09:22:53PM -0400, Elad Lahav wrote:
> Hi Paul,
> 
> I have a few comments on chapter 5. If you agree with any of them I
> can try to provide a patch.

Thank you for looking this over!

> 5.2.2: Not really related to counters. Is it really possible for the
> compiler to use any location for temporary storage, even global
> variables? That seems a bit excessive on the compiler's part. I have
> definitely seen GCC reuse stack storage, but even then only when it
> thought that the previous value there was out of scope (erroneously in
> my case, as the function behaved like setjmp()).

According to the standard, a compiler could re-use any location for
temporary storage.  I have not seen compilers do this aside from (as
you say) stack storage, but it is quite possible that a new generation
of one-penny CPUs could force further aggression in this direction.

Better safe than sorry!

> 5.2.3: Perhaps the code should be updated to use ISO C instead of GCC?
> _Thread_local and inline are part of the language.

There will probably continue to be a need to use non-standard extensions,
but it seems sensible to at least try the standard facilities as
they appear.  Use of ISO standard inlining directives should be fine,
as should things like _Thread_local, at least as long as they are as
fast as are their gcc equivalents.

So please do give it a try!  Let's see what can be done to drag the
code into the 2020s.  ;-)

> Listing 5.5: There is a mix of thread_id_t from CodeSamples and
> pthread_create() from POSIX. One of those should be changed.

Good catch!  That thread_id_t should not be what is passed to
pthread_create().  My guess is that changing the type of the "tid"
local variable is the right way to go, but what are your thoughts?

> 5.2.4: The wording suggests that the counting threads are not impacted
> by the reader. But doesn't a cache line changing from Modified to
> Shared incur a cost on the counter when it next comes to update the
> value?

We are comparing the "eventually consistent" approach with the traditional
approach where each read scans all the counters, correct?  If so, then
the relative performance will depend on the frequency of readers in
the traditional approach compared to the sleep time for the eventual()
function, which in Listing 5.5 is a single millisecond.  It will also
depend on the rate at which each CPU does an update.  So it is possible
to compute breakevens.

Would you like to write up something to this effect, possibly expanding
on the last paragraph (starting with "This approach gives...")?

> 5.3.2: "references only per-thread variables, and should not incur any
> cache misses" - except that the thread can migrate to other cores and
> can thus incur cache misses.

Perhaps adding "in the common case" to that sentence?  Or did you have
something else in mind?

> 5.3.3: I think that some clarification, or a simple example, is due
> for explaining how a failure can occur when the count is nowhere near
> the global max.

That does indeed sound like a good addition.  What did you have in mind?

> 5.4.2: "it is worthwhile looking for algorithms with better read-side
> performance" - should it not be "write-side performance"?

Good catch!  You are absolutely right.

I look forward to seeing your patches.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Comments on chapter 5
  2022-08-12  1:08 ` Paul E. McKenney
@ 2022-08-13 13:47   ` Elad Lahav
  2022-08-13 23:15     ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Elad Lahav @ 2022-08-13 13:47 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook

On 2022-08-11 21:08, Paul E. McKenney wrote:
>> 5.3.2: "references only per-thread variables, and should not incur any
>> cache misses" - except that the thread can migrate to other cores and
>> can thus incur cache misses.
> 
> Perhaps adding "in the common case" to that sentence?  Or did you have
> something else in mind?

Pandora, meet can of worms (are mixed metaphors still frowned upon these 
days?).

Do you assume that in the common case threads tend to stick to the same 
core? If so, that's an interesting discussion, as I have changed my mind 
on the subject (or, rather, had the change forced on me) a few times.

--Elad

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Comments on chapter 5
  2022-08-13 13:47   ` Elad Lahav
@ 2022-08-13 23:15     ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2022-08-13 23:15 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

On Sat, Aug 13, 2022 at 09:47:44AM -0400, Elad Lahav wrote:
> On 2022-08-11 21:08, Paul E. McKenney wrote:
> > > 5.3.2: "references only per-thread variables, and should not incur any
> > > cache misses" - except that the thread can migrate to other cores and
> > > can thus incur cache misses.
> > 
> > Perhaps adding "in the common case" to that sentence?  Or did you have
> > something else in mind?
> 
> Pandora, meet can of worms (are mixed metaphors still frowned upon these
> days?).
> 
> Do you assume that in the common case threads tend to stick to the same
> core? If so, that's an interesting discussion, as I have changed my mind on
> the subject (or, rather, had the change forced on me) a few times.

In the immortal words of practitioners of engineering since recorded
history began, "It depends."

But you can usually characterize some number of preemptions per unit
time for a given workload, and my guess is that this is usually well
in excess of a microsecond, which would mean that short pieces of code
should not be preempted frequently.  (Famous last words...)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-08-13 23:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-11  1:22 Comments on chapter 5 Elad Lahav
2022-08-12  1:08 ` Paul E. McKenney
2022-08-13 13:47   ` Elad Lahav
2022-08-13 23:15     ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.