* Question about Table E.1
@ 2023-02-08  3:07 Leonardo Brás
  2023-02-08  3:41 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Leonardo Brás @ 2023-02-08  3:07 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook

Hello Paul,

I was reading the book when I stumbled on Quick Quiz 3.7,
Table E.1: Performance of Synchronization Mechanisms
on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System

<Copying from source, since the PDF is a little tricky>

The first part looks like:

        Clock period            &           0.4 &           1.0 \\
        Same-CPU CAS            &          12.2 &          33.8 \\
        Same-CPU lock           &          25.6 &          71.2 \\
        Blind CAS               &          12.9 &          35.8 \\
        CAS                     &           7.0 &          19.4 \\
 
In this case, what do the last two lines, "Blind CAS" and "CAS", refer to?

(For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
in Table 3.1, but that would not make sense: This "CAS" is faster than the
previous "Same-CPU CAS". )
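
As a quick sanity check on the rows above, here is a minimal Python sketch.
It assumes the first numeric column is the cost in nanoseconds and the second
is that cost expressed in clock periods; the column headings are not quoted
here, so this reading is an assumption. The helper names are made up for
illustration.

```python
# Rows copied from the table excerpt above: name -> (cost_ns, ratio).
# Assumption: column 1 is cost in ns, column 2 is cost in clock periods.
ROWS = {
    "Same-CPU CAS": (12.2, 33.8),
    "Same-CPU lock": (25.6, 71.2),
    "Blind CAS": (12.9, 35.8),
    "CAS": (7.0, 19.4),
}

NOMINAL_PERIOD_NS = 1.0 / 2.8  # one cycle at 2.8 GHz, in nanoseconds


def implied_period_ns(cost_ns, ratio):
    """Clock period implied by one row, if ratio == cost / period."""
    return cost_ns / ratio


def relative_cost(row, baseline):
    """Cost of `row` as a fraction of the cost of `baseline`."""
    return ROWS[row][0] / ROWS[baseline][0]


if __name__ == "__main__":
    for name, (cost_ns, ratio) in ROWS.items():
        period = implied_period_ns(cost_ns, ratio)
        print(f"{name:14s} implied period {period:.3f} ns "
              f"(nominal {NOMINAL_PERIOD_NS:.3f} ns)")
    print(f"CAS / Same-CPU CAS = {relative_cost('CAS', 'Same-CPU CAS'):.2f}")
    print(f"CAS / Blind CAS    = {relative_cost('CAS', 'Blind CAS'):.2f}")
```

Under that reading, every row implies a clock period of about 0.36 ns, which
fits a 2.8 GHz clock. The two columns are therefore consistent with each
other, and the low "CAS" figure does not look like a slip between them.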

Thanks for reading,
Leo





* Re: Question about Table E.1
  2023-02-08  3:07 Question about Table E.1 Leonardo Brás
@ 2023-02-08  3:41 ` Paul E. McKenney
  2023-02-08  5:33   ` Leonardo Brás
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Paul E. McKenney @ 2023-02-08  3:41 UTC (permalink / raw)
  To: Leonardo Brás; +Cc: perfbook

On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
> Hello Paul,
> 
> I have been reading the book, until I stumbled on Quick Quiz 3.7,
> Table E.1: Performance of Synchronization Mechanisms
> on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
> 
> <Copying from source, since the PDF is a little tricky>
> 
> The first part looks like:
> 
>         Clock period            &           0.4 &           1.0 \\
>         Same-CPU CAS            &          12.2 &          33.8 \\
>         Same-CPU lock           &          25.6 &          71.2 \\
>         Blind CAS               &          12.9 &          35.8 \\
>         CAS                     &           7.0 &          19.4 \\
>  
> In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
> 
> (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
> in Table 3.1, but that would not make sense: This "CAS" is faster than the
> previous "Same-CPU CAS". )

I was surprised myself, but those measurements are quite real.  My best
guess is that the two threads in the core are able to overlap their
accesses, while the single CPU must do everything sequentially.

Strange, but whatever the reason, true!  ;-)

							Thanx, Paul


* Re: Question about Table E.1
  2023-02-08  3:41 ` Paul E. McKenney
@ 2023-02-08  5:33   ` Leonardo Brás
  2023-02-08  5:50   ` Leonardo Brás
  2023-02-08  8:47   ` Akira Yokosawa
  2 siblings, 0 replies; 16+ messages in thread
From: Leonardo Brás @ 2023-02-08  5:33 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook

On Tue, 2023-02-07 at 19:41 -0800, Paul E. McKenney wrote:
> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
> > Hello Paul,
> > 
> > I have been reading the book, until I stumbled on Quick Quiz 3.7,
> > Table E.1: Performance of Synchronization Mechanisms
> > on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
> > 
> > <Copying from source, since the PDF is a little tricky>
> > 
> > The first part looks like:
> > 
> >         Clock period            &           0.4 &           1.0 \\
> >         Same-CPU CAS            &          12.2 &          33.8 \\
> >         Same-CPU lock           &          25.6 &          71.2 \\
> >         Blind CAS               &          12.9 &          35.8 \\
> >         CAS                     &           7.0 &          19.4 \\
> >  
> > In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
> > 
> > (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
> > in Table 3.1, but that would not make sense: This "CAS" is faster than the
> > previous "Same-CPU CAS". )
> 
> I was surprised myself, but those measurements are quite real.  My best
> guess is that the two threads in the core are able to overlap their
> accesses, while the single CPU must do everything sequentially.
> 
> Strange, but whatever the reason, true!  ;-)

Yeah, even stranger is that in this case Blind CAS was actually slower than
CAS. Anyway, your suggestion about overlapping accesses makes sense.

I want to suggest a change, but it's easier to discuss this over a patch.

> 
> 							Thanx, Paul

Thank you!
Leo


* Re: Question about Table E.1
  2023-02-08  3:41 ` Paul E. McKenney
  2023-02-08  5:33   ` Leonardo Brás
@ 2023-02-08  5:50   ` Leonardo Brás
  2023-02-08  8:47   ` Akira Yokosawa
  2 siblings, 0 replies; 16+ messages in thread
From: Leonardo Brás @ 2023-02-08  5:50 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook

On Tue, 2023-02-07 at 19:41 -0800, Paul E. McKenney wrote:
> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
> > Hello Paul,
> > 
> > I have been reading the book, until I stumbled on Quick Quiz 3.7,
> > Table E.1: Performance of Synchronization Mechanisms
> > on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
> > 
> > <Copying from source, since the PDF is a little tricky>
> > 
> > The first part looks like:
> > 
> >         Clock period            &           0.4 &           1.0 \\
> >         Same-CPU CAS            &          12.2 &          33.8 \\
> >         Same-CPU lock           &          25.6 &          71.2 \\
> >         Blind CAS               &          12.9 &          35.8 \\
> >         CAS                     &           7.0 &          19.4 \\
> >  
> > In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
> > 
> > (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
> > in Table 3.1, but that would not make sense: This "CAS" is faster than the
> > previous "Same-CPU CAS". )
> 
> I was surprised myself, but those measurements are quite real.  My best
> guess is that the two threads in the core are able to overlap their
> accesses, while the single CPU must do everything sequentially.

IMHO this explanation would make a great footnote on that page, heading off
questions like mine. What do you think?

> 
> Strange, but whatever the reason, true!  ;-)
> 
> 							Thanx, Paul



* Re: Question about Table E.1
  2023-02-08  3:41 ` Paul E. McKenney
  2023-02-08  5:33   ` Leonardo Brás
  2023-02-08  5:50   ` Leonardo Brás
@ 2023-02-08  8:47   ` Akira Yokosawa
  2023-02-08 10:26     ` Akira Yokosawa
  2 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-08  8:47 UTC (permalink / raw)
  To: paulmck, Leonardo Brás; +Cc: perfbook, Akira Yokosawa

Hi,

On Tue, 7 Feb 2023 19:41:02 -0800, Paul E. McKenney wrote:
> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
>> Hello Paul,
>>
>> I have been reading the book, until I stumbled on Quick Quiz 3.7,
>> Table E.1: Performance of Synchronization Mechanisms
>> on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
>>
>> <Copying from source, since the PDF is a little tricky>
>>
>> The first part looks like:
>>
>>         Clock period            &           0.4 &           1.0 \\
>>         Same-CPU CAS            &          12.2 &          33.8 \\
>>         Same-CPU lock           &          25.6 &          71.2 \\
>>         Blind CAS               &          12.9 &          35.8 \\
>>         CAS                     &           7.0 &          19.4 \\
>>  
>> In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
>>
>> (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
>> in Table 3.1, but that would not make sense: This "CAS" is faster than the
>> previous "Same-CPU CAS". )
> 
> I was surprised myself, but those measurements are quite real.  My best
> guess is that the two threads in the core are able to overlap their
> accesses, while the single CPU must do everything sequentially.

Paul, do you remember how you obtained the data set?
There are several data sets under CodeSamples/cpu/data/, but I don't
see one that corresponds to the table.

The code for collecting these data was added in CodeSamples/cpu/
by commit 81989d7483e2 ("cpu: Reproduce the old cache-to-cache
latency measurement code") in 2020. The next commit, 2fc05ca07edc
("api-pthreads.h: Use clock_gettime() and check sched_setaffinity()"),
improved the stability of the reproduced code.

This table was first added in commit 38fd945ff401 ("Fill out CPU
chapter, including adding Nehalem data.") in 2009.
The data have never been updated since.

I'm kind of suspecting the "7.0 us" which surprised you at the time
might have been an outlier due to some disturbance discussed in
Appendix A.3 "What Time Is It?".

I'm not sure, just guessing...
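
One way to probe the outlier hypothesis is to repeat the measurement many
times and look at a robust summary rather than a single run. The following is
a minimal Python sketch, not the perfbook measurement code: `find_outliers`
and its threshold `k` are hypothetical names, and the sample values are
synthetic, chosen only for illustration.

```python
import statistics


def find_outliers(samples, k=5.0):
    """Return samples more than k median-absolute-deviations from the median."""
    med = statistics.median(samples)
    # Median absolute deviation: a spread estimate that one bad run cannot skew.
    mad = statistics.median([abs(x - med) for x in samples])
    return [x for x in samples if abs(x - med) > k * mad]


if __name__ == "__main__":
    # Synthetic per-run latencies (ns), NOT measured data.
    runs = [100, 102, 101, 60, 103]
    print("median:", statistics.median(runs))
    print("outliers:", find_outliers(runs))
```

If the low value shows up in only a few runs, it gets flagged. If it shows up
in most runs, it becomes the median itself, which would point at the hardware
rather than at a disturbance.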

        Thanks, Akira

> 
> Strange, but whatever the reason, true!  ;-)
> 
> 							Thanx, Paul


* Re: Question about Table E.1
  2023-02-08  8:47   ` Akira Yokosawa
@ 2023-02-08 10:26     ` Akira Yokosawa
  2023-02-08 22:15       ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-08 10:26 UTC (permalink / raw)
  To: paulmck, Leonardo Brás; +Cc: perfbook, Akira Yokosawa

On Wed, 8 Feb 2023 17:47:31 +0900, Akira Yokosawa wrote:
> Hi,
> 
> On Tue, 7 Feb 2023 19:41:02 -0800, Paul E. McKenney wrote:
>> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
>>> Hello Paul,
>>>
>>> I have been reading the book, until I stumbled on Quick Quiz 3.7,
>>> Table E.1: Performance of Synchronization Mechanisms
>>> on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
>>>
>>> <Copying from source, since the PDF is a little tricky>
>>>
>>> The first part looks like:
>>>
>>>         Clock period            &           0.4 &           1.0 \\
>>>         Same-CPU CAS            &          12.2 &          33.8 \\
>>>         Same-CPU lock           &          25.6 &          71.2 \\
>>>         Blind CAS               &          12.9 &          35.8 \\
>>>         CAS                     &           7.0 &          19.4 \\
>>>  
>>> In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
>>>
>>> (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
>>> in Table 3.1, but that would not make sense: This "CAS" is faster than the
>>> previous "Same-CPU CAS". )
>>
>> I was surprised myself, but those measurements are quite real.  My best
>> guess is that the two threads in the core are able to overlap their
>> accesses, while the single CPU must do everything sequentially.
> 
> Paul, do you remember how you obtained the data set?
> There are several data sets under CodeSamples/cpu/data/, but I don't
> see the one corresponds to the table.
> 
> The code for collecting these data was added in CodeSamples/cpu/
> by commit 81989d7483e2 ("cpu: Reproduce the old cache-to-cache
> latency measurement code") in 2020. And the next commit 2fc05ca07edc
> ("api-pthreads.h: Use clock_gettime() and check sched_setaffinity()")
> improved the stability of reproduced code.
> 
> This table was first added in commit 38fd945ff401 ("Fill out CPU
> chapter, including adding Nehalem data.") in 2009.
> The data have never been updated since.
> 
> I'm kind of suspecting the "7.0 us" which surprised you at the time
I mean,                      "7.0 ns"

        Thanks, Akira

> might have been an outlier due to some disturbance discussed in
> Appendix A.3 "What Time Is It?".
> 
> I'm not sure, just guessing...
> 
>         Thanks, Akira
> 
>>
>> Strange, but whatever the reason, true!  ;-)
>>
>> 							Thanx, Paul


* Re: Question about Table E.1
  2023-02-08 10:26     ` Akira Yokosawa
@ 2023-02-08 22:15       ` Paul E. McKenney
  2023-02-08 23:49         ` Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2023-02-08 22:15 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Leonardo Brás, perfbook

On Wed, Feb 08, 2023 at 07:26:58PM +0900, Akira Yokosawa wrote:
> On Wed, 8 Feb 2023 17:47:31 +0900, Akira Yokosawa wrote:
> > Hi,
> > 
> > On Tue, 7 Feb 2023 19:41:02 -0800, Paul E. McKenney wrote:
> >> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
> >>> Hello Paul,
> >>>
> >>> I have been reading the book, until I stumbled on Quick Quiz 3.7,
> >>> Table E.1: Performance of Synchronization Mechanisms
> >>> on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
> >>>
> >>> <Copying from source, since the PDF is a little tricky>
> >>>
> >>> The first part looks like:
> >>>
> >>>         Clock period            &           0.4 &           1.0 \\
> >>>         Same-CPU CAS            &          12.2 &          33.8 \\
> >>>         Same-CPU lock           &          25.6 &          71.2 \\
> >>>         Blind CAS               &          12.9 &          35.8 \\
> >>>         CAS                     &           7.0 &          19.4 \\
> >>>  
> >>> In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
> >>>
> >>> (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
> >>> in Table 3.1, but that would not make sense: This "CAS" is faster than the
> >>> previous "Same-CPU CAS". )
> >>
> >> I was surprised myself, but those measurements are quite real.  My best
> >> guess is that the two threads in the core are able to overlap their
> >> accesses, while the single CPU must do everything sequentially.
> > 
> > Paul, do you remember how you obtained the data set?
> > There are several data sets under CodeSamples/cpu/data/, but I don't
> > see the one corresponds to the table.
> > 
> > The code for collecting these data was added in CodeSamples/cpu/
> > by commit 81989d7483e2 ("cpu: Reproduce the old cache-to-cache
> > latency measurement code") in 2020. And the next commit 2fc05ca07edc
> > ("api-pthreads.h: Use clock_gettime() and check sched_setaffinity()")
> > improved the stability of reproduced code.
> > 
> > This table was first added in commit 38fd945ff401 ("Fill out CPU
> > chapter, including adding Nehalem data.") in 2009.
> > The data have never been updated since.
> > 
> > I'm kind of suspecting the "7.0 us" which surprised you at the time
> I mean,                      "7.0 ns"
> 
>         Thanks, Akira
> 
> > might have been an outlier due to some disturbance discussed in
> > Appendix A.3 "What Time Is It?".
> > 
> > I'm not sure, just guessing...

My surprise caused me to beat on it, and it was persistent.

But I cannot find the raw data, either, so maybe I should delete that
table.  Though I really do like the fact that it is surprising, based
on a hope that it convinces readers to expect the unexpected.

							Thanx, Paul

> >         Thanks, Akira
> > 
> >>
> >> Strange, but whatever the reason, true!  ;-)
> >>
> >> 							Thanx, Paul


* Re: Question about Table E.1
  2023-02-08 22:15       ` Paul E. McKenney
@ 2023-02-08 23:49         ` Akira Yokosawa
  2023-02-09 11:13           ` Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-08 23:49 UTC (permalink / raw)
  To: paulmck; +Cc: Leonardo Brás, perfbook, Akira Yokosawa

On Wed, 8 Feb 2023 14:15:22 -0800, Paul E. McKenney wrote:
> On Wed, Feb 08, 2023 at 07:26:58PM +0900, Akira Yokosawa wrote:
>> On Wed, 8 Feb 2023 17:47:31 +0900, Akira Yokosawa wrote:
>>> Hi,
>>>
>>> On Tue, 7 Feb 2023 19:41:02 -0800, Paul E. McKenney wrote:
>>>> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
>>>>> Hello Paul,
>>>>>
>>>>> I have been reading the book, until I stumbled on Quick Quiz 3.7,
>>>>> Table E.1: Performance of Synchronization Mechanisms
>>>>> on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
>>>>>
>>>>> <Copying from source, since the PDF is a little tricky>
>>>>>
>>>>> The first part looks like:
>>>>>
>>>>>         Clock period            &           0.4 &           1.0 \\
>>>>>         Same-CPU CAS            &          12.2 &          33.8 \\
>>>>>         Same-CPU lock           &          25.6 &          71.2 \\
>>>>>         Blind CAS               &          12.9 &          35.8 \\
>>>>>         CAS                     &           7.0 &          19.4 \\
>>>>>  
>>>>> In this case, what would be the last lines "Blind CAS" and "CAS" referring to ? 
>>>>>
>>>>> (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
>>>>> in Table 3.1, but that would not make sense: This "CAS" is faster than the
>>>>> previous "Same-CPU CAS". )
>>>>
>>>> I was surprised myself, but those measurements are quite real.  My best
>>>> guess is that the two threads in the core are able to overlap their
>>>> accesses, while the single CPU must do everything sequentially.
>>>
>>> Paul, do you remember how you obtained the data set?
>>> There are several data sets under CodeSamples/cpu/data/, but I don't
>>> see the one corresponds to the table.
>>>
>>> The code for collecting these data was added in CodeSamples/cpu/
>>> by commit 81989d7483e2 ("cpu: Reproduce the old cache-to-cache
>>> latency measurement code") in 2020. And the next commit 2fc05ca07edc
>>> ("api-pthreads.h: Use clock_gettime() and check sched_setaffinity()")
>>> improved the stability of reproduced code.
>>>
>>> This table was first added in commit 38fd945ff401 ("Fill out CPU
>>> chapter, including adding Nehalem data.") in 2009.
>>> The data have never been updated since.
>>>
>>> I'm kind of suspecting the "7.0 us" which surprised you at the time
>> I mean,                      "7.0 ns"
>>
>>         Thanks, Akira
>>
>>> might have been an outlier due to some disturbance discussed in
>>> Appendix A.3 "What Time Is It?".
>>>
>>> I'm not sure, just guessing...
> 
> My surprise caused me to beat on it, and it was persistent.

I see.  So the "outlier" was the microarchitecture of that
X5550 (Nehalem), I guess.
I'd love to reproduce the behavior if at all possible.

> 
> But I cannot find the raw data, either, so maybe I should delete that
> table.  Though I really do like the fact that it is surprising, based
> on a hope that it convinces readers to expect the unexpected.

That episode would make a good Quick Quiz if the answer to a QQz
could have a nested QQz inside it.
Unfortunately that is not possible...

        Thanks, Akira

> 
> 							Thanx, Paul
> 
>>>         Thanks, Akira
>>>
>>>>
>>>> Strange, but whatever the reason, true!  ;-)
>>>>
>>>> 							Thanx, Paul


* Re: Question about Table E.1
  2023-02-08 23:49         ` Akira Yokosawa
@ 2023-02-09 11:13           ` Akira Yokosawa
  2023-02-09 15:12             ` [PATCH -perfbook] cpu: PoC of A QQz citing table in answer to another QQz (was Re: Question about Table E.1) Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-09 11:13 UTC (permalink / raw)
  To: paulmck; +Cc: Leonardo Brás, perfbook, Akira Yokosawa

On Thu, 9 Feb 2023 08:49:46 +0900, Akira Yokosawa wrote:
> On Wed, 8 Feb 2023 14:15:22 -0800, Paul E. McKenney wrote:
>> My surprise caused me to beat on it, and it was persistent.
> 
> I see.  So the "outlier" was the microarchitecture of that
> X5550 (Nehalem), I guess.
> I'd love to reproduce the behavior if at all possible.
> 
>>
>> But I cannot find the raw data, either, so maybe I should delete that
>> table.  Though I really do like the fact that it is surprising, based
>> on a hope that it convinces readers to expect the unexpected.
> 
> That episode would be a good Quick Quiz if the Answer to QQz
> could have a nested QQz inside it.
> Unfortunately that is not possible...

On second thought, it should be possible to put a QQz next to QQz 3.7
citing Table E.1 in the Quiz part.

Let me try and produce a PoC patch.

        Thanks, Akira

> 
>         Thanks, Akira
> 
>>
>> 							Thanx, Paul
>>
>>>>         Thanks, Akira
>>>>
>>>>>
>>>>> Strange, but whatever the reason, true!  ;-)
>>>>>
>>>>> 							Thanx, Paul


* [PATCH -perfbook] cpu: PoC of A QQz citing table in answer to another QQz (was Re: Question about Table E.1)
  2023-02-09 11:13           ` Akira Yokosawa
@ 2023-02-09 15:12             ` Akira Yokosawa
  2023-02-11  8:49               ` Leonardo Brás
  2023-02-12  0:42               ` [PATCH -perfbook v2] cpu: Add a QQz on table E.1 Akira Yokosawa
  0 siblings, 2 replies; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-09 15:12 UTC (permalink / raw)
  To: paulmck; +Cc: Leonardo Brás, perfbook, Akira Yokosawa

Subject: [PATCH -perfbook] cpu: Add a QQz citing table E.1

An email thread started by a question from Leo [1] prompted
me to add a QQz on Paul's experience back in 2009.

Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
I wrote:
> On second thought, it should be possible to put a QQz next to QQz 3.7
> citing Table E.1 in the Quiz part.
> 
> Let me try and produce a PoC patch.

Something like this?

I couldn't place the QQz next to QQz 3.7 because of the

  \QuickQuizLabel{\QspeedOfLightAtoms}

just below QQz 3.7.
If you put the label in the middle of

\QuickQuizSeries{
 ...
}

, you will get build errors in -nq builds. 

Instead, I added it next to QQz 3.8 (or the end of Section 3.2.2).
It still looks relevant there.

The wording of the Quiz and its Answer is just a stub.
Paul, feel free to rewrite them as you like.

        Thanks, Akira
--
 cpu/overheads.tex | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 0d8270bf6e17..cff847cdadbf 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -485,10 +485,11 @@ cycles, as shown in the ``Global Comms'' row.
 %     page 6/76 'Leading Interconnect, Leading Performance'
 % Needs updating...
 
-\QuickQuiz{
+\QuickQuizSeries{%
+\QuickQuizB{
 	These numbers are insanely large!
 	How can I possibly get my head around them?
-}\QuickQuizAnswer{
+}\QuickQuizAnswerB{
 	Get a roll of toilet paper.
 	In the USA, each roll will normally have somewhere around
 	350--500 sheets.
@@ -516,7 +517,24 @@ cycles, as shown in the ``Global Comms'' row.
 	You might wish to avoid disabling interrupts across that many
 	cache misses.\footnote{
 		Kudos to Matthew Wilcox for this holding-breath analogy.}
-}\QuickQuizEnd
+}\QuickQuizEndB
+%
+\QuickQuizE{
+	\Cref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
+	in the answer to \QuickQuizARef{\QspeedOfLightAtoms} says that
+	In-Core CAS is faster than both Same-CPU CAS and In-Core Blind CAS\@.
+	What is happening there?
+}\QuickQuizAnswerE{
+	I \emph{was} surprised by the data I obtained and did a rigorous
+	check of their validity.
+	I got the same result persistently.
+	One theory that might explain the observation would be:
+	The two threads in the core are able to overlap their accesses,
+	while the single CPU must do everything sequentially.
+	Unfortunately, there seems to be no public documentation explaining
+	why the Intel X5550 (Nehalem) system behaved like that.
+}\QuickQuizEndE
+}                 % End of \QuickQuizSeries
 
 \subsection{Hardware Optimizations}
 \label{sec:cpu:Hardware Optimizations}

base-commit: 14440e232cc1b2580dc1a73f873dc29fe3aea02b
-- 
2.25.1




* Re: [PATCH -perfbook] cpu: PoC of A QQz citing table in answer to another QQz (was Re: Question about Table E.1)
  2023-02-09 15:12             ` [PATCH -perfbook] cpu: PoC of A QQz citing table in answer to another QQz (was Re: Question about Table E.1) Akira Yokosawa
@ 2023-02-11  8:49               ` Leonardo Brás
  2023-02-12  0:04                 ` Akira Yokosawa
  2023-02-12  0:42               ` [PATCH -perfbook v2] cpu: Add a QQz on table E.1 Akira Yokosawa
  1 sibling, 1 reply; 16+ messages in thread
From: Leonardo Brás @ 2023-02-11  8:49 UTC (permalink / raw)
  To: Akira Yokosawa, paulmck; +Cc: perfbook

On Fri, 2023-02-10 at 00:12 +0900, Akira Yokosawa wrote:
> Subject: [PATCH -perfbook] cpu: Add a QQz citing table E.1
> 
> An email thread started from a question from Leo [1] stimulated
> me to add a QQz on Paul's experience back in 2009.
> 
> Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> ---
> I wrote:
> > On second thought, it should be possible to put a QQz next to QQz 3.7
> > citing Table E.1 in the Quiz part.
> > 
> > Let me try and produce a PoC patch.
> 
> Something like this?
> 
> I couldn't make the QQz next to QQz 3.7 due to the
> 
>   \QuickQuizLabel{\QspeedOfLightAtoms}
> 
> just below QQz 3.7.
> If you put the label in the middle of
> 
> \QuickQuizSeries{
>  ...
> }
> 
> , you will get build errors in -nq builds. 
> 
> Instead, I added it next to QQz 3.8 (or the end of Section 3.2.2).
> It looks still relevant there.
> 
> The wording of the Quiz and its Answer is just a stub.
> Paul, feel free to rewrite them as you like.
> 
>         Thanks, Akira
> --
>  cpu/overheads.tex | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/cpu/overheads.tex b/cpu/overheads.tex
> index 0d8270bf6e17..cff847cdadbf 100644
> --- a/cpu/overheads.tex
> +++ b/cpu/overheads.tex
> @@ -485,10 +485,11 @@ cycles, as shown in the ``Global Comms'' row.
>  %     page 6/76 'Leading Interconnect, Leading Performance'
>  % Needs updating...
>  
> -\QuickQuiz{
> +\QuickQuizSeries{%
> +\QuickQuizB{
>  	These numbers are insanely large!
>  	How can I possibly get my head around them?
> -}\QuickQuizAnswer{
> +}\QuickQuizAnswerB{
>  	Get a roll of toilet paper.
>  	In the USA, each roll will normally have somewhere around
>  	350--500 sheets.
> @@ -516,7 +517,24 @@ cycles, as shown in the ``Global Comms'' row.
>  	You might wish to avoid disabling interrupts across that many
>  	cache misses.\footnote{
>  		Kudos to Matthew Wilcox for this holding-breath analogy.}
> -}\QuickQuizEnd
> +}\QuickQuizEndB
> +%
> +\QuickQuizE{
> +	\Cref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
> +	in the answer to \QuickQuizARef{\QspeedOfLightAtoms} says that
> +	In-Core CAS is faster than both of Same-CPU CAS and In-Core Blind CAS\@.
> +	What is happening there?
> +}\QuickQuizAnswerE{
> +	I \emph{was} surprised by the data I obtained and did a rigorous
> +	check of their validity.
> +	I got the same result persistently.
> +	One theory that might explain the observation would be:
> +	The two threads in the core are able to overlap their accesses,
> +	while the single CPU must do everything sequentially.
> +	Unfortunately, there seems to be no public documentation explaining
> +	why the Intel X5550 (Nehalem) system behaved like that.
> +}\QuickQuizEndE
> +}                 % End of \QuickQuizSeries
>  
>  \subsection{Hardware Optimizations}
>  \label{sec:cpu:Hardware Optimizations}
> 
> base-commit: 14440e232cc1b2580dc1a73f873dc29fe3aea02b

Tested on Gitlab:
https://gitlab.com/linux-kernel/perfbook/-/pipelines/774642106

Observations for this PDF:
QQz 3.7 has the mentioned table.
QQz 3.9 has the comment on the table.
'Question + link' to QQz 3.9 appears in Chapter 3, on the page after 'question +
link' to QQz 3.7.

Is that the desired output?

(It feels a little 'misplaced', TBH. The best place would be just after QQz 3.7,
but judging by your comments, that was not possible.)


Anyway, if the desired output was achieved:

Tested-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>


Best regards,
Leo


* Re: [PATCH -perfbook] cpu: PoC of A QQz citing table in answer to another QQz (was Re: Question about Table E.1)
  2023-02-11  8:49               ` Leonardo Brás
@ 2023-02-12  0:04                 ` Akira Yokosawa
  0 siblings, 0 replies; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-12  0:04 UTC (permalink / raw)
  To: Leonardo Brás, paulmck; +Cc: perfbook, Akira Yokosawa

On Sat, 11 Feb 2023 05:49:49 -0300, Leonardo Brás wrote:
> On Fri, 2023-02-10 at 00:12 +0900, Akira Yokosawa wrote:
>> Subject: [PATCH -perfbook] cpu: Add a QQz citing table E.1
>>
>> An email thread started from a question from Leo [1] stimulated
>> me to add a QQz on Paul's experience back in 2009.
>>
>> Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
>> ---
>> I wrote:
>>> On second thought, it should be possible to put a QQz next to QQz 3.7
>>> citing Table E.1 in the Quiz part.
>>>
>>> Let me try and produce a PoC patch.
>>
>> Something like this?
>>
>> I couldn't make the QQz next to QQz 3.7 due to the
>>
>>   \QuickQuizLabel{\QspeedOfLightAtoms}
>>
>> just below QQz 3.7.
>> If you put the label in the middle of
>>
>> \QuickQuizSeries{
>>  ...
>> }
>>
>> , you will get build errors in -nq builds. 
>>
>> Instead, I added it next to QQz 3.8 (or the end of Section 3.2.2).
>> It looks still relevant there.
>>
>> The wording of the Quiz and its Answer is just a stub.
>> Paul, feel free to rewrite them as you like.
>>
>>         Thanks, Akira
>> --
>>  cpu/overheads.tex | 24 +++++++++++++++++++++---
>>  1 file changed, 21 insertions(+), 3 deletions(-)
>>
...
> 
> Tested on Gitlab:
> https://gitlab.com/linux-kernel/perfbook/-/pipelines/774642106

Thank you for testing!

> 
> Observations for this PDF:
> QQz 3.7 has the mentioned table.
> QQz 3.9 has the comment on the table.
> 'Question + link' to QQz 3.9 appears in Chapter 3, on the page after 'question +
> link' to QQz 3.7.
> 
> Is that the desired output?
> 
> (It feels a little 'misplaced', TBH. The best place would be just after QQz 3.7,
> but looking into your comments, it seems it was not possible.)

I think I have figured out a way to place it next to QQz 3.7.

Will post a v2 after testing on my side.

        Thanks, Akira

> 
> 
> Anyway, if the desired output was achieved:
> 
> Tested-by: Leonardo Bras <leobras@redhat.com>
> Reviewed-by: Leonardo Bras <leobras@redhat.com>
> 
> 
> Best regards,
> Leo


* [PATCH -perfbook v2] cpu: Add a QQz on table E.1
  2023-02-09 15:12             ` [PATCH -perfbook] cpu: PoC of A QQz citing table in answer to another QQz (was Re: Question about Table E.1) Akira Yokosawa
  2023-02-11  8:49               ` Leonardo Brás
@ 2023-02-12  0:42               ` Akira Yokosawa
  2023-02-13  0:28                 ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-12  0:42 UTC (permalink / raw)
  To: paulmck, Leonardo Brás; +Cc: perfbook, Akira Yokosawa

An email thread started by a question from Leo [1] prompted
me to add a QQz on Paul's experience back in 2009.

As \QuickQuizLabel{} inside \QuickQuizSeries{} doesn't work in
-nq builds, define \QuickQuizLabelRel{}{} and put it in front
of the series.

Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Leonardo Brás <leobras.c@gmail.com>
---
v2: Place new QQz next to QQz 3.7

--
 cpu/overheads.tex | 28 +++++++++++++++++++++++-----
 qqz.sty           | 16 ++++++++++++++++
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 0d8270bf6e17..a89c71158bf9 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -290,12 +290,15 @@ nanoseconds, or more than seven hundred clock cycles.
 A CAS operation consumes almost a full microsecond, or almost two
 thousand clock cycles.
 
-\QuickQuiz{
+\QuickQuizLabelRel{\QspeedOfLightAtoms}{1} % can't put label inside QQSeries
+
+\QuickQuizSeries{%
+\QuickQuizB{
 	Surely the hardware designers could be persuaded to improve
 	this situation!
 	Why have they been content with such abysmal performance
 	for these single-instruction operations?
-}\QuickQuizAnswer{
+}\QuickQuizAnswerB{
 	The hardware designers \emph{have} been working on this
 	problem, and have consulted with no less a luminary than
 	the late physicist Stephen Hawking.
@@ -423,9 +426,24 @@ thousand clock cycles.
 	\Cref{sec:cpu:Hardware Free Lunch?}
 	looks at what else hardware designers might be
 	able to do to ease the plight of parallel programmers.
-}\QuickQuizEnd
-
-\QuickQuizLabel{\QspeedOfLightAtoms}
+}\QuickQuizEndB
+%
+\QuickQuizE{
+	\Cref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
+	in the answer to \QuickQuizARef{\QspeedOfLightAtoms} says that
+	In-Core CAS is faster than both Same-CPU CAS and In-Core Blind CAS\@.
+	What is happening there?
+}\QuickQuizAnswerE{
+	I \emph{was} surprised by the data I obtained and did a rigorous
+	check of their validity.
+	I got the same result persistently.
+	One theory that might explain the observation would be:
+	The two threads in the core are able to overlap their accesses,
+	while the single CPU must do everything sequentially.
+	Unfortunately, there seems to be no public documentation explaining
+	why the Intel X5550 (Nehalem) system behaved like that.
+}\QuickQuizEndE
+}                 % End of \QuickQuizSeries
 
 \begin{table}
 \rowcolors{1}{}{lightgray}
diff --git a/qqz.sty b/qqz.sty
index a3a9f22d1ba9..5c7eb5340194 100644
--- a/qqz.sty
+++ b/qqz.sty
@@ -156,6 +156,11 @@
 
 % To create a macro referencing the previously defined quick quiz:
 %	\QuickQuizLabel{\QQname}
+%
+% When labeling a QQz inside \QuickQuizSeries{}, use
+%	\QuickQuizLabelRel{\QQname}{rel}
+% in front of the series.
+%
 % To reference the macro in the text:
 %	\QuickQuizRef{\QQname}
 % To reference the answer of the macro in the text:
@@ -166,6 +171,11 @@
 \newcommand{\QuickQuizLabel}[1]{
 	\edef#1{\thechapter.\thequickquizctrP}
 }
+\newcommand{\QuickQuizLabelRel}[2]{
+        \addtocounter{quickquizctrP}{#2}
+	\QuickQuizLabel{#1}
+	\addtocounter{quickquizctrP}{-#2}
+}
 \newcommand{\QuickQuizRef}[1]{%
 	\hyperref[QQ.#1]{Quick Quiz~#1}%
 }
@@ -176,6 +186,12 @@
 \newcommand{\QuickQuizLabel}[1]{
 	\edef#1{\thechapter.\thequickquizctr}
 }
+\newcommand{\QuickQuizLabelRel}[2]{
+        \addtocounter{quickquizctr}{#2}
+	\QuickQuizLabel{#1}
+	\addtocounter{quickquizctr}{-#2}
+}
+
 \newcommand{\QuickQuizRef}[1]{%
 	\hyperref[QQ.#1]{Quick Quiz~#1}%
 }

base-commit: 14440e232cc1b2580dc1a73f873dc29fe3aea02b
-- 
2.25.1




* Re: [PATCH -perfbook v2] cpu: Add a QQz on table E.1
  2023-02-12  0:42               ` [PATCH -perfbook v2] cpu: Add a QQz on table E.1 Akira Yokosawa
@ 2023-02-13  0:28                 ` Paul E. McKenney
  2023-02-13  1:54                   ` Akira Yokosawa
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2023-02-13  0:28 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Leonardo Brás, perfbook

On Sun, Feb 12, 2023 at 09:42:07AM +0900, Akira Yokosawa wrote:
> An email thread started from a question from Leo [1] stimulated
> me to add a QQz on Paul's experience back in 2009.
> 
> As \QuickQuizLabel{} inside \QuickQuizSeries{} doesn't work in
> -nq builds, define \QuickQuizLabelRel{}{} and put it in front
> of the series.
> 
> Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> Cc: Leonardo Brás <leobras.c@gmail.com>

Queued and pushed, thank you!

							Thanx, Paul


* Re: [PATCH -perfbook v2] cpu: Add a QQz on table E.1
  2023-02-13  0:28                 ` Paul E. McKenney
@ 2023-02-13  1:54                   ` Akira Yokosawa
  2023-02-13 19:37                     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Akira Yokosawa @ 2023-02-13  1:54 UTC (permalink / raw)
  To: paulmck; +Cc: Leonardo Brás, perfbook, Akira Yokosawa

On Sun, 12 Feb 2023 16:28:56 -0800, Paul E. McKenney wrote:
> On Sun, Feb 12, 2023 at 09:42:07AM +0900, Akira Yokosawa wrote:
>> An email thread started from a question from Leo [1] stimulated
>> me to add a QQz on Paul's experience back in 2009.
>>
>> As \QuickQuizLabel{} inside \QuickQuizSeries{} doesn't work in
>> -nq builds, define \QuickQuizLabelRel{}{} and put it in front
>> of the series.
>>
>> Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
>> Cc: Leonardo Brás <leobras.c@gmail.com>
> 
> Queued and pushed, thank you!

Paul, can you revert c82369bef3f6 ("cpu: Add a QQz citing table E.1")?

Patch v2 happens to have no conflict with v1 and you have applied
both of them!

Now QQz 3.8 and QQz 3.10 are the same quiz.

c82369bef3f6 can be reverted cleanly.

        Thanks, Akira


* Re: [PATCH -perfbook v2] cpu: Add a QQz on table E.1
  2023-02-13  1:54                   ` Akira Yokosawa
@ 2023-02-13 19:37                     ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2023-02-13 19:37 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Leonardo Brás, perfbook

On Mon, Feb 13, 2023 at 10:54:09AM +0900, Akira Yokosawa wrote:
> On Sun, 12 Feb 2023 16:28:56 -0800, Paul E. McKenney wrote:
> > On Sun, Feb 12, 2023 at 09:42:07AM +0900, Akira Yokosawa wrote:
> >> An email thread started from a question from Leo [1] stimulated
> >> me to add a QQz on Paul's experience back in 2009.
> >>
> >> As \QuickQuizLabel{} inside \QuickQuizSeries{} doesn't work in
> >> -nq builds, define \QuickQuizLabelRel{}{} and put it in front
> >> of the series.
> >>
> >> Link: [1] https://www.spinics.net/lists/perfbook/msg03824.html
> >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> >> Cc: Leonardo Brás <leobras.c@gmail.com>
> > 
> > Queued and pushed, thank you!
> 
> Paul, can you revert c82369bef3f6 ("cpu: Add a QQz citing table E.1")?
> 
> Patch v2 happens to have no conflict with v1 and you have applied
> both of them!
> 
> Now QQz 3.8 and QQz 3.10 are the same quiz.
> 
> c82369bef3f6 can be reverted cleanly.

Apologies for my confusion!  I have reverted c82369bef3f6 as you
suggested.

							Thanx, Paul

