[PATCH -perfbook 0/4] Add script to check cleveref macro uses

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH -perfbook 0/4] Add script to check cleveref macro uses
@ 2021-05-06 14:21 Akira Yokosawa
  2021-05-06 14:23 ` [PATCH -perfbook 1/4] cleverefcheck: Add check script of cleveref macro usage Akira Yokosawa
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Akira Yokosawa @ 2021-05-06 14:21 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Hi Paul,

Now that end-of-sentence periods are at the end of input lines,
wrong uses of \Cref{}, \cref{}, and their variants can be detected
by a not-that-intelligent script.

Patch 1/4 adds the set of scripts.
They also detect raw \ref{} and \lnref{} commands.

Patches 2/4--4/4 are updates of intro, cpu, and toolsoftrade
to satisfy the scripts.

Remaining updates of the other chapters will follow when they
are ready.

My plan is to add the check as a make target once the other updates
are finished.

In the meantime, you can run the check of each file by:

    ./utilities/cleverefcheck.pl count/count.tex

, or tree-wide:

    ./utilities/cleverefcheck.sh

        Thanks, Akira

--
Akira Yokosawa (4):
  cleverefcheck: Add check script of cleveref macro usage
  intro: Employ \cref{} and its variants
  cpu: Employ \cref{} and its variants
  toolsoftrade: Employ \cref{} and its variants

 cpu/cpu.tex                   |   6 +-
 cpu/hwfreelunch.tex           |  12 +-
 cpu/overheads.tex             |  26 +--
 cpu/overview.tex              |  32 +--
 cpu/swdesign.tex              |  18 +-
 intro/intro.tex               |  24 +-
 perfbook-lt.tex               |  21 ++
 toolsoftrade/toolsoftrade.tex | 412 +++++++++++++++++-----------------
 utilities/cleverefcheck.pl    |  96 ++++++++
 utilities/cleverefcheck.sh    |  23 ++
 10 files changed, 402 insertions(+), 268 deletions(-)
 create mode 100755 utilities/cleverefcheck.pl
 create mode 100755 utilities/cleverefcheck.sh

-- 
2.17.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH -perfbook 1/4] cleverefcheck: Add check script of cleveref macro usage
  2021-05-06 14:21 [PATCH -perfbook 0/4] Add script to check cleveref macro uses Akira Yokosawa
@ 2021-05-06 14:23 ` Akira Yokosawa
  2021-05-06 14:24 ` [PATCH -perfbook 2/4] intro: Employ \cref{} and its variants Akira Yokosawa
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Akira Yokosawa @ 2021-05-06 14:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Now that end-of-sentence periods are at the end of input lines,
we can check the distinction of \Cref{} and \cref{} and their variants.
Add a set of scripts to do the check.

Raw uses of \ref{} and \lnref{} are also detected.

Exception:
o Patterns "or~\lnref{foo}" and "item~\ref{bar}" are ignored.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 utilities/cleverefcheck.pl | 96 ++++++++++++++++++++++++++++++++++++++
 utilities/cleverefcheck.sh | 23 +++++++++
 2 files changed, 119 insertions(+)
 create mode 100755 utilities/cleverefcheck.pl
 create mode 100755 utilities/cleverefcheck.sh

diff --git a/utilities/cleverefcheck.pl b/utilities/cleverefcheck.pl
new file mode 100755
index 00000000..963b055a
--- /dev/null
+++ b/utilities/cleverefcheck.pl
@@ -0,0 +1,96 @@
+#!/usr/bin/perl
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# Check LaTeX source of cleveref macros usages
+#
+# Assumptions:
+#    End-of-sentence fullstops are at the end of input lines.
+#
+# Copyright (C) Akira Yokosawa, 2021
+#
+# Authors: Akira Yokosawa <akiyks@gmail.com>
+
+use strict;
+use warnings;
+
+my $line;
+my $new_sentence = 1;
+my $line_num = 0;
+my $skip = 0;
+my $safe = 0;
+my $Verbatim_begin = qr/\\begin\{Verbatim/ ;
+my $Verbatim_end = qr/\\end\{Verbatim/ ;
+my $tabular_begin = qr/\\begin\{tabula/ ;
+my $tabular_end = qr/\\end\{tabula/ ;
+my $label_ptn = qr/(^\s*|\{)(,?[a-z]{3,4}:([a-zMPS]+:)?[^\},]+)(\}|,)/ ;
+
+sub check_line {
+    my $raw_line = $line;
+    $line =~ s/\\%/pct/g ;
+    if ($line =~ /$Verbatim_begin/ ||
+	$line =~ /$tabular_begin/) {
+	$skip = 1;
+    }
+    unless ($skip) {
+	$safe = 1;
+	if ($line =~ /^(?=[\s]*+[^%])[^%]*\\ref\{/ ||
+	    $line =~ /^(?=[\s]*+[^%])[^%]*\\pageref\{/ ||
+	    $line =~ /^(?=[\s]*+[^%])[^%]*\\lnref\{/) {
+	    $safe = 0;
+	    if ($line =~ /or~\\lnref\{/ ||
+		$line =~ /item~\\ref\{/) {
+		$safe = 1;
+	    }
+	}
+	if ($new_sentence &&
+	    ($line =~ /^\s*\\cref/ || $line =~ /^\s*\\cpageref/ ||
+	     $line =~ /^\s*\\clnref/)) {
+	    $safe = 0;
+	}
+	if ($line =~ /^\s*\\Cref/ || $line =~ /^\s*\\Cpageref/ ||
+	    $line =~ /^\s*\\Clnref/) {
+	    if ($new_sentence) {
+		$safe = 1;
+	    } else {
+		$safe = 0;
+	    }
+	}
+	if ($line =~ /^(?=[\s]*+[^%])[^%]*[^\s]+\s*\\Cref/ ||
+	    $line =~ /^(?=[\s]*+[^%])[^%]*[^\s]+\s*\\Cpageref/ ||
+	    $line =~ /^(?=[\s]*+[^%])[^%]*[^\s]+\s*\\Clnref/) {
+	    $safe = 0;
+	    if ($line =~ /^(?=[\s]*+[^%])[^%]*^\s*\\item\s+\\C/ ) {
+		$safe = 1;
+	    }
+	}
+	unless ($safe) {
+	    print $ARGV[0], ':', $line_num, ':', $raw_line;
+	}
+    }
+    if ($line =~ /$Verbatim_end/ ||
+	$line =~ /$tabular_end/) {
+	$skip = 0;
+	$new_sentence = 1;
+    } else {
+	unless ($line =~ /\\begin\{fcvref\}/ || $line =~ /\\end\{fcvref\}/ ||
+	    $line =~ /^\s*\}\s*$/ || $line =~ /^\s*%/) {
+	    if ($line =~ /^(?=[\s]*+[^%])[^%]*[\.\?!:]\s*[\)\}]*$/ ||
+		$line =~ /^(?=[\s]*+[^%])[^%]*[\.\?!:]\s*[\)\}]*%.*$/ ||
+		$line =~ /^\s*$/ || $line =~ /^\\paragraph\{/ ||
+		$line =~ /^\s*\\begin\{/ || $line =~ /^\s*\\end\{/ ||
+		$line =~ /^\\E?QuickQuiz/ || $line =~ /\\E?QuickQuizAnswer[BEM]?\{/ ) {
+		$new_sentence = 1;
+	    } else {
+		$new_sentence = 0;
+	    }
+	}
+    }
+}
+
+open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
+    or die "Could not open file '$ARGV[0]' $!";
+
+while($line = <$fh>) {
+    $line_num++ ;
+    check_line();
+}
diff --git a/utilities/cleverefcheck.sh b/utilities/cleverefcheck.sh
new file mode 100755
index 00000000..6ff9f695
--- /dev/null
+++ b/utilities/cleverefcheck.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+tex_sources_all=`find . -name "*.tex" -print`
+tex_sources=""
+
+for f in $tex_sources_all
+do
+	case $f in
+	./perfbook*) ;;
+	./qqz*) ;;
+	./glsdict.tex) ;;
+	./origpub.tex) ;;
+	./contrib.tex) ;;
+	./future/HTMtable*) ;;
+	./appendix/styleguide*) ;;
+	*) tex_sources="$tex_sources $f" ;;
+        esac
+done
+
+for g in $tex_sources
+do
+	utilities/cleverefcheck.pl $g
+done
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH -perfbook 2/4] intro: Employ \cref{} and its variants
  2021-05-06 14:21 [PATCH -perfbook 0/4] Add script to check cleveref macro uses Akira Yokosawa
  2021-05-06 14:23 ` [PATCH -perfbook 1/4] cleverefcheck: Add check script of cleveref macro usage Akira Yokosawa
@ 2021-05-06 14:24 ` Akira Yokosawa
  2021-05-06 14:26 ` [PATCH -perfbook 3/4] cpu: " Akira Yokosawa
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Akira Yokosawa @ 2021-05-06 14:24 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 intro/intro.tex | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/intro/intro.tex b/intro/intro.tex
index 77e89f3c..b40abeb5 100644
--- a/intro/intro.tex
+++ b/intro/intro.tex
@@ -136,7 +136,7 @@ so that the aforementioned engineering discipline has evolved practical
 and effective strategies for handling it.
 In addition, hardware designers are increasingly aware of these issues,
 so perhaps future hardware will be more friendly to parallel software,
-as discussed in Section~\ref{sec:cpu:Hardware Free Lunch?}.
+as discussed in \cref{sec:cpu:Hardware Free Lunch?}.
 
 \QuickQuiz{
 	Come on now!!!
@@ -351,7 +351,7 @@ This change in focus is due to the fact that, although \IXr{Moore's Law}
 continues to deliver increases in transistor density, it has ceased to
 provide the traditional single-threaded performance increases.
 This can be seen in
-Figure~\ref{fig:intro:Clock-Frequency Trend for Intel CPUs}\footnote{
+\cref{fig:intro:Clock-Frequency Trend for Intel CPUs}\footnote{
 	This plot shows clock frequencies for newer CPUs theoretically
 	capable of retiring one or more instructions per clock, and MIPS
 	(millions of instructions per second, usually from the old
@@ -473,7 +473,7 @@ anything but insignificant for the Z80.
 
 The CSIRAC and the Z80 are two points in a long-term trend, as can be
 seen in
-Figure~\ref{fig:intro:MIPS per Die for Intel CPUs}.
+\cref{fig:intro:MIPS per Die for Intel CPUs}.
 This figure plots an approximation to computational power per die
 over the past four decades, showing an impressive six-order-of-magnitude
 increase over a period of forty years.
@@ -601,7 +601,7 @@ not yet exist.
 Until such a nirvana appears, it will be necessary to make engineering
 tradeoffs among performance, productivity, and generality.
 One such tradeoff is shown in
-Figure~\ref{fig:intro:Software Layers and Performance; Productivity; and Generality},
+\cref{fig:intro:Software Layers and Performance; Productivity; and Generality},
 which shows how productivity becomes increasingly important at the upper layers
 of the system stack,
 while performance and generality become increasingly important at the
@@ -633,7 +633,7 @@ many things besides driving nails.
 It should therefore be no surprise to see similar tradeoffs
 appear in the field of parallel computing.
 This tradeoff is shown schematically in
-Figure~\ref{fig:intro:Tradeoff Between Productivity and Generality}.
+\cref{fig:intro:Tradeoff Between Productivity and Generality}.
 Here, users~1, 2, 3, and 4 have specific jobs that they need the computer
 to help them with.
 The most productive possible language or environment for a given user is one
@@ -671,7 +671,7 @@ to the hardware system (for example, low-level languages such as
 assembly, C, C++, or Java) or to some abstraction (for example,
 Haskell, Prolog, or Snobol), as is shown by the circular region near
 the center of
-Figure~\ref{fig:intro:Tradeoff Between Productivity and Generality}.
+\cref{fig:intro:Tradeoff Between Productivity and Generality}.
 These languages can be considered to be general in the sense that they
 are equally ill-suited to the jobs required by users~1, 2, 3, and 4.
 In other words, their generality comes at the expense of
@@ -698,7 +698,7 @@ it is not always the best tool for the job.
 In order to properly consider alternatives to parallel programming,
 you must first decide on what exactly you expect the parallelism
 to do for you.
-As seen in Section~\ref{sec:intro:Parallel Programming Goals},
+As seen in \cref{sec:intro:Parallel Programming Goals},
 the primary goals of parallel programming are performance, productivity,
 and generality.
 Because this book is intended for developers working on
@@ -801,7 +801,7 @@ optimization, albeit one that is becoming much more attractive
 as parallel systems become cheaper and more readily available.
 However, it is wise to keep in mind that the speedup available from
 parallelism is limited to roughly the number of CPUs
-(but see Section~\ref{sec:SMPdesign:Beyond Partitioning}
+(but see \cref{sec:SMPdesign:Beyond Partitioning}
 for an interesting exception).
 In contrast, the speedup available from traditional single-threaded
 software optimizations can be much larger.
@@ -921,7 +921,7 @@ programmers must undertake that are not required of sequential programmers.
 We can then evaluate how well a given programming language or environment
 assists the developer with these tasks.
 These tasks fall into the four categories shown in
-Figure~\ref{fig:intro:Categories of Tasks Required of Parallel Programmers},
+\cref{fig:intro:Categories of Tasks Required of Parallel Programmers},
 each of which is covered in the following sections.
 
 \subsection{Work Partitioning}
@@ -1046,8 +1046,8 @@ of these synchronization mechanisms, for example locking vs.\@ transactional
 memory~\cite{McKenney2007PLOSTM}, but such elaboration is beyond the
 scope of this section.
 (See
-Sections~\ref{sec:future:Transactional Memory}
-and~\ref{sec:future:Hardware Transactional Memory}
+\cref{sec:future:Transactional Memory,%
+sec:future:Hardware Transactional Memory}
 for more information on transactional memory.)
 
 \QuickQuiz{
@@ -1134,7 +1134,7 @@ inter-partition communication, partitions the code accordingly,
 and finally maps data partitions and threads so as to maximize
 throughput while minimizing inter-thread communication,
 as shown in
-Figure~\ref{fig:intro:Ordering of Parallel-Programming Tasks}.
+\cref{fig:intro:Ordering of Parallel-Programming Tasks}.
 The developer can then
 consider each partition separately, greatly reducing the size
 of the relevant state space, in turn increasing productivity.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH -perfbook 3/4] cpu: Employ \cref{} and its variants
  2021-05-06 14:21 [PATCH -perfbook 0/4] Add script to check cleveref macro uses Akira Yokosawa
  2021-05-06 14:23 ` [PATCH -perfbook 1/4] cleverefcheck: Add check script of cleveref macro usage Akira Yokosawa
  2021-05-06 14:24 ` [PATCH -perfbook 2/4] intro: Employ \cref{} and its variants Akira Yokosawa
@ 2021-05-06 14:26 ` Akira Yokosawa
  2021-05-06 14:27 ` [PATCH -perfbook 4/4] toolsoftrade: " Akira Yokosawa
  2021-05-06 17:52 ` [PATCH -perfbook 0/4] Add script to check cleveref macro uses Paul E. McKenney
  4 siblings, 0 replies; 6+ messages in thread
From: Akira Yokosawa @ 2021-05-06 14:26 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 cpu/cpu.tex         |  6 +++---
 cpu/hwfreelunch.tex | 12 ++++++------
 cpu/overheads.tex   | 26 +++++++++++++-------------
 cpu/overview.tex    | 32 ++++++++++++++++----------------
 cpu/swdesign.tex    | 18 +++++++-----------
 5 files changed, 45 insertions(+), 49 deletions(-)

diff --git a/cpu/cpu.tex b/cpu/cpu.tex
index 44b470a4..183feb8c 100644
--- a/cpu/cpu.tex
+++ b/cpu/cpu.tex
@@ -63,11 +63,11 @@ So, to sum up:
 The remainder of this book describes ways of handling this bad news.
 
 In particular,
-Chapter~\ref{chp:Tools of the Trade} will cover some of the low-level
+\cref{chp:Tools of the Trade} will cover some of the low-level
 tools used for parallel programming,
-Chapter~\ref{chp:Counting} will investigate problems and solutions to
+\cref{chp:Counting} will investigate problems and solutions to
 parallel counting, and
-Chapter~\ref{cha:Partitioning and Synchronization Design}
+\cref{cha:Partitioning and Synchronization Design}
 will discuss design disciplines that promote performance and scalability.
 
 \QuickQuizAnswersChp{qqzcpu}
diff --git a/cpu/hwfreelunch.tex b/cpu/hwfreelunch.tex
index c4a9b495..92f04f16 100644
--- a/cpu/hwfreelunch.tex
+++ b/cpu/hwfreelunch.tex
@@ -17,8 +17,8 @@ induced single-threaded
 performance increases
 (or ``free lunch''~\cite{HerbSutter2008EffectiveConcurrency}),
 as shown in
-Figure~\ref{fig:intro:Clock-Frequency Trend for Intel CPUs} on
-page~\pageref{fig:intro:Clock-Frequency Trend for Intel CPUs}.
+\cref{fig:intro:Clock-Frequency Trend for Intel CPUs} on
+\cpageref{fig:intro:Clock-Frequency Trend for Intel CPUs}.
 This section briefly surveys a few ways that hardware designers
 might bring back the ``free lunch''.
 
@@ -27,8 +27,8 @@ obstacles to exploiting concurrency.
 One severe physical limitation that hardware designers face is the
 finite speed of light.
 As noted in
-Figure~\ref{fig:cpu:System Hardware Architecture} on
-page~\pageref{fig:cpu:System Hardware Architecture},
+\cref{fig:cpu:System Hardware Architecture} on
+\cpageref{fig:cpu:System Hardware Architecture},
 light can manage only about an 8-centimeters round trip in a vacuum
 during the duration of a 1.8\,GHz clock period.
 This distance drops to about 3~centimeters for a 5\,GHz clock.
@@ -140,7 +140,7 @@ significant fabrication challenges~\cite{JohnKnickerbocker2008:3DI}.
 
 Perhaps the most important benefit of 3DI is decreased path length through
 the system, as shown in
-Figure~\ref{fig:cpu:Latency Benefit of 3D Integration}.
+\cref{fig:cpu:Latency Benefit of 3D Integration}.
 A 3-centimeter silicon die is replaced with a stack of four 1.5-centimeter
 dies, in theory decreasing the maximum path through the system by a factor
 of two, keeping in mind that each layer is quite thin.
@@ -266,7 +266,7 @@ The purpose of these accelerators is to improve energy efficiency
 and thus extend battery life: special purpose hardware can often
 compute more efficiently than can a general-purpose CPU\@.
 This is another example of the principle called out in
-Section~\ref{sec:intro:Generality}: \IX{Generality} is almost never free.
+\cref{sec:intro:Generality}: \IX{Generality} is almost never free.
 
 Nevertheless, given the end of \IXaltr{Moore's-Law}{Moore's Law}-induced
 single-threaded performance increases, it seems safe to assume that
diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 3505b4f4..cb6b4f83 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -24,7 +24,7 @@ architecture, which is the subject of the next section.
 \label{fig:cpu:System Hardware Architecture}
 \end{figure}
 
-Figure~\ref{fig:cpu:System Hardware Architecture}
+\Cref{fig:cpu:System Hardware Architecture}
 shows a rough schematic of an eight-core computer system.
 Each die has a pair of CPU cores, each with its cache, as well as an
 interconnect allowing the pair of CPUs to communicate with each other.
@@ -116,7 +116,7 @@ events might ensue:
 This simplified sequence is just the beginning of a discipline called
 \emph{cache-coherency protocols}~\cite{Hennessy95a,DavidECuller1999,MiloMKMartin2012scale,DanielJSorin2011MemModel},
 which is discussed in more detail in
-Appendix~\ref{chp:app:whymb:Why Memory Barriers?}.
+\cref{chp:app:whymb:Why Memory Barriers?}.
 As can be seen in the sequence of events triggered by a \IXacr{cas} operation,
 a single instruction can cause considerable protocol traffic, which
 can significantly degrade your parallel program's performance.
@@ -126,7 +126,7 @@ interval during which it is never updated, that variable can be replicated
 across all CPUs' caches.
 This replication permits all CPUs to enjoy extremely fast access to
 this \emph{read-mostly} variable.
-Chapter~\ref{chp:Deferred Processing} presents synchronization
+\Cref{chp:Deferred Processing} presents synchronization
 mechanisms that take full advantage of this important hardware read-mostly
 optimization.
 
@@ -174,7 +174,7 @@ optimization.
 
 The overheads of some common operations important to parallel programs are
 displayed in
-Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}.
+\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}.
 This system's clock period rounds to 0.5\,ns.
 Although it is not unusual for modern microprocessors to be able to
 retire multiple instructions per clock period, the operations' costs are
@@ -351,16 +351,16 @@ thousand clock cycles.
 	10\,GHz.
 
 	In addition,
-	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
+	\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	on
-	page~\pageref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
+	\cpageref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	represents a reasonably large system with no fewer 448~hardware
 	threads.
 	Smaller systems often achieve better latency, as may be seen in
-	Table~\ref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System},
+	\cref{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System},
 	which represents a much smaller system with only 16 hardware threads.
 	A similar view is provided by the rows of
-	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
+	\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	down to and including the two ``Off-core'' rows.
 
 \begin{table*}
@@ -404,7 +404,7 @@ thousand clock cycles.
 	Alternatively, a 64-CPU system in the mid 1990s had
 	cross-interconnect latencies in excess of five microseconds,
 	so even the eight-socket 448-hardware-thread monster shown in
-	Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
+	\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 	represents more than a five-fold improvement over its
 	25-years-prior counterparts.
 
@@ -420,7 +420,7 @@ thousand clock cycles.
 	for concurrent software, even when running on relatively
 	small systems.
 
-	Section~\ref{sec:cpu:Hardware Free Lunch?}
+	\Cref{sec:cpu:Hardware Free Lunch?}
 	looks at what else hardware designers might be
 	able to do to ease the plight of parallel programmers.
 }\QuickQuizEnd
@@ -543,7 +543,7 @@ of store instructions to execute quickly even when the stores are to
 non-consecutive addresses and when none of the needed cachelines are
 present in the CPU's cache.
 The dark side of this optimization is memory misordering, for which see
-Chapter~\ref{chp:Advanced Synchronization: Memory Ordering}.
+\cref{chp:Advanced Synchronization: Memory Ordering}.
 
 A fourth hardware optimization is speculative execution, which can
 allow the hardware to make good use of the store buffers without
@@ -571,7 +571,7 @@ data that is frequently read but rarely updated is present in all
 CPUs' caches.
 This optimization allows the read-mostly data to be accessed
 exceedingly efficiently, and is the subject of
-Chapter~\ref{chp:Deferred Processing}.
+\cref{chp:Deferred Processing}.
 
 \begin{figure}
 \centering
@@ -583,7 +583,7 @@ Chapter~\ref{chp:Deferred Processing}.
 In short, hardware and software engineers are really on the same side,
 with both trying to make computers go fast despite the best efforts of
 the laws of physics, as fancifully depicted in
-Figure~\ref{fig:cpu:Hardware and Software: On Same Side}
+\cref{fig:cpu:Hardware and Software: On Same Side}
 where our data stream is trying its best to exceed the speed of light.
 The next section discusses some additional things that the hardware engineers
 might (or might not) be able to do, depending on how well recent
diff --git a/cpu/overview.tex b/cpu/overview.tex
index 95899b35..d1d23b36 100644
--- a/cpu/overview.tex
+++ b/cpu/overview.tex
@@ -10,7 +10,7 @@
 
 Careless reading of computer-system specification sheets might lead one
 to believe that CPU performance is a footrace on a clear track, as
-illustrated in Figure~\ref{fig:cpu:CPU Performance at its Best},
+illustrated in \cref{fig:cpu:CPU Performance at its Best},
 where the race always goes to the swiftest.
 
 \begin{figure}
@@ -21,7 +21,7 @@ where the race always goes to the swiftest.
 \end{figure}
 
 Although there are a few CPU-bound benchmarks that approach the ideal case
-shown in Figure~\ref{fig:cpu:CPU Performance at its Best},
+shown in \cref{fig:cpu:CPU Performance at its Best},
 the typical program more closely resembles an obstacle course than
 a race track.
 This is because the internal architecture of CPUs has changed dramatically
@@ -53,7 +53,7 @@ Some cores have more than one hardware thread, which is variously called
 each of which appears as
 an independent CPU to software, at least from a functional viewpoint.
 These modern hardware features can greatly improve performance, as
-illustrated by Figure~\ref{fig:cpu:CPUs Old and New}.
+illustrated by \cref{fig:cpu:CPUs Old and New}.
 
 Achieving full performance with a CPU having a long pipeline requires
 highly predictable control flow through the program.
@@ -90,7 +90,7 @@ speculatively executed instructions following the corresponding
 branch, resulting in a pipeline flush.
 If pipeline flushes appear too frequently, they drastically reduce
 overall performance, as fancifully depicted in
-Figure~\ref{fig:cpu:CPU Meets a Pipeline Flush}.
+\cref{fig:cpu:CPU Meets a Pipeline Flush}.
 
 \begin{figure}
 \centering
@@ -168,7 +168,7 @@ have extremely unpredictable memory-access patterns---after all,
 if the pattern was predictable, us software types would not bother
 with the pointers, right?
 Therefore, as shown in
-Figure~\ref{fig:cpu:CPU Meets a Memory Reference},
+\cref{fig:cpu:CPU Meets a Memory Reference},
 memory references often pose severe obstacles to modern CPUs.
 
 Thus far, we have only been considering obstacles that can arise during
@@ -211,7 +211,7 @@ store buffer, without the need to wait for cacheline ownership.
 Although there are a number of hardware optimizations that can sometimes
 hide cache latencies, the resulting effect on performance is all too
 often as depicted in
-Figure~\ref{fig:cpu:CPU Meets an Atomic Operation}.
+\cref{fig:cpu:CPU Meets an Atomic Operation}.
 
 Unfortunately, atomic operations usually apply only to single elements
 of data.
@@ -234,20 +234,20 @@ as described in the next section.
 	By early 2014, several mainstream systems provided limited
 	hardware transactional memory implementations, which is covered
 	in more detail in
-	Section~\ref{sec:future:Hardware Transactional Memory}.
+	\cref{sec:future:Hardware Transactional Memory}.
 	The jury is still out on the applicability of software transactional
 	memory~\cite{McKenney2007PLOSTM,DonaldEPorter2007TRANSACT,
 	ChistopherJRossbach2007a,CalinCascaval2008tmtoy,
 	AleksandarDragovejic2011STMnotToy,AlexanderMatveev2012PessimisticTM},
-	which is covered in Section~\ref{sec:future:Transactional Memory}.
+	which is covered in \cref{sec:future:Transactional Memory}.
 }\QuickQuizEnd
 
 \subsection{Memory Barriers}
 \label{sec:cpu:Memory Barriers}
 
 Memory barriers will be considered in more detail in
-Chapter~\ref{chp:Advanced Synchronization: Memory Ordering} and
-Appendix~\ref{chp:app:whymb:Why Memory Barriers?}.
+\cref{chp:Advanced Synchronization: Memory Ordering} and
+\cref{chp:app:whymb:Why Memory Barriers?}.
 In the meantime, consider the following simple lock-based \IX{critical
 section}:
 
@@ -273,7 +273,7 @@ either explicit or implicit memory barriers.
 Because the whole purpose of these memory barriers is to prevent reorderings
 that the CPU would otherwise undertake in order to increase performance,
 memory barriers almost always reduce performance, as depicted in
-Figure~\ref{fig:cpu:CPU Meets a Memory Barrier}.
+\cref{fig:cpu:CPU Meets a Memory Barrier}.
 
 As with atomic operations, CPU designers have been working hard to
 reduce memory-barrier overhead, and have made substantial progress.
@@ -299,9 +299,9 @@ This is because when a given CPU wishes to modify the variable, it is
 most likely the case that some other CPU has modified it recently.
 In this case, the variable will be in that other CPU's cache, but not
 in this CPU's cache, which will therefore incur an expensive cache miss
-(see Section~\ref{sec:app:whymb:Cache Structure} for more detail).
+(see \cref{sec:app:whymb:Cache Structure} for more detail).
 Such cache misses form a major obstacle to CPU performance, as shown
-in Figure~\ref{fig:cpu:CPU Meets a Cache Miss}.
+in \cref{fig:cpu:CPU Meets a Cache Miss}.
 
 \QuickQuiz{
 	So have CPU designers also greatly reduced the overhead of
@@ -312,7 +312,7 @@ in Figure~\ref{fig:cpu:CPU Meets a Cache Miss}.
 	but the finite speed of light and the atomic nature of
 	matter limits their ability to reduce cache-miss overhead
 	for larger systems.
-	Section~\ref{sec:cpu:Hardware Free Lunch?}
+	\Cref{sec:cpu:Hardware Free Lunch?}
 	discusses some possible avenues for possible future progress.
 }\QuickQuizEnd
 
@@ -332,7 +332,7 @@ I/O operations involving networking, mass storage, or (worse yet) human
 beings pose much greater obstacles than the internal obstacles called
 out in the prior sections,
 as illustrated by
-Figure~\ref{fig:cpu:CPU Waits for I/O Completion}.
+\cref{fig:cpu:CPU Waits for I/O Completion}.
 
 This is one of the differences between shared-memory and distributed-system
 parallelism: shared-memory parallel programs must normally deal with no
@@ -345,7 +345,7 @@ that of the actual work being performed is a key design parameter.
 A major goal of parallel hardware design is to reduce this ratio as
 needed to achieve the relevant performance and scalability goals.
 In turn, as will be seen in
-Chapter~\ref{cha:Partitioning and Synchronization Design},
+\cref{cha:Partitioning and Synchronization Design},
 a major goal of parallel software design is to reduce the
 frequency of expensive operations like communications cache misses.
 
diff --git a/cpu/swdesign.tex b/cpu/swdesign.tex
index c59a70eb..63b6222f 100644
--- a/cpu/swdesign.tex
+++ b/cpu/swdesign.tex
@@ -12,7 +12,7 @@
 	 {\emph{Ella Wheeler Wilcox}}
 
 The values of the ratios in
-Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
+\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 are critically important, as they limit the
 efficiency of a given parallel application.
 To see this, suppose that the parallel application uses \IXacr{cas}
@@ -51,7 +51,7 @@ be extremely infrequent and to enable very large quantities of processing.
 		cache-miss latencies than do smaller system.
 		To see this, compare
 		\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
-		on page~\pageref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
+		on \cpageref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}
 		with
 		\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 12-CPU Intel Core i7-8750H CPU @ 2.20GHz}.
 	\item	The distributed-systems communications operations do
@@ -88,20 +88,16 @@ One approach is to run nearly independent threads.
 The less frequently the threads communicate, whether by \IX{atomic} operations,
 locks, or explicit messages, the better the application's performance
 and scalability will be.
-This approach will be touched on in
-Chapter~\ref{chp:Counting},
-explored in
-Chapter~\ref{cha:Partitioning and Synchronization Design},
-and taken to its logical extreme in
-Chapter~\ref{chp:Data Ownership}.
+This approach will be touched on in \cref{chp:Counting},
+explored in \cref{cha:Partitioning and Synchronization Design},
+and taken to its logical extreme in \cref{chp:Data Ownership}.
 
 Another approach is to make sure that any sharing be read-mostly, which
 allows the CPUs' caches to replicate the read-mostly data, in turn
 allowing all CPUs fast access.
 This approach is touched on in
-Section~\ref{sec:count:Eventually Consistent Implementation},
-and explored more deeply in
-Chapter~\ref{chp:Deferred Processing}.
+\cref{sec:count:Eventually Consistent Implementation},
+and explored more deeply in \cref{chp:Deferred Processing}.
 
 In short, achieving excellent parallel performance and scalability means
 striving for embarrassingly parallel algorithms and implementations,
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH -perfbook 4/4] toolsoftrade: Employ \cref{} and its variants
  2021-05-06 14:21 [PATCH -perfbook 0/4] Add script to check cleveref macro uses Akira Yokosawa
                   ` (2 preceding siblings ...)
  2021-05-06 14:26 ` [PATCH -perfbook 3/4] cpu: " Akira Yokosawa
@ 2021-05-06 14:27 ` Akira Yokosawa
  2021-05-06 17:52 ` [PATCH -perfbook 0/4] Add script to check cleveref macro uses Paul E. McKenney
  4 siblings, 0 replies; 6+ messages in thread
From: Akira Yokosawa @ 2021-05-06 14:27 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Also define \clnrefr{} and \Clnrefr{} to be used instead of
"line~\ref{ln:xxx:yyy}", whose argument is a (list of) raw line label(s).

They can be used outside of "fcvref" environments.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 perfbook-lt.tex               |  21 ++
 toolsoftrade/toolsoftrade.tex | 412 +++++++++++++++++-----------------
 2 files changed, 226 insertions(+), 207 deletions(-)

diff --git a/perfbook-lt.tex b/perfbook-lt.tex
index 80d7fb0c..6e7d8d1f 100644
--- a/perfbook-lt.tex
+++ b/perfbook-lt.tex
@@ -341,8 +341,28 @@
   \docsvlist{#1}% Process list
   \fi%
 }
+\newcommand{\clnrefpraw}[2]{%
+  \setcounter{lblcount}{0}% Restart label count
+  \renewcommand*{\do}[1]{\stepcounter{lblcount}}% Count label
+  \docsvlist{#1}% Process list and count labels
+  \def\nextitem{\def\nextitem{, }}% Separator
+  \ifnum\value{lblcount}=1 #2~\lnrefraw{#1}%
+  \else\ifnum\value{lblcount}=2 {#2}s~%
+  \renewcommand*{\do}[1]{%
+    \addtocounter{lblcount}{-1}%
+    \ifnum\value{lblcount}=0 { }and~\else\nextitem\fi\lnrefraw{##1}}% How to process each label
+  \else {#2}s~%
+  \renewcommand*{\do}[1]{%
+    \addtocounter{lblcount}{-1}%
+    \ifnum\value{lblcount}=0 , and~\else\nextitem\fi\lnrefraw{##1}}% How to process each label
+  \fi%
+  \docsvlist{#1}% Process list
+  \fi%
+}
 \newcommand{\clnref}[1]{\clnrefp{#1}{line}}
 \newcommand{\Clnref}[1]{\clnrefp{#1}{Line}}
+\newcommand{\clnrefr}[1]{\clnrefpraw{#1}{line}}
+\newcommand{\Clnrefr}[1]{\clnrefpraw{#1}{Line}}
 \newcommand{\clnrefrange}[2]{lines~\lnref{#1}\==\lnref{#2}}
 \newcommand{\Clnrefrange}[2]{Lines~\lnref{#1}\==\lnref{#2}}
 \newcommand{\clnrefthro}[2]{lines~\lnref{#1} through~\lnref{#2}}
@@ -497,6 +517,7 @@
 }
 \newcommand{\lnrefbase}{}
 \newcommand{\lnref}[1]{\ref{\lnrefbase:#1}}
+\newcommand{\lnrefraw}[1]{\ref{#1}}
 
 \newenvironment{fcvlabel}[1][]{\renewcommand{\lnlblbase}{#1}%
 \ignorespaces}{\ignorespacesafterend}
diff --git a/toolsoftrade/toolsoftrade.tex b/toolsoftrade/toolsoftrade.tex
index f9a8ee90..45d32c79 100644
--- a/toolsoftrade/toolsoftrade.tex
+++ b/toolsoftrade/toolsoftrade.tex
@@ -10,14 +10,14 @@
 This chapter provides a brief introduction to some basic tools of the
 parallel-programming trade, focusing mainly on those available to
 user applications running on operating systems similar to Linux.
-Section~\ref{sec:toolsoftrade:Scripting Languages} begins with
+\Cref{sec:toolsoftrade:Scripting Languages} begins with
 scripting languages,
-Section~\ref{sec:toolsoftrade:POSIX Multiprocessing}
+\cref{sec:toolsoftrade:POSIX Multiprocessing}
 describes the multi-process parallelism supported by the POSIX API and
 touches on POSIX threads,
-Section~\ref{sec:toolsoftrade:Alternatives to POSIX Operations}
+\cref{sec:toolsoftrade:Alternatives to POSIX Operations}
 presents analogous operations in other environments, and finally,
-Section~\ref{sec:toolsoftrade:The Right Tool for the Job: How to Choose?}
+\cref{sec:toolsoftrade:The Right Tool for the Job: How to Choose?}
 helps to choose the tool that will get the job done.
 
 \QuickQuiz{
@@ -59,16 +59,15 @@ This can be accomplished using UNIX shell scripting as follows:
 \end{figure}
 
 \begin{fcvref}[ln:toolsoftrade:parallel:compute_it]
-Lines~\lnref{comp1} and~\lnref{comp2} launch two instances of this
+\Clnref{comp1,comp2} launch two instances of this
 program, redirecting their
 output to two separate files, with the \co{&} character directing the
 shell to run the two instances of the program in the background.
-Line~\lnref{wait} waits for both instances to complete, and
-lines~\lnref{cat1} and~\lnref{cat2}
-display their output.
+\Clnref{wait} waits for both instances to complete, and
+\clnref{cat1,cat2} display their output.
 \end{fcvref}
 The resulting execution is as shown in
-Figure~\ref{fig:toolsoftrade:Execution Diagram for Parallel Shell Execution}:
+\cref{fig:toolsoftrade:Execution Diagram for Parallel Shell Execution}:
 the two instances of \co{compute_it} execute in parallel,
 \co{wait} completes after both of them do, and then the two instances
 of \co{cat} execute sequentially.
@@ -146,7 +145,7 @@ programming need not always be complex or difficult.
 	in a coarse-grained form.
 	If not, they should consider using other parallel-programming
 	environments, such as those discussed in
-	Section~\ref{sec:toolsoftrade:POSIX Multiprocessing}.
+	\cref{sec:toolsoftrade:POSIX Multiprocessing}.
 }\QuickQuizEnd
 
 \section{POSIX Multiprocessing}
@@ -157,13 +156,13 @@ programming need not always be complex or difficult.
 This section scratches the surface of the
 POSIX environment, including pthreads~\cite{OpenGroup1997pthreads},
 as this environment is readily available and widely implemented.
-Section~\ref{sec:toolsoftrade:POSIX Process Creation and Destruction}
+\Cref{sec:toolsoftrade:POSIX Process Creation and Destruction}
 provides a glimpse of the POSIX \co{fork()} and related primitives,
-Section~\ref{sec:toolsoftrade:POSIX Thread Creation and Destruction}
+\cref{sec:toolsoftrade:POSIX Thread Creation and Destruction}
 touches on thread creation and destruction,
-Section~\ref{sec:toolsoftrade:POSIX Locking} gives a brief overview
+\cref{sec:toolsoftrade:POSIX Locking} gives a brief overview
 of POSIX locking, and, finally,
-Section~\ref{sec:toolsoftrade:POSIX Reader-Writer Locking} describes a
+\cref{sec:toolsoftrade:POSIX Reader-Writer Locking} describes a
 specific lock which can be used for data that is read by many threads and only
 occasionally updated.
 
@@ -212,18 +211,18 @@ If \co{fork()} succeeds, it returns twice, once for the parent
 and again for the child.
 The value returned from \co{fork()} allows the caller to tell
 the difference, as shown in
-Listing~\ref{lst:toolsoftrade:Using the fork() Primitive}
+\cref{lst:toolsoftrade:Using the fork() Primitive}
 (\path{forkjoin.c}).
-Line~\lnref{fork} executes the \co{fork()} primitive, and saves
+\Clnref{fork} executes the \co{fork()} primitive, and saves
 its return value in local variable \co{pid}.
-Line~\lnref{if} checks to see if \co{pid} is zero, in which case,
-this is the child, which continues on to execute line~\lnref{child}.
+\Clnref{if} checks to see if \co{pid} is zero, in which case,
+this is the child, which continues on to execute \clnref{child}.
 As noted earlier, the child may terminate via the \co{exit()} primitive.
 Otherwise, this is the parent, which checks for an error return from
-the \co{fork()} primitive on line~\lnref{else}, and prints an error
+the \co{fork()} primitive on \clnref{else}, and prints an error
 and exits on \clnrefrange{errora}{errorb} if so.
 Otherwise, the \co{fork()} has executed successfully, and the parent
-therefore executes line~\lnref{parent} with the variable \co{pid}
+therefore executes \clnref{parent} with the variable \co{pid}
 containing the process ID of the child.
 \end{fcvref}
 
@@ -241,20 +240,20 @@ counterpart, as each invocation of \co{wait()} waits for but one child
 process.
 It is therefore customary to wrap \co{wait()} into a function similar
 to the \apipx{waitall()} function shown in
-Listing~\ref{lst:toolsoftrade:Using the wait() Primitive}
+\cref{lst:toolsoftrade:Using the wait() Primitive}
 (\path{api-pthreads.h}),
 with this \co{waitall()} function having semantics similar to the
 shell-script \co{wait} command.
 Each pass through the loop spanning \clnrefrange{loopa}{loopb}
 waits on one child process.
-Line~\lnref{wait} invokes the \co{wait()} primitive, which blocks
+\Clnref{wait} invokes the \co{wait()} primitive, which blocks
 until a child process exits, and returns that child's process ID\@.
 If the process ID is instead $-1$, this indicates that the \co{wait()}
 primitive was unable to wait on a child.
-If so, line~\lnref{ECHILD} checks for the \co{ECHILD} errno, which
+If so, \clnref{ECHILD} checks for the \co{ECHILD} errno, which
 indicates that there are no more child processes, so that
-line~\lnref{break} exits the loop.
-Otherwise, lines~\lnref{perror} and~\lnref{exit} print an error and exit.
+\clnref{break} exits the loop.
+Otherwise, \clnref{perror,exit} print an error and exit.
 \end{fcvref}
 
 \QuickQuiz{
@@ -266,7 +265,7 @@ Otherwise, lines~\lnref{perror} and~\lnref{exit} print an error and exit.
 	child individually.
 	In addition, some parallel applications need to detect the
 	reason that the child died.
-	As we saw in Listing~\ref{lst:toolsoftrade:Using the wait() Primitive},
+	As we saw in \cref{lst:toolsoftrade:Using the wait() Primitive},
 	it is not hard to build a \co{waitall()} function out of
 	the \co{wait()} function, but it would be impossible to
 	do the reverse.
@@ -283,12 +282,12 @@ Otherwise, lines~\lnref{perror} and~\lnref{exit} print an error and exit.
 It is critically important to note that the parent and child do \emph{not}
 share memory.
 This is illustrated by the program shown in
-Listing~\ref{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory}
+\cref{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory}
 (\path{forkjoinvar.c}),
-in which the child sets a global variable \co{x} to 1 on line~\lnref{setx},
-prints a message on line~\lnref{print:c}, and exits on line~\lnref{exit:s}.
-The parent continues at line~\lnref{waitall}, where it waits on the child,
-and on line~\lnref{print:p} finds that its copy of the variable \co{x} is
+in which the child sets a global variable \co{x} to 1 on \clnref{setx},
+prints a message on \clnref{print:c}, and exits on \clnref{exit:s}.
+The parent continues at \clnref{waitall}, where it waits on the child,
+and on \clnref{print:p} finds that its copy of the variable \co{x} is
 still zero.
 The output is thus as follows:
 \end{fcvref}
@@ -314,7 +313,7 @@ Parent process sees x=0
 	source code of the Linux-kernel implementations themselves.
 
 	It is important to note that the parent process in
-	Listing~\ref{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory}
+	\cref{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory}
 	waits until after the child terminates to do its \co{printf()}.
 	Using \co{printf()}'s buffered I/O concurrently to the same file
 	from multiple processes is non-trivial, and is best avoided.
@@ -327,7 +326,7 @@ Parent process sees x=0
 
 The finest-grained parallelism requires shared memory, and
 this is covered in
-Section~\ref{sec:toolsoftrade:POSIX Thread Creation and Destruction}.
+\cref{sec:toolsoftrade:POSIX Thread Creation and Destruction}.
 That said, shared-memory parallelism can be significantly more complex
 than fork-join parallelism.
 
@@ -337,8 +336,8 @@ than fork-join parallelism.
 \begin{fcvref}[ln:toolsoftrade:pcreate:mythread]
 To create a thread within an existing process, invoke the
 \apipx{pthread_create()} primitive, for example, as shown on
-lines~\lnref{create:a} and~\lnref{create:b} of
-Listing~\ref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory}
+\clnref{create:a,create:b} of
+\cref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory}
 (\path{pcreate.c}).
 The first argument is a pointer to a \co{pthread_t} in which to store the
 ID of the thread to be created, the second \co{NULL} argument is a pointer
@@ -359,7 +358,7 @@ call \apipx{pthread_exit()}.
 
 \QuickQuiz{
 	If the \co{mythread()} function in
-	Listing~\ref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory}
+	\cref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory}
 	can simply return, why bother with \co{pthread_exit()}?
 }\QuickQuizAnswer{
 	In this simple example, there is no reason whatsoever.
@@ -371,7 +370,7 @@ call \apipx{pthread_exit()}.
 }\QuickQuizEnd
 
 \begin{fcvref}[ln:toolsoftrade:pcreate:mythread]
-The \apipx{pthread_join()} primitive, shown on line~\lnref{join},
+The \apipx{pthread_join()} primitive, shown on \clnref{join},
 is analogous to
 the fork-join \apipx{wait()} primitive.
 It blocks until the thread specified by the \co{tid} variable completes
@@ -385,7 +384,7 @@ how the thread in question exits.
 \end{fcvref}
 
 The program shown in
-Listing~\ref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory}
+\cref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory}
 produces output as follows, demonstrating that memory is in fact
 shared between the two threads:
 
@@ -470,7 +469,7 @@ lock~\cite{Hoare74}.
 	by many threads, and only occasionally updated'', then
 	POSIX reader-writer locks might be what you are looking for.
 	These are introduced in
-	Section~\ref{sec:toolsoftrade:POSIX Reader-Writer Locking}.
+	\cref{sec:toolsoftrade:POSIX Reader-Writer Locking}.
 
 	Another way to get the effect of multiple threads holding
 	the same lock is for one thread to acquire the lock, and
@@ -489,18 +488,18 @@ lock~\cite{Hoare74}.
 
 \begin{fcvref}[ln:toolsoftrade:lock:reader_writer]
 This exclusive-locking property is demonstrated using the code shown in
-Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}
+\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}
 (\path{lock.c}).
-Line~\lnref{lock_a} defines and initializes a POSIX lock named \co{lock_a}, while
-line~\lnref{lock_b} similarly defines and initializes a lock named \co{lock_b}.
-Line~\lnref{x} defines and initializes a shared variable~\co{x}.
+\Clnref{lock_a} defines and initializes a POSIX lock named \co{lock_a}, while
+\clnref{lock_b} similarly defines and initializes a lock named \co{lock_b}.
+\Clnref{x} defines and initializes a shared variable~\co{x}.
 \end{fcvref}
 
 \begin{fcvref}[ln:toolsoftrade:lock:reader_writer:reader]
 \Clnrefrange{b}{e} define a function \co{lock_reader()} which repeatedly
 reads the shared variable \co{x} while holding
 the lock specified by \co{arg}.
-Line~\lnref{cast} casts \co{arg} to a pointer to a \co{pthread_mutex_t}, as
+\Clnref{cast} casts \co{arg} to a pointer to a \co{pthread_mutex_t}, as
 required by the \co{pthread_mutex_lock()} and \co{pthread_mutex_unlock()}
 primitives.
 \end{fcvref}
@@ -508,8 +507,8 @@ primitives.
 \QuickQuizSeries{%
 \QuickQuizB{
 	Why not simply make the argument to \co{lock_reader()}
-	on line~\ref{ln:toolsoftrade:lock:reader_writer:reader:b} of
-	Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}
+	on \clnrefr{ln:toolsoftrade:lock:reader_writer:reader:b} of
+	\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}
 	be a pointer to a \co{pthread_mutex_t}?
 }\QuickQuizAnswerB{
 	Because we will need to pass \co{lock_reader()} to
@@ -522,9 +521,9 @@ primitives.
 \QuickQuizE{
 	\begin{fcvref}[ln:toolsoftrade:lock:reader_writer]
 	What is the \apik{READ_ONCE()} on
-        lines~\lnref{reader:read_x} and~\lnref{writer:inc} and the
-	\apik{WRITE_ONCE()} on line~\lnref{writer:inc} of
-	Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}?
+	\clnref{reader:read_x,writer:inc} and the
+	\apik{WRITE_ONCE()} on \clnref{writer:inc} of
+	\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}?
 	\end{fcvref}
 }\QuickQuizAnswerE{
 	These macros constrain the compiler so as to prevent it from
@@ -534,15 +533,15 @@ primitives.
 	reordering of accesses to a given single variable.
 	Note that this single-variable constraint does apply to the
 	code shown in
-	Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}
+	\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}
 	because only the variable \co{x} is accessed.
 
 	For more information on \co{READ_ONCE()} and \co{WRITE_ONCE()},
 	please see
-	Section~\ref{sec:toolsoftrade:Atomic Operations (gcc Classic)}.
+	\cref{sec:toolsoftrade:Atomic Operations (gcc Classic)}.
 	For more information on ordering accesses to multiple variables
 	by multiple threads, please see
-	Chapter~\ref{chp:Advanced Synchronization: Memory Ordering}.
+	\cref{chp:Advanced Synchronization: Memory Ordering}.
 	In the meantime, \co{READ_ONCE(x)} has much in common with
 	the \GCC\  intrinsic \co{__atomic_load_n(&x, __ATOMIC_RELAXED)}
 	and \co{WRITE_ONCE()} has much in common with the \GCC\ 
@@ -557,12 +556,12 @@ for errors and exiting the program if any occur.
 \Clnrefrange{loop:b}{loop:e} repeatedly check the value of \co{x},
 printing the new value
 each time that it changes.
-Line~\lnref{sleep} sleeps for one millisecond, which allows this demonstration
+\Clnref{sleep} sleeps for one millisecond, which allows this demonstration
 to run nicely on a uniprocessor machine.
 \Clnrefrange{rel:b}{rel:e} release the \co{pthread_mutex_t},
 again checking for
 errors and exiting the program if any occur.
-Finally, line~\lnref{return} returns \co{NULL}, again to match the function type
+Finally, \clnref{return} returns \co{NULL}, again to match the function type
 required by \co{pthread_create()}.
 \end{fcvref}
 
@@ -581,11 +580,11 @@ required by \co{pthread_create()}.
 
 \begin{fcvref}[ln:toolsoftrade:lock:reader_writer:writer]
 \Clnrefrange{b}{e} of
-Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}
+\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}
 show \co{lock_writer()}, which
 periodically updates the shared variable \co{x} while holding the
 specified \co{pthread_mutex_t}.
-As with \co{lock_reader()}, line~\lnref{cast} casts \co{arg} to a pointer
+As with \co{lock_reader()}, \clnref{cast} casts \co{arg} to a pointer
 to \co{pthread_mutex_t},
 \clnrefrange{acq:b}{acq:e} acquire the specified lock,
 and \clnrefrange{rel:b}{rel:e} release it.
@@ -602,7 +601,7 @@ Finally, \clnrefrange{rel:b}{rel:e} release the lock.
 \end{listing}
 
 \begin{fcvref}[ln:toolsoftrade:lock:same_lock]
-Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}
+\Cref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}
 shows a code fragment that runs \co{lock_reader()} and
 \co{lock_writer()} as threads using the same lock, namely, \co{lock_a}.
 \Clnrefrange{cr:reader:b}{cr:reader:e} create a thread
@@ -625,7 +624,7 @@ by \co{lock_writer()} while holding the lock.
 \QuickQuiz{
 	Is ``x = 0'' the only possible output from the code fragment
 	shown in
-	Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}?
+	\cref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}?
 	If so, why?
 	If not, what other output could appear, and why?
 }\QuickQuizAnswer{
@@ -647,7 +646,7 @@ by \co{lock_writer()} while holding the lock.
 \label{lst:toolsoftrade:Demonstration of Different Exclusive Locks}
 \end{listing}
 
-Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}
+\Cref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}
 shows a similar code fragment, but this time using different locks:
 \co{lock_a} for \co{lock_reader()} and \co{lock_b} for
 \co{lock_writer()}.
@@ -688,7 +687,7 @@ values of \co{x} stored by \co{lock_writer()}.
 %
 \QuickQuizM{
 	In the code shown in
-	Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks},
+	\cref{lst:toolsoftrade:Demonstration of Different Exclusive Locks},
 	is \co{lock_reader()} guaranteed to see all the values produced
 	by \co{lock_writer()}?
 	Why or why not?
@@ -702,18 +701,18 @@ values of \co{x} stored by \co{lock_writer()}.
 %
 \QuickQuizE{
 	Wait a minute here!!!
-	Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}
+	\Cref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}
 	didn't initialize shared variable~\co{x},
 	so why does it need to be initialized in
-	Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}?
+	\cref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}?
 }\QuickQuizAnswerE{
-	See line~\ref{ln:toolsoftrade:lock:reader_writer:x} of
-	Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}.
+	See \clnrefr{ln:toolsoftrade:lock:reader_writer:x} of
+	\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}.
 	Because the code in
-	Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}
+	\cref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}
 	ran first, it could rely on the compile-time initialization of~\co{x}.
 	The code in
-	Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}
+	\cref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}
 	ran next, so it had to re-initialize~\co{x}.
 }\QuickQuizEndE
 }
@@ -757,17 +756,17 @@ provided by reader-writer locks.
 \end{listing}
 
 \begin{fcvref}[ln:toolsoftrade:rwlockscale:reader]
-Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}
+\Cref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}
 (\path{rwlockscale.c})
 shows one way of measuring reader-writer lock scalability.
-Line~\lnref{rwlock} shows the definition and initialization of the reader-writer
-lock, line~\lnref{holdtm} shows the \co{holdtime} argument controlling the
+\Clnref{rwlock} shows the definition and initialization of the reader-writer
+lock, \clnref{holdtm} shows the \co{holdtime} argument controlling the
 time each thread holds the reader-writer lock,
-line~\lnref{thinktm} shows the \co{thinktime} argument controlling the time between
+\clnref{thinktm} shows the \co{thinktime} argument controlling the time between
 the release of the reader-writer lock and the next acquisition,
-line~\lnref{rdcnts} defines the \co{readcounts} array into which each reader thread
+\clnref{rdcnts} defines the \co{readcounts} array into which each reader thread
 places the number of times it acquired the lock, and
-line~\lnref{nrdrun} defines the \co{nreadersrunning} variable, which
+\clnref{nrdrun} defines the \co{nreadersrunning} variable, which
 determines when all reader threads have started running.
 
 \Clnrefrange{goflag:b}{goflag:e} define \co{goflag},
@@ -780,7 +779,7 @@ set to \co{GOFLAG_STOP} to terminate the test run.
 
 \begin{fcvref}[ln:toolsoftrade:rwlockscale:reader:reader]
 \Clnrefrange{b}{e} define \co{reader()}, which is the reader thread.
-Line~\lnref{atmc_inc} atomically increments the \co{nreadersrunning} variable
+\Clnref{atmc_inc} atomically increments the \co{nreadersrunning} variable
 to indicate that this thread is now running, and
 \clnrefrange{wait:b}{wait:e} wait for the test to start.
 The \apik{READ_ONCE()} primitive forces the compiler to fetch \co{goflag}
@@ -792,8 +791,8 @@ rights to assume that the value of \co{goflag} would never change.
 \QuickQuizB{
 	Instead of using \apik{READ_ONCE()} everywhere, why not just
 	declare \co{goflag} as \co{volatile} on
-        line~\ref{ln:toolsoftrade:rwlockscale:reader:goflag:e} of
-	Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}?
+	\clnrefr{ln:toolsoftrade:rwlockscale:reader:goflag:e} of
+	\cref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}?
 }\QuickQuizAnswerB{
 	A \co{volatile} declaration is in fact a reasonable alternative in
 	this particular case.
@@ -815,7 +814,7 @@ rights to assume that the value of \co{goflag} would never change.
 	Don't we also need memory barriers to make sure
 	that the change in \co{goflag}'s value propagates to the
 	CPU in a timely fashion in
-	Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}?
+	\cref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}?
 }\QuickQuizAnswerM{
 	No, memory barriers are not needed and won't help here.
 	Memory barriers only enforce ordering among multiple memory
@@ -846,7 +845,7 @@ rights to assume that the value of \co{goflag} would never change.
 	and never from a signal handler, then no.
 	Otherwise, it is quite possible that \apik{READ_ONCE()} is needed.
 	We will see examples of both situations in
-	Section~\ref{sec:count:Signal-Theft Limit Counter Implementation}.
+	\cref{sec:count:Signal-Theft Limit Counter Implementation}.
 
 	This leads to the question of how one thread can gain access to
 	another thread's \apig{__thread} variable, and the answer is that
@@ -866,10 +865,10 @@ number of microseconds,
 \clnrefrange{rel:b}{rel:e} release the lock,
 and \clnrefrange{think:b}{think:e} wait for the specified
 number of microseconds before re-acquiring the lock.
-Line~\lnref{count} counts this lock acquisition.
+\Clnref{count} counts this lock acquisition.
 
-Line~\lnref{mov_cnt} moves the lock-acquisition count to this thread's element of the
-\co{readcounts[]} array, and line~\lnref{return} returns, terminating this thread.
+\Clnref{mov_cnt} moves the lock-acquisition count to this thread's element of the
+\co{readcounts[]} array, and \clnref{return} returns, terminating this thread.
 \end{fcvref}
 
 \begin{figure}
@@ -879,7 +878,7 @@ Line~\lnref{mov_cnt} moves the lock-acquisition count to this thread's element o
 \label{fig:toolsoftrade:Reader-Writer Lock Scalability vs. Microseconds in Critical Section}
 \end{figure}
 
-Figure~\ref{fig:toolsoftrade:Reader-Writer Lock Scalability vs. Microseconds in Critical Section}
+\Cref{fig:toolsoftrade:Reader-Writer Lock Scalability vs. Microseconds in Critical Section}
 shows the results of running this test on a 224-core Xeon system
 with two hardware threads per core for a total of 448 software-visible
 CPUs.
@@ -943,7 +942,7 @@ in fact degraded by about 10\,\% from ideal.
 	increases.
 
 	Some other ways of efficiently handling very small critical
-	sections are described in Chapter~\ref{chp:Deferred Processing}.
+	sections are described in \cref{chp:Deferred Processing}.
 }\QuickQuizEndM
 %
 \QuickQuizM{
@@ -965,12 +964,12 @@ in fact degraded by about 10\,\% from ideal.
 Despite these limitations, reader-writer locking is quite useful in many
 cases, for example when the readers must do high-latency file or network I/O\@.
 There are alternatives, some of which will be presented in
-Chapters~\ref{chp:Counting} and \ref{chp:Deferred Processing}.
+\cref{chp:Counting,chp:Deferred Processing}.
 
 \subsection{Atomic Operations (\GCC\ Classic)}
 \label{sec:toolsoftrade:Atomic Operations (gcc Classic)}
 
-Figure~\ref{fig:toolsoftrade:Reader-Writer Lock Scalability vs. Microseconds in Critical Section}
+\Cref{fig:toolsoftrade:Reader-Writer Lock Scalability vs. Microseconds in Critical Section}
 shows that the overhead of reader-writer locking is most severe for the
 smallest critical sections, so it would be nice to have some other way
 of protecting tiny critical sections.
@@ -978,7 +977,7 @@ One such way uses \IX{atomic} operations.
 \begin{fcvref}[ln:toolsoftrade:rwlockscale:reader:reader]
 We have seen an atomic operation already, namely the
 \apig{__sync_fetch_and_add()} primitive on \clnref{atmc_inc} of
-Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}.
+\cref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}.
 This primitive atomically adds the value of its second argument to
 the value referenced by its first argument, returning the old value
 (which was ignored in this case).
@@ -1050,13 +1049,13 @@ problems~\cite{MauriceHerlihy90a}.
 	they be the fastest possible way to get things done?
 }\QuickQuizAnswer{
 	Unfortunately, no.
-	See Chapter~\ref{chp:Counting} for some stark counterexamples.
+	See \cref{chp:Counting} for some stark counterexamples.
 }\QuickQuizEnd
 
 The \apig{__sync_synchronize()} primitive issues a ``memory barrier'',
 which constrains both the compiler's and the CPU's ability to reorder
 operations, as discussed in
-Chapter~\ref{chp:Advanced Synchronization: Memory Ordering}.
+\cref{chp:Advanced Synchronization: Memory Ordering}.
 In some cases, it is sufficient to constrain the compiler's ability
 to reorder operations, while allowing the CPU free rein, in which
 case the \apik{barrier()} primitive may be used.
@@ -1064,15 +1063,15 @@ case the \apik{barrier()} primitive may be used.
 In some cases, it is only necessary to ensure that the compiler
 avoids optimizing away a given memory read, in which case the
 \apik{READ_ONCE()} primitive may be used, as it was on \clnref{read_x} of
-Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}.
+\cref{lst:toolsoftrade:Demonstration of Exclusive Locks}.
 \end{fcvref}
 Similarly, the \apik{WRITE_ONCE()} primitive may be used to prevent the
 compiler from optimizing away a given memory write.
 These last three primitives are not provided directly by \GCC,
 but may be implemented straightforwardly as shown in
-Listing~\ref{lst:toolsoftrade:Compiler Barrier Primitive (for GCC)},
+\cref{lst:toolsoftrade:Compiler Barrier Primitive (for GCC)},
 and all three are discussed at length in
-Section~\ref{sec:toolsoftrade:Accessing Shared Variables}.
+\cref{sec:toolsoftrade:Accessing Shared Variables}.
 Alternatively, \apialtk{READ_ONCE(x)}{READ_ONCE()} has much in common with
 the \GCC\  intrinsic
 \apialtg{__atomic_load_n(&x, __ATOMIC_RELAXED)}{__atomic_load_n()}
@@ -1123,7 +1122,7 @@ The read-modify-write atomics include
 and
 \apic{atomic_compare_exchange_weak()}.
 These operate in a manner similar to those described in
-Section~\ref{sec:toolsoftrade:Atomic Operations (gcc Classic)},
+\cref{sec:toolsoftrade:Atomic Operations (gcc Classic)},
 but with the addition of memory-order arguments to \co{_explicit}
 variants of all of the operations.
 Without memory-order arguments, all the atomic operations are
@@ -1132,8 +1131,8 @@ For example, ``\apialtc{atomic_load_explicit(&a, memory_order_relaxed)}
 {atomic_load_explicit()}''
 is vaguely similar to the Linux kernel's ``\apik{READ_ONCE()}''.\footnote{
 	Memory ordering is described in more detail in
-	Chapter~\ref{chp:Advanced Synchronization: Memory Ordering} and
-	Appendix~\ref{chp:app:whymb:Why Memory Barriers?}.}
+	\cref{chp:Advanced Synchronization: Memory Ordering} and
+	\cref{chp:app:whymb:Why Memory Barriers?}.}
 
 \subsection{Atomic Operations (Modern \GCC)}
 \label{sec:toolsoftrade:Atomic Operations (Modern gcc)}
@@ -1163,7 +1162,7 @@ this list:
 Per-thread variables, also called thread-specific data, thread-local
 storage, and other less-polite names, are used extremely
 heavily in concurrent code, as will be explored in
-Chapters~\ref{chp:Counting} and~\ref{chp:Data Ownership}.
+\cref{chp:Counting,chp:Data Ownership}.
 POSIX supplies the \apipx{pthread_key_create()} function to create a
 per-thread variable (and return the corresponding key),
 \apipx{pthread_key_delete()} to delete the per-thread variable corresponding
@@ -1260,7 +1259,7 @@ Threads share everything except for per-thread local state,\footnote{
 which includes program counter and stack.
 
 The thread API is shown in
-Listing~\ref{lst:toolsoftrade:Thread API}, and members are described in the
+\cref{lst:toolsoftrade:Thread API}, and members are described in the
 following sections.
 
 \begin{listing}
@@ -1310,7 +1309,7 @@ the like.
 The \apipf{for_each_thread()} macro loops through all threads that exist,
 including all threads that \emph{would} exist if created.
 This macro is useful for handling per-thread variables as will be
-seen in Section~\ref{sec:toolsoftrade:Per-Thread Variables}.
+seen in \cref{sec:toolsoftrade:Per-Thread Variables}.
 
 \subsubsection{\tco{for_each_running_thread()}}
 
@@ -1339,14 +1338,14 @@ a run, so such synchronization is normally not needed.
 
 \subsubsection{Example Usage}
 
-Listing~\ref{lst:toolsoftrade:Example Child Thread} (\path{threadcreate.c})
+\Cref{lst:toolsoftrade:Example Child Thread} (\path{threadcreate.c})
 shows an example hello-world-like child thread.
 As noted earlier, each thread is allocated its own stack, so
 each thread has its own private \co{arg} argument and \co{myarg} variable.
 Each child simply prints its argument and its \apipf{smp_thread_id()}
 before exiting.
 Note that the \co{return} statement on
-line~\ref{ln:intro:threadcreate:thread_test:return} terminates the thread,
+\clnrefr{ln:intro:threadcreate:thread_test:return} terminates the thread,
 returning a \co{NULL} to whoever invokes \apipf{wait_thread()} on this
 thread.
 
@@ -1358,14 +1357,14 @@ thread.
 
 \begin{fcvref}[ln:intro:threadcreate:main]
 The parent program is shown in
-Listing~\ref{lst:toolsoftrade:Example Parent Thread}.
+\cref{lst:toolsoftrade:Example Parent Thread}.
 It invokes \co{smp_init()} to initialize the threading system on
-line~\lnref{smp_init},
+\clnref{smp_init},
 parses arguments on \clnrefrange{parse:b}{parse:e},
-and announces its presence on line~\lnref{announce}.
+and announces its presence on \clnref{announce}.
 It creates the specified number of child threads on
 \clnrefrange{create:b}{create:e},
-and waits for them to complete on line~\lnref{wait}.
+and waits for them to complete on \clnref{wait}.
 Note that \apipf{wait_all_threads()} discards the threads return values,
 as in this case they are all \co{NULL}, which is not very interesting.
 \end{fcvref}
@@ -1390,7 +1389,7 @@ as in this case they are all \co{NULL}, which is not very interesting.
 \label{sec:toolsoftrade:Locking}
 
 A good starting subset of the Linux kernel's locking API is shown in
-Listing~\ref{lst:toolsoftrade:Locking API},
+\cref{lst:toolsoftrade:Locking API},
 each API element being described in the following sections.
 This book's CodeSamples locking API closely follows that of the Linux kernel.
 
@@ -1469,7 +1468,7 @@ STORE r0,counter
 
 However, the \apik{spin_lock()} and \apik{spin_unlock()} primitives
 do have performance consequences, as will be seen in
-Chapter~\ref{chp:Data Structures}.
+\cref{chp:Data Structures}.
 
 \subsection{Accessing Shared Variables}
 \label{sec:toolsoftrade:Accessing Shared Variables}
@@ -1512,32 +1511,32 @@ In (say) the early 1990s, compilers did fewer optimizations, in part
 because there were fewer compiler writers and in part due to the
 relatively small memories of that era.
 Nevertheless, problems did arise, as shown in
-Listing~\ref{lst:toolsoftrade:Living Dangerously Early 1990s Style},
+\cref{lst:toolsoftrade:Living Dangerously Early 1990s Style},
 which the compiler is within its rights to transform into
-Listing~\ref{lst:toolsoftrade:C Compilers Can Invent Loads}.
+\cref{lst:toolsoftrade:C Compilers Can Invent Loads}.
 As you can see, the temporary on
-line~\ref{ln:toolsoftrade:Living Dangerously Early 1990s Style:temp} of
-Listing~\ref{lst:toolsoftrade:Living Dangerously Early 1990s Style}
+\clnrefr{ln:toolsoftrade:Living Dangerously Early 1990s Style:temp} of
+\cref{lst:toolsoftrade:Living Dangerously Early 1990s Style}
 has been optimized away, so that \co{global_ptr} will be loaded
 up to three times.
 
 \QuickQuiz{
 	What is wrong with loading
-	Listing~\ref{lst:toolsoftrade:Living Dangerously Early 1990s Style}'s
+	\cref{lst:toolsoftrade:Living Dangerously Early 1990s Style}'s
 	\co{global_ptr} up to three times?
 }\QuickQuizAnswer{
 	Suppose that \co{global_ptr} is initially non-\co{NULL},
 	but that some other thread sets \co{global_ptr} to \co{NULL}.
 	\begin{fcvref}[ln:toolsoftrade:C Compilers Can Invent Loads]
-	Suppose further that line~\lnref{if:a} of the transformed code
-	(Listing~\ref{lst:toolsoftrade:C Compilers Can Invent Loads})
+	Suppose further that \clnref{if:a} of the transformed code
+	(\cref{lst:toolsoftrade:C Compilers Can Invent Loads})
 	executes just before \co{global_ptr} is set to \co{NULL} and
-	line~\lnref{if:b} just after.
-	Then line~\lnref{if:a} will conclude that
+	\clnref{if:b} just after.
+	Then \clnref{if:a} will conclude that
         \co{global_ptr} is non-\co{NULL},
-	line~\lnref{if:b} will conclude that it is less than
+	\clnref{if:b} will conclude that it is less than
         \co{high_address},
-	so that line~\lnref{do_low} passes \co{do_low()} a \co{NULL} pointer,
+	so that \clnref{do_low} passes \co{do_low()} a \co{NULL} pointer,
 	which \co{do_low()} just might not be prepared to deal with.
 	\end{fcvref}
 
@@ -1549,15 +1548,15 @@ up to three times.
 	go away on its own.
 }\QuickQuizEnd
 
-Section~\ref{sec:toolsoftrade:Shared-Variable Shenanigans}
+\Cref{sec:toolsoftrade:Shared-Variable Shenanigans}
 describes additional problems caused by plain accesses,
-Sections~\ref{sec:toolsoftrade:A Volatile Solution}
-and~\ref{sec:toolsoftrade:Assembling the Rest of a Solution}
+\cref{sec:toolsoftrade:A Volatile Solution,%
+sec:toolsoftrade:Assembling the Rest of a Solution}
 describe some pre-C11 solutions.
 Of course, where practical, the primitives described in
-Section~\ref{sec:toolsoftrade:Atomic Operations (gcc Classic)}
+\cref{sec:toolsoftrade:Atomic Operations (gcc Classic)}
 or (especially)
-Section~\ref{sec:toolsoftrade:Atomic Operations (C11)}
+\cref{sec:toolsoftrade:Atomic Operations (C11)}
 should instead be used to avoid \IXpl{data race}, that is, to ensure
 that if there are multiple concurrent accesses to a given
 variable, all of those accesses are loads.
@@ -1584,8 +1583,8 @@ or shared-variable shenanigans, as described below.
 instructions for a single access.
 For example, the compiler could in theory compile the load from
 \co{global_ptr} (see
-line~\ref{ln:toolsoftrade:Living Dangerously Early 1990s Style:temp} of
-Listing~\ref{lst:toolsoftrade:Living Dangerously Early 1990s Style})
+\clnrefr{ln:toolsoftrade:Living Dangerously Early 1990s Style:temp} of
+\cref{lst:toolsoftrade:Living Dangerously Early 1990s Style})
 as a series of one-byte loads.
 If some other thread was concurrently setting \co{global_ptr} to
 \co{NULL}, the result might have all but one byte of the pointer
@@ -1679,14 +1678,14 @@ function named \co{do_something_quickly()} repeatedly until the
 variable \co{need_to_stop} was set, and that the compiler can see
 that \co{do_something_quickly()} does not store to \co{need_to_stop}.
 One (unsafe) way to code this is shown in
-Listing~\ref{lst:toolsoftrade:Inviting Load Fusing}.
+\cref{lst:toolsoftrade:Inviting Load Fusing}.
 The compiler might reasonably unroll this loop sixteen times in order
 to reduce the per-invocation of the backwards branch at the end of the
 loop.
 Worse yet, because the compiler knows that \co{do_something_quickly()}
 does not store to \co{need_to_stop}, the compiler could quite reasonably
 decide to check this variable only once, resulting in the code shown in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Loads}.
+\cref{lst:toolsoftrade:C Compilers Can Fuse Loads}.
 \begin{fcvref}[ln:toolsoftrade:C Compilers Can Fuse Loads]
 Once entered, the loop on
 \clnrefrange{loop:b}{loop:e} will never exit, regardless of how
@@ -1724,21 +1723,21 @@ void t1(void)
 \begin{fcvref}[ln:toolsoftrade:C Compilers Can Fuse Non-Adjacent Loads]
 The compiler can fuse loads across surprisingly large spans of code.
 For example, in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Non-Adjacent Loads},
+\cref{lst:toolsoftrade:C Compilers Can Fuse Non-Adjacent Loads},
 \co{t0()} and \co{t1()} run concurrently, and \co{do_something()} and
 \co{do_something_else()} are inline functions.
-Line~\lnref{gp} declares pointer \co{gp}, which C initializes to \co{NULL}
+\Clnref{gp} declares pointer \co{gp}, which C initializes to \co{NULL}
 by default.
-At some point, line~\lnref{wgp} of \co{t0()} stores a non-\co{NULL}
+At some point, \clnref{wgp} of \co{t0()} stores a non-\co{NULL}
 pointer to \co{gp}.
 Meanwhile, \co{t1()} loads from \co{gp} three times on
-lines~\lnref{p1}, \lnref{p2}, and~\lnref{p3}.
-Given that line~\lnref{if} finds that \co{gp} is non-\co{NULL}, one might
-hope that the dereference on line~\lnref{p3} would be guaranteed never
+\clnref{p1,p2,p3}.
+Given that \clnref{if} finds that \co{gp} is non-\co{NULL}, one might
+hope that the dereference on \clnref{p3} would be guaranteed never
 to fault.
 Unfortunately, the compiler is within its rights to fuse the read on
-lines~\lnref{p1} and~\lnref{p3}, which means that if line~\lnref{p1}
-loads \co{NULL} and line~\lnref{p2} loads \co{&myvar}, line~\lnref{p3}
+\clnref{p1,p3}, which means that if \clnref{p1}
+loads \co{NULL} and \clnref{p2} loads \co{&myvar}, \clnref{p3}
 could load \co{NULL}, resulting in a fault.\footnote{
 	\ppl{Will}{Deacon} reports that this happened in the Linux kernel.}
 Note that the intervening \apik{READ_ONCE()} does not prevent the other
@@ -1749,7 +1748,7 @@ from the same variable.
 \QuickQuiz{
 	Why does it matter whether \co{do_something()} and
 	\co{do_something_else()} in
-	Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Non-Adjacent Loads}
+	\cref{lst:toolsoftrade:C Compilers Can Fuse Non-Adjacent Loads}
 	are inline functions?
 }\QuickQuizAnswer{
 	\begin{fcvref}[ln:toolsoftrade:C Compilers Can Fuse Non-Adjacent Loads]
@@ -1758,7 +1757,7 @@ from the same variable.
 	compiled, the compiler would have to assume that either or both
 	of these two functions might change the value of \co{gp}.
 	This possibility would force the compiler to reload \co{gp}
-	on line~\lnref{p3}, thus avoiding the \co{NULL}-pointer dereference.
+	on \clnref{p3}, thus avoiding the \co{NULL}-pointer dereference.
 	\end{fcvref}
 }\QuickQuizEnd
 
@@ -1798,25 +1797,25 @@ void work_until_shut_down(void)
 \end{listing}
 
 However, there are exceptions, for example as shown in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Stores}.
+\cref{lst:toolsoftrade:C Compilers Can Fuse Stores}.
 \begin{fcvref}[ln:toolsoftrade:C Compilers Can Fuse Stores]
 The function \co{shut_it_down()} stores to the shared
-variable \co{status} on lines~\lnref{store:a} and~\lnref{store:b},
+variable \co{status} on \clnref{store:a,store:b},
 and so assuming that neither
 \co{start_shutdown()} nor \co{finish_shutdown()} access \co{status},
 the compiler could reasonably remove the store to \co{status} on
-line~\lnref{store:a}.
+\clnref{store:a}.
 Unfortunately, this would mean that \co{work_until_shut_down()} would
 never exit its loop spanning
-lines~\lnref{until:loop:b} and~\lnref{until:loop:e}, and thus would never set
+\clnref{until:loop:b,until:loop:e}, and thus would never set
 \co{other_task_ready}, which would in turn mean that \co{shut_it_down()}
 would never exit its loop spanning
-lines~\lnref{loop:b} and~\lnref{loop:e}, even if
+\clnref{loop:b,loop:e}, even if
 the compiler chooses not to fuse the successive loads from
-\co{other_task_ready} on line~\lnref{loop:b}.
+\co{other_task_ready} on \clnref{loop:b}.
 
 And there are more problems with the code in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Stores},
+\cref{lst:toolsoftrade:C Compilers Can Fuse Stores},
 including code reordering.
 
 {\bf Code reordering} is a common compilation technique used to
@@ -1824,17 +1823,17 @@ combine common subexpressions, reduce register pressure, and
 improve utilization of the many functional units available on
 modern superscalar microprocessors.
 It is also another reason why the code in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Stores}
+\cref{lst:toolsoftrade:C Compilers Can Fuse Stores}
 is buggy.
 For example, suppose that the \co{do_more_work()} function on
-line~\lnref{until:loop:e}
+\clnref{until:loop:e}
 does not access \co{other_task_ready}.
 Then the compiler would be within its rights to move the assignment
 to \co{other_task_ready} on
-line~\lnref{other:store} to precede line~\lnref{until:loop:b}, which might
+\clnref{other:store} to precede \clnref{until:loop:b}, which might
 be a great disappointment for anyone hoping that the last call to
-\co{do_more_work()} on line~\lnref{until:loop:e} happens before the call to
-\co{finish_shutdown()} on line~\lnref{finish}.
+\co{do_more_work()} on \clnref{until:loop:e} happens before the call to
+\co{finish_shutdown()} on \clnref{finish}.
 \end{fcvref}
 
 It might seem futile to prevent the compiler from changing the order of
@@ -1853,8 +1852,8 @@ independent of the ordering provided by the underlying hardware.\footnote{
 	of \apik{READ_ONCE()} and \apik{WRITE_ONCE()}.}
 
 {\bf Invented loads} were illustrated by the code in
-Listings~\ref{lst:toolsoftrade:Living Dangerously Early 1990s Style}
-and~\ref{lst:toolsoftrade:C Compilers Can Invent Loads},
+\cref{lst:toolsoftrade:Living Dangerously Early 1990s Style,%
+lst:toolsoftrade:C Compilers Can Invent Loads},
 in which the compiler optimized away a temporary variable,
 thus loading from a shared variable more often than intended.
 
@@ -1868,9 +1867,9 @@ both performance and scalability.
 \begin{fcvref}[ln:toolsoftrade:C Compilers Can Fuse Stores]
 {\bf Invented stores} can occur in a number of situations.
 For example, a compiler emitting code for \co{work_until_shut_down()} in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Stores}
+\cref{lst:toolsoftrade:C Compilers Can Fuse Stores}
 might notice that \co{other_task_ready} is not accessed by
-\co{do_more_work()}, and stored to on line~\lnref{other:store}.
+\co{do_more_work()}, and stored to on \clnref{other:store}.
 If \co{do_more_work()} was a complex inline function, it might be
 necessary to do a register spill, in which case one attractive
 place to use for temporary storage is \co{other_task_ready}.
@@ -1878,7 +1877,7 @@ After all, there are no accesses to it, so what is the harm?
 
 Of course, a non-zero store to this variable at just the wrong time
 would result in the \co{while} loop on
-line~\lnref{loop:b} terminating
+\clnref{loop:b} terminating
 prematurely, again allowing \co{finish_shutdown()} to run
 concurrently with \co{do_more_work()}.
 Given that the entire point of this \co{while} appears to be to
@@ -1916,19 +1915,19 @@ Using a stored-to variable as a temporary might seem outlandish,
 but it is permitted by the standard.
 Nevertheless, readers might be justified in wanting a less
 outlandish example, which is provided by
-Listings~\ref{lst:toolsoftrade:Inviting an Invented Store}
-and~\ref{lst:toolsoftrade:Compiler Invents an Invited Store}.
+\cref{lst:toolsoftrade:Inviting an Invented Store,%
+lst:toolsoftrade:Compiler Invents an Invited Store}.
 
 A compiler emitting code for
-Listing~\ref{lst:toolsoftrade:Inviting an Invented Store}
+\cref{lst:toolsoftrade:Inviting an Invented Store}
 might know that the value of \co{a} is initially zero,
 which might be a strong temptation to optimize away one branch
 by transforming this code to that in
-Listing~\ref{lst:toolsoftrade:Compiler Invents an Invited Store}.
+\cref{lst:toolsoftrade:Compiler Invents an Invited Store}.
 \begin{fcvref}[ln:toolsoftrade:Compiler Invents an Invited Store]
-Here, line~\lnref{store:uncond} unconditionally stores \co{1} to \co{a}, then
+Here, \clnref{store:uncond} unconditionally stores \co{1} to \co{a}, then
 resets the value back to zero on
-line~\lnref{store:cond} if \co{condition} was not set.
+\clnref{store:cond} if \co{condition} was not set.
 This transforms the if-then-else into an if-then, saving one branch.
 \end{fcvref}
 
@@ -1984,18 +1983,18 @@ p = NULL;\lnlbl[null]
 that a plain store might not actually change the value in memory.
 \begin{fcvref}[ln:toolsoftrade:Inviting a Store-to-Load Conversion]
 For example, consider
-Listing~\ref{lst:toolsoftrade:Inviting a Store-to-Load Conversion}.
-Line~\lnref{load:p} fetches \co{p}, but the \qco{if} statement on
-line~\lnref{if} also tells the compiler that the developer thinks that
+\cref{lst:toolsoftrade:Inviting a Store-to-Load Conversion}.
+\Clnref{load:p} fetches \co{p}, but the \qco{if} statement on
+\clnref{if} also tells the compiler that the developer thinks that
 \co{p} is usually zero.\footnote{
 	The \apik{unlikely()} function provides this hint to the compiler,
 	and different compilers provide different ways of implementing
 	\co{unlikely()}.}
-The \apik{barrier()} statment on line~\lnref{barrier} forces the compiler
+The \apik{barrier()} statment on \clnref{barrier} forces the compiler
 to forget the value of \co{p}, but one could imagine a compiler
 choosing to remember the hint---or getting an additional hint via
 feedback-directed optimization.
-Doing so would cause the compiler to realize that line~\lnref{null}
+Doing so would cause the compiler to realize that \clnref{null}
 is often an expensive no-op.
 \end{fcvref}
 
@@ -2016,8 +2015,8 @@ if (p != NULL)\lnlbl[if1]
 
 \begin{fcvref}[ln:toolsoftrade:Compiler Converts a Store to a Load]
 Such a compiler might therefore guard the store of \co{NULL}
-with a check, as shown on lines~\lnref{if1} and~\lnref{null} of
-Listing~\ref{lst:toolsoftrade:Compiler Converts a Store to a Load}.
+with a check, as shown on \clnref{if1,null} of
+\cref{lst:toolsoftrade:Compiler Converts a Store to a Load}.
 Although this transformation is often desirable, it could be problematic
 if the actual store was required for ordering.
 For example, a write memory barrier (Linux kernel \apik{smp_wmb()}) would
@@ -2042,8 +2041,8 @@ might thus eliminate a variable that the external code relies upon.
 Reliable concurrent code clearly needs a way to cause the compiler to
 preserve the number, order, and type of important accesses to shared
 memory, a topic taken up by
-Sections~\ref{sec:toolsoftrade:A Volatile Solution}
-and~\ref{sec:toolsoftrade:Assembling the Rest of a Solution},
+\cref{sec:toolsoftrade:A Volatile Solution,%
+sec:toolsoftrade:Assembling the Rest of a Solution},
 which are up next.
 
 \subsubsection{A Volatile Solution}
@@ -2114,7 +2113,7 @@ constraints~\cite{PaulEMcKenney2016P0124R6-LKMM}:
 	Concurrent code relies on this constraint in order to achieve
 	the desired ordering properties from combinations of volatile
 	accesses and other means discussed in
-	Section~\ref{sec:toolsoftrade:Assembling the Rest of a Solution}.
+	\cref{sec:toolsoftrade:Assembling the Rest of a Solution}.
 \end{enumerate}
 
 Concurrent code also relies on the first two constraints to avoid
@@ -2139,11 +2138,11 @@ if (ptr != NULL && ptr < high_address)
 \end{listing}
 
 Using \apik{READ_ONCE()} on
-line~\ref{ln:toolsoftrade:Living Dangerously Early 1990s Style:temp} of
-Listing~\ref{lst:toolsoftrade:Living Dangerously Early 1990s Style}
+\clnrefr{ln:toolsoftrade:Living Dangerously Early 1990s Style:temp} of
+\cref{lst:toolsoftrade:Living Dangerously Early 1990s Style}
 avoids invented loads,
 resulting in the code shown in
-Listing~\ref{lst:toolsoftrade:Avoiding Danger; 2018 Style}.
+\cref{lst:toolsoftrade:Avoiding Danger; 2018 Style}.
 
 \begin{listing}
 \begin{fcvlabel}[ln:toolsoftrade:Preventing Load Fusing]
@@ -2157,9 +2156,9 @@ while (!READ_ONCE(need_to_stop))
 \end{listing}
 
 As shown in
-Listing~\ref{lst:toolsoftrade:Preventing Load Fusing},
+\cref{lst:toolsoftrade:Preventing Load Fusing},
 \apik{READ_ONCE()} can also prevent the loop unrolling in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Loads}.
+\cref{lst:toolsoftrade:C Compilers Can Fuse Loads}.
 
 \begin{listing}
 \begin{fcvlabel}[ln:toolsoftrade:Preventing Store Fusing and Invented Stores]
@@ -2189,12 +2188,12 @@ void work_until_shut_down(void)
 
 \apik{READ_ONCE()} and \apik{WRITE_ONCE()} can also be used to prevent the
 store fusing and invented stores that were shown in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Stores},
+\cref{lst:toolsoftrade:C Compilers Can Fuse Stores},
 with the result shown in
-Listing~\ref{lst:toolsoftrade:Preventing Store Fusing and Invented Stores}.
+\cref{lst:toolsoftrade:Preventing Store Fusing and Invented Stores}.
 However, this does nothing to prevent code reordering, which requires
 some additional tricks taught in
-Section~\ref{sec:toolsoftrade:Assembling the Rest of a Solution}.
+\cref{sec:toolsoftrade:Assembling the Rest of a Solution}.
 
 \begin{listing}
 \begin{fcvlabel}[ln:toolsoftrade:Disinviting an Invented Store]
@@ -2211,9 +2210,9 @@ else
 
 Finally, \apik{WRITE_ONCE()} can be used to prevent the store invention
 shown in
-Listing~\ref{lst:toolsoftrade:Inviting an Invented Store},
+\cref{lst:toolsoftrade:Inviting an Invented Store},
 with the resulting code shown in
-Listing~\ref{lst:toolsoftrade:Disinviting an Invented Store}.
+\cref{lst:toolsoftrade:Disinviting an Invented Store}.
 
 To summarize, the \apic{volatile} keyword can prevent load
 tearing and store tearing in cases where the loads and stores are
@@ -2236,7 +2235,7 @@ Additional ordering has traditionally been provided by recourse to
 assembly language, for example, \GCC\ asm directives.
 Oddly enough, these directives need not actually contain assembly language,
 as exemplified by the \apik{barrier()} macro shown in
-Listing~\ref{lst:toolsoftrade:Compiler Barrier Primitive (for GCC)}.
+\cref{lst:toolsoftrade:Compiler Barrier Primitive (for GCC)}.
 
 \begin{listing}
 \begin{fcvlabel}[ln:toolsoftrade:Preventing C Compilers From Fusing Loads]
@@ -2260,12 +2259,12 @@ this do-nothing asm can arbitrarily change memory.
 In response, the compiler will avoid moving any memory references across
 the \apik{barrier()} macro.
 This means that the real-time-destroying loop unrolling shown in
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Loads}
+\cref{lst:toolsoftrade:C Compilers Can Fuse Loads}
 can be prevented by adding \apik{barrier()} calls as shown on
-lines~\ref{ln:toolsoftrade:Preventing C Compilers From Fusing Loads:b1}
-and~\ref{ln:toolsoftrade:Preventing C Compilers From Fusing Loads:b2}
+\clnrefr{ln:toolsoftrade:Preventing C Compilers From Fusing Loads:b1,%
+ln:toolsoftrade:Preventing C Compilers From Fusing Loads:b2}
 of
-Listing~\ref{lst:toolsoftrade:Preventing C Compilers From Fusing Loads}.
+\cref{lst:toolsoftrade:Preventing C Compilers From Fusing Loads}.
 These two lines of code prevent the compiler from pushing the load from
 \co{need_to_stop} into or past \co{do_something_quickly()} from either
 direction.
@@ -2307,19 +2306,18 @@ references.
 In many cases, this is not a problem because the hardware can only do
 a certain amount of reordering.
 However, there are cases such as
-Listing~\ref{lst:toolsoftrade:C Compilers Can Fuse Stores} where the
+\cref{lst:toolsoftrade:C Compilers Can Fuse Stores} where the
 hardware must be constrained.
-Listing~\ref{lst:toolsoftrade:Preventing Store Fusing and Invented Stores}
+\Cref{lst:toolsoftrade:Preventing Store Fusing and Invented Stores}
 prevented store fusing and invention, and
-Listing~\ref{lst:toolsoftrade:Preventing Reordering}
+\cref{lst:toolsoftrade:Preventing Reordering}
 further prevents the remaining reordering by addition of
 \apik{smp_mb()} on
 \begin{fcvref}[ln:toolsoftrade:Preventing Reordering]
-lines~\lnref{mb1}, \lnref{mb2}, \lnref{mb3}, \lnref{mb4},
-and~\lnref{mb5}.
+\clnref{mb1,mb2,mb3,mb4,mb5}.
 \end{fcvref}
 The \apik{smp_mb()} macro is similar to \apik{barrier()} shown in
-Listing~\ref{lst:toolsoftrade:Compiler Barrier Primitive (for GCC)},
+\cref{lst:toolsoftrade:Compiler Barrier Primitive (for GCC)},
 but with the empty string replaced by a string containing the
 instruction for a full memory barrier, for example, \co{"mfence"}
 on x86 or \co{"sync"} on PowerPC.
@@ -2327,23 +2325,23 @@ on x86 or \co{"sync"} on PowerPC.
 \QuickQuiz{
 	But aren't full memory barriers very heavyweight?
 	Isn't there a cheaper way to enforce the ordering needed in
-	Listing~\ref{lst:toolsoftrade:Preventing Reordering}?
+	\cref{lst:toolsoftrade:Preventing Reordering}?
 }\QuickQuizAnswer{
 	As is often the case, the answer is ``it depends''.
 	However, if only two threads are accessing the \co{status}
 	and \co{other_task_ready} variables, then the
 	\apik{smp_store_release()} and \apik{smp_load_acquire()}
 	functions discussed in
-	Section~\ref{sec:toolsoftrade:Atomic Operations}
+	\cref{sec:toolsoftrade:Atomic Operations}
 	will suffice.
 }\QuickQuizEnd
 
 Ordering is also provided by some read-modify-write atomic
 operations, some of which are presented in
-Section~\ref{sec:toolsoftrade:Atomic Operations}.
+\cref{sec:toolsoftrade:Atomic Operations}.
 In the general case, memory ordering can be quite subtle, as
 discussed in
-Chapter~\ref{chp:Advanced Synchronization: Memory Ordering}.
+\cref{chp:Advanced Synchronization: Memory Ordering}.
 The next section covers an alternative to memory ordering, namely
 limiting or even entirely avoiding data races.
 
@@ -2360,10 +2358,10 @@ shared variables!''
 The doctor's advice might seem unhelpful, but
 one time-tested way to avoid concurrently accessing shared variables
 is access those variables only when holding a particular lock, as will
-be discussed in Chapter~\ref{chp:Locking}.
+be discussed in \cref{chp:Locking}.
 Another way is to access a given ``shared'' variable only from a given
 CPU or thread, as will be discussed in
-Chapter~\ref{chp:Data Ownership}.
+\cref{chp:Data Ownership}.
 It is possible to combine these two approaches, for example, a given
 variable might be modified only by a given CPU or thread while holding a
 particular lock, and might be read either from that same CPU or thread
@@ -2431,12 +2429,12 @@ use \apik{READ_ONCE()} and \apik{WRITE_ONCE()} or stronger, respectively.
 But it bears repeating that neither \apik{READ_ONCE()} nor \apik{WRITE_ONCE()}
 provide any ordering guarantees other than within the compiler.
 See the above
-Section~\ref{sec:toolsoftrade:Assembling the Rest of a Solution} or
-Chapter~\ref{chp:Advanced Synchronization: Memory Ordering}
+\cref{sec:toolsoftrade:Assembling the Rest of a Solution} or
+\cref{chp:Advanced Synchronization: Memory Ordering}
 for information on such guarantees.
 
 Examples of many of these data-race-avoidance patterns are presented in
-Chapter~\ref{chp:Counting}.
+\cref{chp:Counting}.
 
 \subsection{Atomic Operations}
 \label{sec:toolsoftrade:Atomic Operations}
@@ -2482,7 +2480,7 @@ given per-CPU variable, \apik{per_cpu()} to access a specified CPU's
 instance of a given per-CPU variable, along with many other special-purpose
 per-CPU operations.
 
-Listing~\ref{lst:toolsoftrade:Per-Thread-Variable API}
+\Cref{lst:toolsoftrade:Per-Thread-Variable API}
 shows this book's per-thread-variable API, which is patterned
 after the Linux kernel's per-CPU-variable API\@.
 This API provides the per-thread equivalent of global variables.
@@ -2558,7 +2556,7 @@ the CPU-online process.
 Suppose that we have a counter that is incremented very frequently
 but read out quite rarely.
 As will become clear in
-Section~\ref{sec:count:Statistical Counters},
+\cref{sec:count:Statistical Counters},
 it is helpful to implement such a counter using a per-thread variable.
 Such a variable can be defined as follows:
 
@@ -2591,7 +2589,7 @@ for_each_thread(t)
 Again, it is possible to gain a similar effect using other mechanisms,
 but per-thread variables combine convenience and high performance,
 as will be shown in more detail in
-Section~\ref{sec:count:Statistical Counters}.
+\cref{sec:count:Statistical Counters}.
 
 \section{The Right Tool for the Job: How to Choose?}
 \label{sec:toolsoftrade:The Right Tool for the Job: How to Choose?}
@@ -2612,14 +2610,14 @@ might need to use the POSIX threading primitives, choosing the appropriate
 locking and/or atomic-operation primitives.
 If the overhead of the POSIX threading primitives (typically sub-microsecond)
 is too great, then the primitives introduced in
-Chapter~\ref{chp:Deferred Processing} may be required.
+\cref{chp:Deferred Processing} may be required.
 Of course, the actual overheads will depend not only on your hardware,
 but most critically on the manner in which you use the primitives.
 Furthermore, always remember that inter-process communication and
 message-passing can be good alternatives to shared-memory multithreaded
 execution, especially when your code makes good use of the design
 principles called out in
-Chapter~\ref{cha:Partitioning and Synchronization Design}.
+\cref{cha:Partitioning and Synchronization Design}.
 
 \QuickQuiz{
 	Wouldn't the shell normally use \apipx{vfork()} rather than
@@ -2636,22 +2634,22 @@ Because concurrency was added to the C standard several decades after
 the C language was first used to build concurrent systems, there are
 a number of ways of concurrently accessing shared variables.
 All else being equal, the C11 standard operations described in
-Section~\ref{sec:toolsoftrade:Atomic Operations (C11)}
+\cref{sec:toolsoftrade:Atomic Operations (C11)}
 should be your first stop.
 If you need to access a given shared variable both with \IXplx{plain access}{es}
 and atomically, then the modern \GCC\ atomics described in
-Section~\ref{sec:toolsoftrade:Atomic Operations (Modern gcc)}
+\cref{sec:toolsoftrade:Atomic Operations (Modern gcc)}
 might work well for you.
 If you are working on an old codebase that uses the classic \GCC\ \co{__sync}
 API, then you should review
-Section~\ref{sec:toolsoftrade:Atomic Operations (gcc Classic)}
+\cref{sec:toolsoftrade:Atomic Operations (gcc Classic)}
 as well as the relevant \GCC\ documentation.
 If you are working on the Linux kernel or similar codebase that
 combines use of the \apic{volatile} keyword with inline assembly,
 or if you need dependencies to provide ordering, look at the material
-presented in Section~\ref{sec:toolsoftrade:Accessing Shared Variables}
+presented in \cref{sec:toolsoftrade:Accessing Shared Variables}
 as well as that in
-Chapter~\ref{chp:Advanced Synchronization: Memory Ordering}.
+\cref{chp:Advanced Synchronization: Memory Ordering}.
 
 Whatever approach you take, please keep in mind that randomly hacking
 multi-threaded code is a spectacularly bad idea, especially given that
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH -perfbook 0/4] Add script to check cleveref macro uses
  2021-05-06 14:21 [PATCH -perfbook 0/4] Add script to check cleveref macro uses Akira Yokosawa
                   ` (3 preceding siblings ...)
  2021-05-06 14:27 ` [PATCH -perfbook 4/4] toolsoftrade: " Akira Yokosawa
@ 2021-05-06 17:52 ` Paul E. McKenney
  4 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2021-05-06 17:52 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Thu, May 06, 2021 at 11:21:36PM +0900, Akira Yokosawa wrote:
> Hi Paul,
> 
> Now that end-of-sentence periods are at the end of input lines,
> wrong uses of \Cref{}, \cref{}, and their variants can be detected
> by a not-that-intelligent script.
> 
> Patch 1/4 adds the set of scripts.
> They also detect raw \ref{} and \lnref{} commands.
> 
> Patches 2/4--4/4 are updates of intro, cpu, and toolsoftrade
> to satisfy the scripts.
> 
> Remaining updates of the other chapters will follow when they
> are ready.
> 
> My plan is to add the check as a make target once the other updates
> are finished.
> 
> In the meantime, you can run the check of each file by:
> 
>     ./utilities/cleverefcheck.pl count/count.tex
> 
> , or tree-wide:
> 
>     ./utilities/cleverefcheck.sh

Nice!!!  And sounds like a good plan.

Queued and pushed, thank you!

							Thanx, Paul

>         Thanks, Akira
> 
> --
> Akira Yokosawa (4):
>   cleverefcheck: Add check script of cleveref macro usage
>   intro: Employ \cref{} and its variants
>   cpu: Employ \cref{} and its variants
>   toolsoftrade: Employ \cref{} and its variants
> 
>  cpu/cpu.tex                   |   6 +-
>  cpu/hwfreelunch.tex           |  12 +-
>  cpu/overheads.tex             |  26 +--
>  cpu/overview.tex              |  32 +--
>  cpu/swdesign.tex              |  18 +-
>  intro/intro.tex               |  24 +-
>  perfbook-lt.tex               |  21 ++
>  toolsoftrade/toolsoftrade.tex | 412 +++++++++++++++++-----------------
>  utilities/cleverefcheck.pl    |  96 ++++++++
>  utilities/cleverefcheck.sh    |  23 ++
>  10 files changed, 402 insertions(+), 268 deletions(-)
>  create mode 100755 utilities/cleverefcheck.pl
>  create mode 100755 utilities/cleverefcheck.sh
> 
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-06 17:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 14:21 [PATCH -perfbook 0/4] Add script to check cleveref macro uses Akira Yokosawa
2021-05-06 14:23 ` [PATCH -perfbook 1/4] cleverefcheck: Add check script of cleveref macro usage Akira Yokosawa
2021-05-06 14:24 ` [PATCH -perfbook 2/4] intro: Employ \cref{} and its variants Akira Yokosawa
2021-05-06 14:26 ` [PATCH -perfbook 3/4] cpu: " Akira Yokosawa
2021-05-06 14:27 ` [PATCH -perfbook 4/4] toolsoftrade: " Akira Yokosawa
2021-05-06 17:52 ` [PATCH -perfbook 0/4] Add script to check cleveref macro uses Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.