* [PATCH -perfbook 0/8] Employ cleveref macros, take four
@ 2021-05-18 12:13 Akira Yokosawa
  2021-05-18 12:15 ` [PATCH -perfbook 1/8] fixsvgfonts: Add pattern for 'sans-serif' Akira Yokosawa
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Hi Paul,

This is (hopefully) the final round of \cref{}-related updates.

Patches 1/8 and 2/8 are unrelated updates to the build setup.
Patch 1/8 adds patterns to catch the font specifiers in recently
added/updated .svg figures.
Patch 2/8 gets rid of noindentafter.sty in our repository, as
recent distros provide this package on their own.

Patches 3/8--8/8 are the \cref{}-related updates.
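For reference, the typical conversion looks like this (illustrative
excerpt; the label name is one of those appearing in the hunks below):

```latex
% Before: hard-coded "Section" and "page" words with plain \ref/\pageref
Section~\ref{sec:defer:RCU Usage}
on page~\pageref{sec:defer:RCU Usage}.

% After: cleveref derives the cross-reference type from the label
\cref{sec:defer:RCU Usage}
on \cpageref{sec:defer:RCU Usage}.
```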

I'm thinking of adding a few patterns to cleverefcheck.pl
to catch indentation by white space.
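Such a check could be sketched as follows (a hypothetical shell
equivalent; the real checks live in cleverefcheck.pl, and the pattern
shown here is an assumption, not the script's actual code):

```shell
# Flag LaTeX source lines indented with leading spaces
# (perfbook sources indent with tabs); sample file is made up.
cat > /tmp/indent-sample.tex <<'EOF'
\begin{fcvref}[ln:defer:example]
    \Clnref{kmalloc} allocates a replacement element.
\end{fcvref}
EOF
grep -n '^ ' /tmp/indent-sample.tex
# → 2:    \Clnref{kmalloc} allocates a replacement element.
```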

At least, "sh ./utilities/cleverefcheck.sh" should pass
after this patch set is applied.

Note:

    styleguide.tex is not checked by the shell script.
    I'll update it as far as I can, but it won't be
    warning-free due to its nature.

        Thanks, Akira

--
Akira Yokosawa (8):
  fixsvgfonts: Add pattern for 'sans-serif'
  Omit noindentafter.sty
  defer: Employ \cref{} and its variants, take three
  datastruct: Employ \cref{} and its variants
  debugging: Employ \cref{} and its variants
  formal: Employ \cref{} and its variants
  together, advsync, memorder: Employ \cref{} and its variants
  easy, future, appendix: Employ \cref{} and its variants

 Makefile                             |   1 -
 ack.tex                              |  40 +--
 advsync/advsync.tex                  |   2 +-
 advsync/rt.tex                       |   4 +-
 appendix/questions/after.tex         |   8 +-
 appendix/questions/removelocking.tex |   2 +-
 appendix/toyrcu/toyrcu.tex           |  10 +-
 appendix/whymb/whymemorybarriers.tex |   2 +-
 datastruct/datastruct.tex            | 348 +++++++++++++--------------
 debugging/debugging.tex              | 140 +++++------
 defer/rcuapi.tex                     |  90 +++----
 defer/rcuexercises.tex               |  12 +-
 defer/rcuintro.tex                   |   2 +-
 defer/rcurelated.tex                 |  10 +-
 defer/rcuusage.tex                   | 181 +++++++-------
 defer/updates.tex                    |   4 +-
 defer/whichtochoose.tex              |  18 +-
 easy/easy.tex                        |   6 +-
 formal/axiomatic.tex                 |  90 +++----
 formal/dyntickrcu.tex                |   2 +-
 formal/formal.tex                    |   6 +-
 formal/sat.tex                       |   2 +-
 formal/spinhint.tex                  | 122 +++++-----
 formal/stateless.tex                 |   2 +-
 future/cpu.tex                       |  10 +-
 future/formalregress.tex             |   2 +-
 future/htm.tex                       |   2 +-
 future/tm.tex                        |   2 +-
 glossary.tex                         |   2 +-
 memorder/memorder.tex                |  24 +-
 noindentafter.sty                    | 194 ---------------
 summary.tex                          |   2 +-
 together/applyrcu.tex                |   4 +-
 together/refcnt.tex                  |   4 +-
 utilities/fixsvgfonts-urwps.sh       |   2 +
 utilities/fixsvgfonts.sh             |   2 +
 36 files changed, 581 insertions(+), 773 deletions(-)
 delete mode 100644 noindentafter.sty

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH -perfbook 1/8] fixsvgfonts: Add pattern for 'sans-serif'
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
@ 2021-05-18 12:15 ` Akira Yokosawa
  2021-05-18 12:19 ` [PATCH -perfbook 2/8] Omit noindentafter.sty Akira Yokosawa
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:15 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Recently updated .svg figures contain new font-specification patterns
due to a change in inkscape's behavior.
Add patterns to the fixsvgfonts scripts to catch them and ensure
that "Nimbus Sans" fonts are embedded in the .pdf.
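For illustration, the effect of one of the new rules can be seen with a
one-liner (the style fragment is a made-up example of the kind of
attribute newer inkscape emits):

```shell
# A 'sans-serif' font specifier of the kind newer inkscape writes out;
# the added sed rule rewrites it to the font actually embedded in the PDF.
echo 'style="font-family:sans-serif;font-size:10px"' |
  sed -e 's+family:sans-serif+family:Nimbus Sans+g'
# → style="font-family:Nimbus Sans;font-size:10px"
```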

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 utilities/fixsvgfonts-urwps.sh | 2 ++
 utilities/fixsvgfonts.sh       | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/utilities/fixsvgfonts-urwps.sh b/utilities/fixsvgfonts-urwps.sh
index 55a5a3f2..17c30e81 100644
--- a/utilities/fixsvgfonts-urwps.sh
+++ b/utilities/fixsvgfonts-urwps.sh
@@ -9,6 +9,8 @@ sed	-e 's+family:Helvetica+family:Nimbus Sans+g' \
 	-e 's+family="Helvetica+family="Nimbus Sans+g' \
 	-e 's+family:Sans+family:Nimbus Sans+g' \
 	-e 's+cation:Sans+cation:Nimbus Sans+g' \
+	-e 's+family:sans-serif+family:Nimbus Sans+g' \
+	-e 's+cation:sans-serif+cation:Nimbus Sans+g' \
 	-e 's+family:Courier+family:Nimbus Mono PS+g' \
 	-e 's+family="Courier+family="Nimbus Mono PS+g' \
 	-e 's+family:Symbol+family:MdSymbol+g' \
diff --git a/utilities/fixsvgfonts.sh b/utilities/fixsvgfonts.sh
index 657c3001..47dbaea9 100644
--- a/utilities/fixsvgfonts.sh
+++ b/utilities/fixsvgfonts.sh
@@ -9,5 +9,7 @@ sed	-e 's+family:Helvetica+family:Nimbus Sans L+g' \
 	-e 's+family="Helvetica+family="Nimbus Sans L+g' \
 	-e 's+family:Sans+family:Nimbus Sans L+g' \
 	-e 's+cation:Sans+cation:Nimbus Sans L+g' \
+	-e 's+family:sans-serif+family:Nimbus Sans L+g' \
+	-e 's+cation:sans-serif+cation:Nimbus Sans L+g' \
 	-e 's+family:Courier+family:Nimbus Mono L+g' \
 	-e 's+family="Courier+family="Nimbus Mono L+g'
-- 
2.17.1




* [PATCH -perfbook 2/8] Omit noindentafter.sty
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
  2021-05-18 12:15 ` [PATCH -perfbook 1/8] fixsvgfonts: Add pattern for 'sans-serif' Akira Yokosawa
@ 2021-05-18 12:19 ` Akira Yokosawa
  2021-05-18 12:20 ` [PATCH -perfbook 3/8] defer: Employ \cref{} and its variants, take three Akira Yokosawa
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:19 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

noindentafter.sty was added to perfbook's repository for those
with old TeX Live packages such as the one on Ubuntu Trusty.

Now that we assume Ubuntu Bionic or later as the build environment,
we can expect it to be available in any properly installed
TeX Live.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 Makefile          |   1 -
 noindentafter.sty | 194 ----------------------------------------------
 2 files changed, 195 deletions(-)
 delete mode 100644 noindentafter.sty

diff --git a/Makefile b/Makefile
index b23a7096..3dc7e12c 100644
--- a/Makefile
+++ b/Makefile
@@ -9,7 +9,6 @@ LATEXSOURCES = \
 	glossary.tex \
 	qqz.sty origpub.sty \
 	glsdict.tex indexsee.tex \
-	noindentafter.sty \
 	pfbook.cls \
 	ushyphex.tex pfhyphex.tex \
 	ack.tex \
diff --git a/noindentafter.sty b/noindentafter.sty
deleted file mode 100644
index 7a362248..00000000
--- a/noindentafter.sty
+++ /dev/null
@@ -1,194 +0,0 @@
-%% Notice to comply with Clause 6 of LPPL v 1.3c
-%% This file is copied from TeX Live 2015/Debian.
-%% Whole package including documentation can be obtained from:
-%%     https://ctan.org/pkg/noindentafter
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \iffalse %%%%
-%                                                                              %
-%  Copyright (c) 2014 - Michiel Helvensteijn - www.mhelvens.net                %
-%                                                                              %
-%  http://latex-noindentafter.googlecode.com                                   %
-%                                                                              %
-%  This work may be distributed and/or modified under the conditions           %
-%  of the LaTeX Project Public License, either version 1.3 of this             %
-%  license or (at your option) any later version. The latest version           %
-%  of this license is in     http://www.latex-project.org/lppl.txt             %
-%  and version 1.3 or later is part of all distributions of LaTeX              %
-%  version 2005/12/01 or later.                                                %
-%                                                                              %
-%  This work has the LPPL maintenance status `maintained'.                     %
-%                                                                              %
-%  The Current Maintainer of this work is Michiel Helvensteijn.                %
-%                                                                              %
-%  This work consists of the files noindentafter.tex and noindentafter.sty.    %
-%                                                                              %
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \fi %%%%
-
-% \CheckSum{50}
-%
-% \CharacterTable
-%  {Upper-case    \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z
-%   Lower-case    \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z
-%   Digits        \0\1\2\3\4\5\6\7\8\9
-%   Exclamation   \!     Double quote  \"     Hash (number) \#
-%   Dollar        \$     Percent       \%     Ampersand     \&
-%   Acute accent  \'     Left paren    \(     Right paren   \)
-%   Asterisk      \*     Plus          \+     Comma         \,
-%   Minus         \-     Point         \.     Solidus       \/
-%   Colon         \:     Semicolon     \;     Less than     \<
-%   Equals        \=     Greater than  \>     Question mark \?
-%   Commercial at \@     Left bracket  \[     Backslash     \\
-%   Right bracket \]     Circumflex    \^     Underscore    \_
-%   Grave accent  \`     Left brace    \{     Vertical bar  \|
-%   Right brace   \}     Tilde         \~}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% \subsection{Package Info}                                                    %
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%
-%    \begin{macrocode}
-\NeedsTeXFormat{LaTeX2e}
-\ProvidesPackage{noindentafter}[2014/11/30 0.2.2
-  prevent paragraph indentation after specific environments or macros]
-%    \end{macrocode}
-%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% \subsection{Packages}                                                        %
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%
-%    \begin{macrocode}
-\RequirePackage{etoolbox}
-%    \end{macrocode}
-%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% \subsection{Patches}                                                         %
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%% \needspace{5\baselineskip}\begin{macro}{\end}
-%
-%  The package |etoolbox| provides the command
-%  |\AfterEndEnvironment| which creates a hook executed at a
-%  very late point inside the |\end| command. However, this
-%  hook is still located before |\ignorespaces|, which is
-%  too early to properly suppress the indention after an
-%  environment. Therefore another hook is now added to |\end|
-%  using |\patchcmd|. This new hook puts new code at the very
-%  end.
-%
-%    \begin{macrocode}
-\patchcmd\end{%
-  \if@ignore\@ignorefalse\ignorespaces\fi%
-}{%
-  \if@ignore\@ignorefalse\ignorespaces\fi%
-  \csuse{@noindent@#1@hook}%
-}{}{%
-  \PackageWarningNoLine{noindentafter}{%
-    Patching `\string\end' failed!\MessageBreak%
-    `\string\NoIndentAfter...' commands won't work%
-  }%
-}
-%    \end{macrocode}
-%
-%\end{macro}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% \subsection{Macros}                                                          %
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%% \needspace{5\baselineskip}\begin{macro}{\@NoIndentAfter}
-%
-%  \noindent This command implements the main principle
-%  behind this package. It checks whether it is followed by
-%  a paragraph. If so, the command |\par| is temporarily
-%  changed using |\everypar|, so that the following paragraph
-%  is not indented. Immediately afterwards, default paragraph
-%  behavior is restored with |\@restorepar| (from the \LaTeX{}
-%  base).
-%
-%    \begin{macrocode}
-\newcommand*\@NoIndentAfter{%
-  \@ifnextchar\par{%
-    \def\par{%
-      \everypar{\setbox\z@\lastbox\everypar{}}%
-      \@restorepar%
-    }%
-  }{}%
-}
-%    \end{macrocode}
-%
-%\end{macro}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%% \needspace{5\baselineskip}\begin{macro}{\NoIndentAfterThis}
-%
-%  \noindent Enforce a paragraph break and suppress
-%  indentation for whatever follows.
-% 
-%    \begin{macrocode}
-\newrobustcmd*{\NoIndentAfterThis}{\@NoIndentAfter\par\par}
-%    \end{macrocode}
-% 
-%\end{macro}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-  % \needspace{5\baselineskip}\begin{macro}{\NoIndentAfterEnv}
-%%% \marg{environment}\\
-%
-%  \noindent Append |\@NoIndentAfter| to the output of
-%  \meta{environment} by using the new environment hook.
-% 
-%    \begin{macrocode}
-\newrobustcmd{\NoIndentAfterEnv}[1]{%
-  \csdef{@noindent@#1@hook}{\@NoIndentAfter}%
-}
-%    \end{macrocode}
-% 
-%\end{macro}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-  % \needspace{5\baselineskip}\begin{macro}{\NoIndentAfterCmd}
-%%% \marg{command}\\
-%
-%  \noindent Append |\NoIndentAfterThis| to the output of
-%  \meta{command}.
-% 
-%    \begin{macrocode}
-\newrobustcmd*{\NoIndentAfterCmd}[1]{%
-  \apptocmd{#1}{\NoIndentAfterThis}{}{%
-    \PackageWarning{noindentafter}{%
-      Patching `\string#1' failed!\MessageBreak%
-      `\string\NoIndentAfterCmd' won't work%
-    }%
-  }%
-}
-%    \end{macrocode}
-% 
-%\end{macro}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-
-- 
2.17.1




* [PATCH -perfbook 3/8] defer: Employ \cref{} and its variants, take three
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
  2021-05-18 12:15 ` [PATCH -perfbook 1/8] fixsvgfonts: Add pattern for 'sans-serif' Akira Yokosawa
  2021-05-18 12:19 ` [PATCH -perfbook 2/8] Omit noindentafter.sty Akira Yokosawa
@ 2021-05-18 12:20 ` Akira Yokosawa
  2021-05-18 12:21 ` [PATCH -perfbook 4/8] datastruct: Employ \cref{} and its variants Akira Yokosawa
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:20 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Also fix indentation by white space.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 defer/rcuapi.tex        |  90 ++++++++++----------
 defer/rcuexercises.tex  |  12 +--
 defer/rcuintro.tex      |   2 +-
 defer/rcurelated.tex    |  10 +--
 defer/rcuusage.tex      | 181 ++++++++++++++++++++--------------------
 defer/updates.tex       |   4 +-
 defer/whichtochoose.tex |  18 ++--
 7 files changed, 158 insertions(+), 159 deletions(-)

diff --git a/defer/rcuapi.tex b/defer/rcuapi.tex
index a7de666c..1554463d 100644
--- a/defer/rcuapi.tex
+++ b/defer/rcuapi.tex
@@ -9,23 +9,23 @@
 This section looks at RCU from the viewpoint of its Linux-kernel API\@.\footnote{
 	Userspace RCU's API is documented
 	elsewhere~\cite{PaulMcKenney2013LWNURCU}.}
-Section~\ref{sec:defer:RCU has a Family of Wait-to-Finish APIs}
+\Cref{sec:defer:RCU has a Family of Wait-to-Finish APIs}
 presents RCU's wait-to-finish APIs,
-Section~\ref{sec:defer:RCU has Publish-Subscribe and Version-Maintenance APIs}
+\cref{sec:defer:RCU has Publish-Subscribe and Version-Maintenance APIs}
 presents RCU's publish-subscribe and version-maintenance APIs,
-Section~\ref{sec:defer:RCU has List-Processing APIs}
+\cref{sec:defer:RCU has List-Processing APIs}
 presents RCU's list-processing APIs,
-Section~\ref{sec:defer:RCU Has Diagnostic APIs}
+\cref{sec:defer:RCU Has Diagnostic APIs}
 presents RCU's diagnostic APIs, and
-Section~\ref{sec:defer:Where Can RCU's APIs Be Used?}
+\cref{sec:defer:Where Can RCU's APIs Be Used?}
 describes in which contexts RCU's various APIs may be used.
 Finally,
-Section~\ref{sec:defer:So; What is RCU Really?}
+\cref{sec:defer:So; What is RCU Really?}
 presents concluding remarks.
 
 Readers who are not excited about kernel internals may wish to skip
 ahead to \cref{sec:defer:RCU Usage}
-on page~\pageref{sec:defer:RCU Usage}.
+on \cpageref{sec:defer:RCU Usage}.
 
 \subsubsection{RCU has a Family of Wait-to-Finish APIs}
 \label{sec:defer:RCU has a Family of Wait-to-Finish APIs}
@@ -34,11 +34,11 @@ The most straightforward answer to ``what is RCU'' is that RCU is
 an API\@.
 For example, the RCU implementation used in the Linux kernel is
 summarized by
-Table~\ref{tab:defer:RCU Wait-to-Finish APIs},
+\cref{tab:defer:RCU Wait-to-Finish APIs},
 which shows the wait-for-readers portions of the RCU, ``sleepable'' RCU
 (SRCU), Tasks RCU, and generic APIs, respectively,
 and by
-Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs},
+\cref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs},
 which shows the publish-subscribe portions of the
 API~\cite{PaulEMcKenney2019RCUAPI}.\footnote{
 	This citation covers v4.20 and later.
@@ -152,7 +152,7 @@ API~\cite{PaulEMcKenney2019RCUAPI}.\footnote{
 
 If you are new to RCU, you might consider focusing on just one
 of the columns in
-Table~\ref{tab:defer:RCU Wait-to-Finish APIs},
+\cref{tab:defer:RCU Wait-to-Finish APIs},
 each of which summarizes one member of the Linux kernel's RCU API family.
 For example, if you are primarily interested in understanding how RCU
 is used in the Linux kernel, ``RCU'' would be the place to start,
@@ -166,7 +166,7 @@ serve as a useful reference.
 
 \QuickQuiz{
 	Why do some of the cells in
-	Table~\ref{tab:defer:RCU Wait-to-Finish APIs}
+	\cref{tab:defer:RCU Wait-to-Finish APIs}
 	have exclamation marks (``!'')?
 }\QuickQuizAnswer{
 	The API members with exclamation marks (\co{rcu_read_lock()},
@@ -243,7 +243,7 @@ The \co{rcu_barrier()} primitive does this job.
 
 Finally, RCU may be used to provide
 type-safe memory~\cite{Cheriton96a}, as described in
-Section~\ref{sec:defer:RCU Provides Type-Safe Memory}.
+\cref{sec:defer:RCU Provides Type-Safe Memory}.
 In the context of RCU, type-safe memory guarantees that a given
 data element will not change type during any RCU read-side critical section
 that accesses it.
@@ -251,7 +251,7 @@ To make use of RCU-based type-safe memory, pass
 \co{SLAB_TYPESAFE_BY_RCU} to \co{kmem_cache_create()}.
 
 The ``SRCU'' column in
-Table~\ref{tab:defer:RCU Wait-to-Finish APIs}
+\cref{tab:defer:RCU Wait-to-Finish APIs}
 displays a specialized RCU API that permits general sleeping in SRCU
 read-side critical
 sections~\cite{PaulEMcKenney2006c}
@@ -280,7 +280,7 @@ the expense of increased CPU overhead.
 	\co{srcu_struct}.
 	In practice, however, doing this is almost certainly a bad idea.
 	In particular, the code shown in
-	Listing~\ref{lst:defer:Multistage SRCU Deadlocks}
+	\cref{lst:defer:Multistage SRCU Deadlocks}
 	could still result in deadlock.
 
 \begin{listing}
@@ -403,7 +403,7 @@ rivaling that of RCU\@.
 
 Fortunately, the RCU publish-subscribe and version-maintenance
 primitives shown in
-Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}
+\cref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}
 apply to all of the variants of RCU discussed above.
 This commonality can allow more code to be shared, and reduces API
 proliferation.
@@ -470,7 +470,7 @@ creating RCU-protected linked data structures, such as RCU-protected
 arrays and trees.
 The special case of linked lists is handled by a separate set of
 APIs described in
-Section~\ref{sec:defer:RCU has List-Processing APIs}.
+\cref{sec:defer:RCU has List-Processing APIs}.
 
 The first category publishes pointers to new data items.
 The \co{rcu_assign_pointer()} primitive ensures that any
@@ -487,7 +487,7 @@ pointer and free the structure referenced by the old pointer.
 \QuickQuizB{
 	Normally, any pointer subject to \co{rcu_dereference()} \emph{must}
 	always be updated using one of the pointer-publish functions in
-	Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs},
+	\cref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs},
 	for example, \co{rcu_assign_pointer()}.
 
 	What is an exception to this rule?
@@ -517,7 +517,7 @@ pointer and free the structure referenced by the old pointer.
 	work out which type of RCU read-side critical section a given
 	RCU traversal primitive corresponds to.
 	For example, consider the code shown in
-	Listing~\ref{lst:defer:Diverse RCU Read-Side Nesting}.
+	\cref{lst:defer:Diverse RCU Read-Side Nesting}.
 
 \begin{listing}
 \begin{VerbatimL}
@@ -635,11 +635,11 @@ Linux has four variants of doubly linked list, the circular
 \co{struct hlist_bl_head}/\co{struct hlist_bl_node}
 pairs.
 The former is laid out as shown in
-Figure~\ref{fig:defer:Linux Circular Linked List (list)},
+\cref{fig:defer:Linux Circular Linked List (list)},
 where the green (leftmost) boxes represent the list header and the blue
 (rightmost three) boxes represent the elements in the list.
 This notation is cumbersome, and will therefore be abbreviated as shown in
-Figure~\ref{fig:defer:Linux Linked List Abbreviated},
+\cref{fig:defer:Linux Linked List Abbreviated},
 which shows only the non-header (blue) elements.
 
 \begin{figure}
@@ -656,16 +656,16 @@ Linux's \co{hlist}\footnote{
 is a linear list, which means that
 it needs only one pointer for the header rather than the two
 required for the circular list, as shown in
-Figure~\ref{fig:defer:Linux Linear Linked List (hlist)}.
+\cref{fig:defer:Linux Linear Linked List (hlist)}.
 Thus, use of \co{hlist} can halve the memory consumption for the hash-bucket
 arrays of large hash tables.
 As before, this notation is cumbersome, so \co{hlist} structures will
 be abbreviated in the same way \co{list_head}-style lists are, as shown in
-Figure~\ref{fig:defer:Linux Linked List Abbreviated}.
+\cref{fig:defer:Linux Linked List Abbreviated}.
 
 A variant of Linux's \co{hlist}, named \co{hlist_nulls}, provides multiple
 distinct \co{NULL} pointers, but otherwise uses the same layout as shown in
-Figure~\ref{fig:defer:Linux Linear Linked List (hlist)}.
+\cref{fig:defer:Linux Linear Linked List (hlist)}.
 In this variant, a \co{->next} pointer having a zero low-order bit is
 considered to be a pointer.
 However, if the low-order bit is set to one, the upper bits identify
@@ -719,7 +719,7 @@ source tree, with helpful example code provided in the
 Another variant of Linux's \co{hlist} incorporates bit-locking,
 and is named \co{hlist_bl}.
 This variant uses the same layout as shown in
-Figure~\ref{fig:defer:Linux Linear Linked List (hlist)},
+\cref{fig:defer:Linux Linear Linked List (hlist)},
 but reserves the low-order bit of the head pointer (``first'' in the
 figure) to lock the list.
 This approach also reduces memory usage, as it allows what would otherwise
@@ -829,7 +829,7 @@ be a separate spinlock to be stored with the pointer itself.
 \end{sidewaystable*}
 
 The API members for these linked-list variants are summarized in
-Table~\ref{tab:defer:RCU-Protected List APIs}.
+\cref{tab:defer:RCU-Protected List APIs}.
 More information is available in the \path{Documentation/RCU}
 directory of the Linux-kernel source tree and at
 Linux Weekly News~\cite{PaulEMcKenney2019RCUAPI}.
@@ -870,7 +870,7 @@ kfree(p);				\lnlbl[kfree]
 \end{figure}
 
 The following discussion walks through this code, using
-Figure~\ref{fig:defer:RCU Replacement in Linked List} to illustrate
+\cref{fig:defer:RCU Replacement in Linked List} to illustrate
 the state changes.
 The triples in each element represent the values of fields \co{->a},
 \co{->b}, and \co{->c}, respectively.
@@ -889,29 +889,29 @@ with \co{5,2,3} in such a way that any given reader sees one of these
 two values.
 
 \begin{fcvref}[ln:defer:Canonical RCU Replacement Example (2nd)]
-Line~\lnref{kmalloc} allocates a replacement element,
+\Clnref{kmalloc} allocates a replacement element,
 resulting in the state as shown in the second row of
-Figure~\ref{fig:defer:RCU Replacement in Linked List}.
+\cref{fig:defer:RCU Replacement in Linked List}.
 At this point, no reader can hold a reference to the newly allocated
 element (as indicated by its green shading), and it is uninitialized
 (as indicated by the question marks).
 
-Line~\lnref{copy} copies the old element to the new one, resulting in the
+\Clnref{copy} copies the old element to the new one, resulting in the
 state as shown in the third row of
-Figure~\ref{fig:defer:RCU Replacement in Linked List}.
+\cref{fig:defer:RCU Replacement in Linked List}.
 The newly allocated element still cannot be referenced by readers, but
 it is now initialized.
 
-Line~\lnref{update1} updates \co{q->b} to the value ``2'', and
-line~\lnref{update2} updates \co{q->c} to the value ``3'',
+\Clnref{update1} updates \co{q->b} to the value ``2'', and
+\clnref{update2} updates \co{q->c} to the value ``3'',
 as shown on the fourth row of
-Figure~\ref{fig:defer:RCU Replacement in Linked List}.
+\cref{fig:defer:RCU Replacement in Linked List}.
 Note that the newly allocated structure is still inaccessible to readers.
 
-Now, line~\lnref{replace} does the replacement, so that the new element is
+Now, \clnref{replace} does the replacement, so that the new element is
 finally visible to readers, and hence is shaded red, as shown on
 the fifth row of
-Figure~\ref{fig:defer:RCU Replacement in Linked List}.
+\cref{fig:defer:RCU Replacement in Linked List}.
 At this point, as shown below, we have two versions of the list.
 Pre-existing readers might see the \co{5,6,7} element (which is
 therefore now shaded yellow), but
@@ -919,7 +919,7 @@ new readers will instead see the \co{5,2,3} element.
 But any given reader is guaranteed to see one set of values or the
 other, not a mixture of the two.
 
-After the \co{synchronize_rcu()} on line~\lnref{sync_rcu} returns,
+After the \co{synchronize_rcu()} on \clnref{sync_rcu} returns,
 a grace period will have elapsed, and so all reads that started before the
 \co{list_replace_rcu()} will have completed.
 In particular, any readers that might have been holding references
@@ -928,20 +928,20 @@ their RCU read-side critical sections, and are thus prohibited from
 continuing to hold a reference.
 Therefore, there can no longer be any readers holding references
 to the old element, as indicated its green shading in the sixth row of
-Figure~\ref{fig:defer:RCU Replacement in Linked List}.
+\cref{fig:defer:RCU Replacement in Linked List}.
 As far as the readers are concerned, we are back to having a single version
 of the list, but with the new element in place of the old.
 
-After the \co{kfree()} on line~\lnref{kfree} completes, the list will
+After the \co{kfree()} on \clnref{kfree} completes, the list will
 appear as shown on the final row of
-Figure~\ref{fig:defer:RCU Replacement in Linked List}.
+\cref{fig:defer:RCU Replacement in Linked List}.
 \end{fcvref}
 
 Despite the fact that RCU was named after the replacement case,
 the vast majority of RCU usage within the Linux kernel relies on
 the simple independent insertion and deletion, as was shown in
-Figure~\ref{fig:defer:Multiple RCU Data-Structure Versions} in
-Section~\ref{sec:defer:Maintain Multiple Versions of Recently Updated Objects}.
+\cref{fig:defer:Multiple RCU Data-Structure Versions} in
+\cref{sec:defer:Maintain Multiple Versions of Recently Updated Objects}.
 
 The next section looks at APIs that assist developers in debugging
 their code that makes use of RCU\@.
@@ -989,7 +989,7 @@ lockdep support &
 \label{tab:defer:RCU Diagnostic APIs}
 \end{table}
 
-Table~\ref{tab:defer:RCU Diagnostic APIs}
+\Cref{tab:defer:RCU Diagnostic APIs}
 shows RCU's diagnostic APIs.
 
 The \co{__rcu} marks an RCU-protected pointer, for example,
@@ -1081,7 +1081,7 @@ an RCU, RCU-bh, or RCU-sched read-side critical section.
 \label{fig:defer:RCU API Usage Constraints}
 \end{figure}
 
-Figure~\ref{fig:defer:RCU API Usage Constraints}
+\Cref{fig:defer:RCU API Usage Constraints}
 shows which APIs may be used in which in-kernel environments.
 The RCU read-side primitives may be used in any environment, including NMI,
 the RCU mutation and asynchronous grace-period primitives may be used in any
@@ -1105,7 +1105,7 @@ to complete, and maintenance of multiple versions.
 That said, it is possible to build higher-level constructs
 on top of RCU, including the reader-writer-locking, reference-counting,
 and existence-guarantee constructs listed in
-Section~\ref{sec:defer:RCU Usage}.
+\cref{sec:defer:RCU Usage}.
 Furthermore, I have no doubt that the Linux community will continue to
 find interesting new uses for RCU,
 just as they do for any of a number of synchronization
@@ -1116,7 +1116,7 @@ all of the things you can do with these APIs.
 
 However, for many people, a complete view of RCU must include sample
 RCU implementations.
-Appendix~\ref{chp:app:``Toy'' RCU Implementations} therefore presents a series
+\Cref{chp:app:``Toy'' RCU Implementations} therefore presents a series
 of ``toy'' RCU implementations of increasing complexity and capability,
 though others might prefer the classic
 ``User-Level Implementations of Read-Copy
diff --git a/defer/rcuexercises.tex b/defer/rcuexercises.tex
index 476819a9..d12d831c 100644
--- a/defer/rcuexercises.tex
+++ b/defer/rcuexercises.tex
@@ -15,7 +15,7 @@ suffice for most of these exercises.
 
 \EQuickQuiz{
 	The statistical-counter implementation shown in
-	Listing~\ref{lst:count:Per-Thread Statistical Counters}
+	\cref{lst:count:Per-Thread Statistical Counters}
 	(\path{count_end.c})
 	used a global lock to guard the summation in \co{read_count()},
 	which resulted in poor performance and negative scalability.
@@ -54,14 +54,14 @@ suffice for most of these exercises.
 	Why or why not?
 
 	See
-	Section~\ref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
+	\cref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
 	on
-	page~\pageref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
+	\cpageref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters}
 	for more details.
 }\EQuickQuizEnd
 
 \EQuickQuiz{
-	Section~\ref{sec:count:Applying Exact Limit Counters}
+	\Cref{sec:count:Applying Exact Limit Counters}
 	showed a fanciful pair of code fragments that dealt with counting
 	I/O accesses to removable devices.
 	These code fragments suffered from high overhead on the fastpath
@@ -78,8 +78,8 @@ suffice for most of these exercises.
 	device-removal code fragment to suit.
 
 	See
-	Section~\ref{sec:together:RCU and Counters for Removable I/O Devices}
+	\cref{sec:together:RCU and Counters for Removable I/O Devices}
 	on
-	Page~\pageref{sec:together:RCU and Counters for Removable I/O Devices}
+	\cpageref{sec:together:RCU and Counters for Removable I/O Devices}
 	for one solution to this problem.
 }\EQuickQuizEnd
diff --git a/defer/rcuintro.tex b/defer/rcuintro.tex
index 1bfe6265..a3c9cc58 100644
--- a/defer/rcuintro.tex
+++ b/defer/rcuintro.tex
@@ -145,7 +145,7 @@ the state, as indicated by the ``2 Versions'' in the figure.
 	data reflecting that reality.
 	Many of those algorithms are also able to tolerate some degree
 	of inconsistency within the in-computer data.
-	\cref{sec:datastruct:RCU-Protected Hash Table Discussion}
+	\Cref{sec:datastruct:RCU-Protected Hash Table Discussion}
 	discusses this point in more detail.
 
 	Please note that this need to tolerate inconsistent and stale
diff --git a/defer/rcurelated.tex b/defer/rcurelated.tex
index 4c27ed8f..9061b99d 100644
--- a/defer/rcurelated.tex
+++ b/defer/rcurelated.tex
@@ -35,11 +35,11 @@ As of 2021, Linux-kernel RCU is still under active development.
 However, in the mid 2010s, there was a welcome upsurge in RCU research
 and development across a number of communities and
 institutions~\cite{FransKaashoek2015ParallelOSHistory}.
-Section~\ref{sec:defer:RCU Uses} describes uses of RCU,
-Section~\ref{sec:defer:RCU Implementations} describes RCU implementations
+\Cref{sec:defer:RCU Uses} describes uses of RCU,
+\cref{sec:defer:RCU Implementations} describes RCU implementations
 (as well as work that both creates and uses an implementation),
 and finally,
-Section~\ref{sec:defer:RCU Validation} describes verification and validation
+\cref{sec:defer:RCU Validation} describes verification and validation
 of RCU and its uses.
 
 \subsubsection{RCU Uses}
@@ -145,7 +145,7 @@ pressed scalable non-zero indicators
 (SNZI)~\cite{FaithEllen:2007:SNZI} into service as a grace-period
 mechanism.
 The intended use is to implement software transactional memory
-(see Section~\ref{sec:future:Transactional Memory}), which
+(see \cref{sec:future:Transactional Memory}), which
 imposes linearizability requirements, which in turn seems to
 limit scalability.
 
@@ -225,7 +225,7 @@ This paper also made some interesting performance-evaluation choices that
 are discussed further in
 \cref{sec:future:Deferred Reclamation}
 on
-page~\ref{sec:future:Deferred Reclamation}.
+\cpageref{sec:future:Deferred Reclamation}.
 
 \ppl{Adam}{Belay} et al.~created an RCU implementation that guards the
 data structures used by TCP/IP's address-resolution protocol (ARP)
diff --git a/defer/rcuusage.tex b/defer/rcuusage.tex
index 6586b27f..b84e32a4 100644
--- a/defer/rcuusage.tex
+++ b/defer/rcuusage.tex
@@ -40,15 +40,15 @@ This section answers the question ``What is RCU?'' from the viewpoint
 of the uses to which RCU can be put.
 Because RCU is most frequently used to replace some existing mechanism,
 we look at it primarily in terms of its relationship to such mechanisms,
-as listed in Table~\ref{tab:defer:RCU Usage}.
+as listed in \cref{tab:defer:RCU Usage}.
 Following the sections listed in this table,
-Section~\ref{sec:defer:RCU Usage Summary} provides a summary.
+\cref{sec:defer:RCU Usage Summary} provides a summary.
 
 \subsubsection{RCU for Pre-BSD Routing}
 \label{sec:defer:RCU for Pre-BSD Routing}
 
-Listings~\ref{lst:defer:RCU Pre-BSD Routing Table Lookup}
-and~\ref{lst:defer:RCU Pre-BSD Routing Table Add/Delete}
+\Cref{lst:defer:RCU Pre-BSD Routing Table Lookup,%
+lst:defer:RCU Pre-BSD Routing Table Add/Delete}
 show code for an RCU-protected Pre-BSD routing table
 (\path{route_rcu.c}).
 The former shows data structures and \co{route_lookup()},
@@ -67,19 +67,19 @@ and the latter shows \co{route_add()} and \co{route_del()}.
 \end{listing}
 
 \begin{fcvref}[ln:defer:route_rcu:lookup]
-In Listing~\ref{lst:defer:RCU Pre-BSD Routing Table Lookup},
-line~\lnref{rh} adds the \co{->rh} field used by RCU reclamation,
-line~\lnref{re_freed} adds the \co{->re_freed} use-after-free-check field,
-lines~\lnref{lock}, \lnref{unlock1}, and~\lnref{unlock2}
+In \cref{lst:defer:RCU Pre-BSD Routing Table Lookup},
+\clnref{rh} adds the \co{->rh} field used by RCU reclamation,
+\clnref{re_freed} adds the \co{->re_freed} use-after-free-check field,
+\clnref{lock,unlock1,unlock2}
 add RCU read-side protection,
-and lines~\lnref{chk_freed} and~\lnref{abort} add the use-after-free check.
+and \clnref{chk_freed,abort} add the use-after-free check.
 \end{fcvref}
 \begin{fcvref}[ln:defer:route_rcu:add_del]
-In Listing~\ref{lst:defer:RCU Pre-BSD Routing Table Add/Delete},
-lines~\lnref{add:lock}, \lnref{add:unlock}, \lnref{del:lock},
-\lnref{del:unlock1}, and~\lnref{del:unlock2} add update-side locking,
-lines~\lnref{add:add_rcu} and~\lnref{del:del_rcu} add RCU update-side protection,
-line~\lnref{del:call_rcu} causes \co{route_cb()} to be invoked after
+In \cref{lst:defer:RCU Pre-BSD Routing Table Add/Delete},
+\clnref{add:lock,add:unlock,del:lock,%
+del:unlock1,del:unlock2} add update-side locking,
+\clnref{add:add_rcu,del:del_rcu} add RCU update-side protection,
+\clnref{del:call_rcu} causes \co{route_cb()} to be invoked after
 a grace period elapses,
 and \clnrefrange{cb:b}{cb:e} define \co{route_cb()}.
 This is minimal added code for a working concurrent implementation.
@@ -92,7 +92,7 @@ This is minimal added code for a working concurrent implementation.
 \label{fig:defer:Pre-BSD Routing Table Protected by RCU}
 \end{figure}
 
-Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU}
+\Cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
 shows the performance on the read-only workload.
 RCU scales quite well, and offers nearly ideal performance.
 However, this data was generated using the \co{RCU_SIGNAL}
@@ -102,13 +102,13 @@ for which \co{rcu_read_lock()} and \co{rcu_read_unlock()}
 generate a small amount of code.
 What happens for the QSBR flavor of RCU, which generates no code at all
 for \co{rcu_read_lock()} and \co{rcu_read_unlock()}?
-(See Section~\ref{sec:defer:Introduction to RCU},
+(See \cref{sec:defer:Introduction to RCU},
 and especially
-Figure~\ref{fig:defer:QSBR: Waiting for Pre-Existing Readers},
+\cref{fig:defer:QSBR: Waiting for Pre-Existing Readers},
 for a discussion of RCU QSBR\@.)
 
 The answer to this is shown in
-Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
+\cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
 which shows that RCU QSBR's performance and scalability actually exceeds
 that of the ideal synchronization-free workload.
 
@@ -152,7 +152,7 @@ that of the ideal synchronization-free workload.
 	on modern microprocessors.
 	Such results can be thought of as similar to the celebrated
 	super-linear speedups (see
-	Section~\ref{sec:SMPdesign:Beyond Partitioning}
+	\cref{sec:SMPdesign:Beyond Partitioning}
 	for one such example), that is, of interest but also of limited
 	practical importance.
 	Nevertheless, one of the strengths of RCU is that its read-side
@@ -171,7 +171,7 @@ that of the ideal synchronization-free workload.
 	\co{->re_next.next} pointer also had zero offset, just the
 	same as the sequential variant.
 	And the answer, as can be seen in
-	Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head},
+	\cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head},
 	is that this causes RCU QSBR's performance to decrease to where
 	it is still very nearly ideal, but no longer super-ideal.
 }\QuickQuizEndB
@@ -240,7 +240,7 @@ These advantages and limitations are discussed in the following sections.
 
 The read-side performance advantages of Linux-kernel RCU over
 reader-writer locking are shown in
-Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking},
+\cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking},
 which was generated on a 448-CPU 2.10\,GHz Intel x86 system.
 
 \QuickQuizSeries{%
@@ -328,7 +328,7 @@ which was generated on a 448-CPU 2.10\,GHz Intel x86 system.
 
 	Of course, it is also the case that the older results were obtained
 	on a different system than were those shown in
-	Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}.
+	\cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}.
 	So which change had the most effect, Linus's commit or the change in
 	the system?
 	This question is left as an exercise to the reader.
@@ -336,7 +336,7 @@ which was generated on a 448-CPU 2.10\,GHz Intel x86 system.
 
 \QuickQuizE{
 	Why is there such large variation for the \co{rcu} trace in
-	Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}?
+	\cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}?
 }\QuickQuizAnswerE{
 	Keep in mind that this is a log-log plot, so those large-seeming
 	\co{rcu} variances in reality span only a few hundred picoseconds.
@@ -371,7 +371,7 @@ from 30~runs, with the line being the median.
 A more moderate view may be obtained from a \co{CONFIG_PREEMPT} kernel,
 though RCU still beats reader-writer locking by between a factor of seven
 on a single CPU and by three orders of magnitude on 192~CPUs, as shown in
-Figure~\ref{fig:defer:Performance Advantage of Preemptible RCU Over Reader-Writer Locking},
+\cref{fig:defer:Performance Advantage of Preemptible RCU Over Reader-Writer Locking},
 which was generated on the same 448-CPU 2.10\,GHz x86 system.
 Note the high variability of reader-writer locking at larger numbers of CPUs.
 The error bars span the full range of data.
@@ -403,7 +403,7 @@ is exaggerated by the unrealistic zero-length critical sections.
 The performance advantages of RCU decrease as the overhead of the critical
 sections increases.
 This decrease can be seen in
-Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration},
+\cref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration},
 which was run on the same system as the previous plots.
 Here, the y-axis represents the sum of the overhead of the read-side
 primitives and that of the critical section, and the x-axis represents
@@ -413,11 +413,11 @@ separations between the traces still represent significant differences.
 This figure shows non-preemptible RCU, but given that preemptible RCU's
 read-side overhead is only about three nanoseconds, its plot would be
 nearly identical to
-Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}.
+\cref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}.
 
 \QuickQuiz{
 	Why the larger error ranges for the submicrosecond durations in
-	Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}?
+	\cref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}?
 }\QuickQuizAnswer{
 	Because smaller disturbances result in greater relative errors
 	for smaller measurements.
@@ -425,7 +425,7 @@ Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of
 	is (as of 2020) less accurate than is the \co{udelay()} primitive
 	used for the data for durations of a microsecond or more.
 	It is instructive to compare to the zero-length case shown in
-	Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}.
+	\cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}.
 }\QuickQuizEnd
 
 There are three traces for reader-writer locking, with the upper trace
@@ -570,7 +570,7 @@ RCU readers to finish,
 the RCU readers might well see the change more quickly than would
 batch-fair
 reader-writer-locking readers, as shown in
-Figure~\ref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}.
+\cref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}.
 
 Once the update is received, the rwlock writer cannot proceed until the
 last reader completes, and subsequent readers cannot proceed until the
@@ -608,7 +608,7 @@ Fortunately,
 there are a number of approaches that avoid inconsistency and stale
 data~\cite{PaulEdwardMcKenneyPhD,Arcangeli03}, and some
 methods based on reference counting are discussed in
-Section~\ref{sec:defer:Reference Counting}.
+\cref{sec:defer:Reference Counting}.
 
 \paragraph{Low-Priority RCU Readers Can Block High-Priority Reclaimers}
 
@@ -629,7 +629,7 @@ With the exception of userspace
 RCU~\cite{MathieuDesnoyers2009URCU,PaulMcKenney2013LWNURCU},
 expedited grace periods, and several of the ``toy''
 RCU implementations described in
-Appendix~\ref{chp:app:``Toy'' RCU Implementations},
+\cref{chp:app:``Toy'' RCU Implementations},
 RCU grace periods extend for milliseconds.
 Although there are a number of techniques to render such long delays
 harmless, including use of the asynchronous interfaces where available
@@ -641,10 +641,9 @@ situations.
 
 In the best case, the conversion from reader-writer locking to RCU
 is quite simple, as shown in
-Listings~\ref{lst:defer:Converting Reader-Writer Locking to RCU: Data},
-\ref{lst:defer:Converting Reader-Writer Locking to RCU: Search},
-and
-\ref{lst:defer:Converting Reader-Writer Locking to RCU: Deletion},
+\cref{lst:defer:Converting Reader-Writer Locking to RCU: Data,%
+lst:defer:Converting Reader-Writer Locking to RCU: Search,%
+lst:defer:Converting Reader-Writer Locking to RCU: Deletion},
 all taken from
 Wikipedia~\cite{WikipediaRCU}.
 
@@ -849,7 +848,7 @@ waits for any previously acquired references to be released.
 
 Of course, RCU can also be combined with traditional reference counting,
 as discussed in
-Section~\ref{sec:together:Refurbish Reference Counting}.
+\cref{sec:together:Refurbish Reference Counting}.
 
 \begin{figure}
 \centering
@@ -867,8 +866,8 @@ Section~\ref{sec:together:Refurbish Reference Counting}.
 
 But why bother?
 Again, part of the answer is performance, as shown in
-Figures~\ref{fig:defer:Performance of RCU vs. Reference Counting}
-and~\ref{fig:defer:Performance of Preemptible RCU vs. Reference Counting},
+\cref{fig:defer:Performance of RCU vs. Reference Counting,%
+fig:defer:Performance of Preemptible RCU vs. Reference Counting},
 again showing data taken on a 448-CPU 2.1\,GHz Intel x86 system
 for non-preemptible and preemptible Linux-kernel RCU, respectively.
 Non-preemptible RCU's advantage over reference counting ranges from
@@ -887,7 +886,7 @@ one CPU up to about three orders of magnitude at 192~CPUs.
 However, as with reader-writer locking, the performance advantages of
 RCU are most pronounced for short-duration critical sections and for
 large numbers of CPUs, as shown in
-Figure~\ref{fig:defer:Response Time of RCU vs. Reference Counting}
+\cref{fig:defer:Response Time of RCU vs. Reference Counting}
 for the same system.
 In addition, as with reader-writer locking, many system calls (and thus
 any RCU read-side critical sections that they contain) complete in
@@ -983,7 +982,7 @@ Gamsa et al.~\cite{Gamsa99}
 discuss existence guarantees and describe how a mechanism
 resembling RCU can be used to provide these existence guarantees
 (see Section~5 on page 7 of the PDF), and
-Section~\ref{sec:locking:Lock-Based Existence Guarantees}
+\cref{sec:locking:Lock-Based Existence Guarantees}
 discusses how to guarantee existence via locking, along with the
 ensuing disadvantages of doing so.
 The effect is that if any RCU-protected data element is accessed
@@ -1026,27 +1025,27 @@ int delete(int key)
 \end{listing}
 
 \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking]
-Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking}
+\Cref{lst:defer:Existence Guarantees Enable Per-Element Locking}
 demonstrates how RCU-based existence guarantees can enable
 per-element locking via a function that deletes an element from
 a hash table.
-Line~\lnref{hash} computes a hash function, and line~\lnref{rdlock} enters an RCU
+\Clnref{hash} computes a hash function, and \clnref{rdlock} enters an RCU
 read-side critical section.
-If line~\lnref{chkkey} finds that the corresponding bucket of the hash table is
+If \clnref{chkkey} finds that the corresponding bucket of the hash table is
 empty or that the element present is not the one we wish to delete,
-then line~\lnref{rdunlock1} exits the RCU read-side critical section and
-line~\lnref{ret_0:a}
+then \clnref{rdunlock1} exits the RCU read-side critical section and
+\clnref{ret_0:a}
 indicates failure.
 \end{fcvref}
 
 \QuickQuiz{
 	What if the element we need to delete is not the first element
 	of the list on
-        line~\ref{ln:defer:Existence Guarantees Enable Per-Element Locking:chkkey} of
-	Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking}?
+	\clnrefr{ln:defer:Existence Guarantees Enable Per-Element Locking:chkkey} of
+	\cref{lst:defer:Existence Guarantees Enable Per-Element Locking}?
 }\QuickQuizAnswer{
 	As with
-	Listing~\ref{lst:locking:Per-Element Locking Without Existence Guarantees},
+	\cref{lst:locking:Per-Element Locking Without Existence Guarantees},
 	this is a very simple hash table with no chaining, so the only
 	element in a given bucket is the first element.
 	The reader is again invited to adapt this example to a hash table with
@@ -1056,29 +1055,29 @@ indicates failure.
 }\QuickQuizEnd
 
 \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking]
-Otherwise, line~\lnref{acq} acquires the update-side spinlock, and
-line~\lnref{chkkey2} then checks that the element is still the one that we want.
-If so, line~\lnref{rdunlock2} leaves the RCU read-side critical section,
-line~\lnref{remove} removes it from the table, line~\lnref{rel1} releases
-the lock, line~\lnref{sync_rcu} waits for all pre-existing RCU read-side critical
-sections to complete, line~\lnref{kfree} frees the newly removed element,
-and line~\lnref{ret_1} indicates success.
-If the element is no longer the one we want, line~\lnref{rel2} releases
-the lock, line~\lnref{rdunlock3} leaves the RCU read-side critical section,
-and line~\lnref{ret_0:b} indicates failure to delete the specified key.
+Otherwise, \clnref{acq} acquires the update-side spinlock, and
+\clnref{chkkey2} then checks that the element is still the one that we want.
+If so, \clnref{rdunlock2} leaves the RCU read-side critical section,
+\clnref{remove} removes it from the table, \clnref{rel1} releases
+the lock, \clnref{sync_rcu} waits for all pre-existing RCU read-side critical
+sections to complete, \clnref{kfree} frees the newly removed element,
+and \clnref{ret_1} indicates success.
+If the element is no longer the one we want, \clnref{rel2} releases
+the lock, \clnref{rdunlock3} leaves the RCU read-side critical section,
+and \clnref{ret_0:b} indicates failure to delete the specified key.
 \end{fcvref}
 
 \QuickQuizSeries{%
 \QuickQuizB{
 	\begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking]
 	Why is it OK to exit the RCU read-side critical section on
-	line~\lnref{rdunlock2} of
-	Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking}
-	before releasing the lock on line~\lnref{rel1}?
+	\clnref{rdunlock2} of
+	\cref{lst:defer:Existence Guarantees Enable Per-Element Locking}
+	before releasing the lock on \clnref{rel1}?
 	\end{fcvref}
 }\QuickQuizAnswerB{
 	\begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking]
-	First, please note that the second check on line~\lnref{chkkey2} is
+	First, please note that the second check on \clnref{chkkey2} is
 	necessary because some other
 	CPU might have removed this element while we were waiting
 	to acquire the lock.
@@ -1101,9 +1100,9 @@ and line~\lnref{ret_0:b} indicates failure to delete the specified key.
 \QuickQuizE{
 	\begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking]
 	Why not exit the RCU read-side critical section on
-	line~\lnref{rdunlock3} of
-	Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking}
-	before releasing the lock on line~\lnref{rel2}?
+	\clnref{rdunlock3} of
+	\cref{lst:defer:Existence Guarantees Enable Per-Element Locking}
+	before releasing the lock on \clnref{rel2}?
 	\end{fcvref}
 }\QuickQuizAnswerE{
 	Suppose we reverse the order of these two lines.
@@ -1112,24 +1111,24 @@ and line~\lnref{ret_0:b} indicates failure to delete the specified key.
 	\begin{enumerate}
 	\begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking]
 	\item	CPU~0 invokes \co{delete()}, and finds the element
-		to be deleted, executing through line~\lnref{rdunlock2}.
+		to be deleted, executing through \clnref{rdunlock2}.
 		It has not yet actually deleted the element, but
 		is about to do so.
 	\item	CPU~1 concurrently invokes \co{delete()}, attempting
 		to delete this same element.
 		However, CPU~0 still holds the lock, so CPU~1 waits
-		for it at line~\lnref{acq}.
-	\item	CPU~0 executes lines~\lnref{remove} and~\lnref{rel1},
-		and blocks at line~\lnref{sync_rcu} waiting for CPU~1
+		for it at \clnref{acq}.
+	\item	CPU~0 executes \clnref{remove,rel1},
+		and blocks at \clnref{sync_rcu} waiting for CPU~1
 		to exit its RCU read-side critical section.
-	\item	CPU~1 now acquires the lock, but the test on line~\lnref{chkkey2}
+	\item	CPU~1 now acquires the lock, but the test on \clnref{chkkey2}
 		fails because CPU~0 has already removed the element.
-		CPU~1 now executes line~\lnref{rel2}
-                (which we switched with line~\lnref{rdunlock3}
+		CPU~1 now executes \clnref{rel2}
+		(which we switched with \clnref{rdunlock3}
 		for the purposes of this Quick Quiz)
 		and exits its RCU read-side critical section.
 	\item	CPU~0 can now return from \co{synchronize_rcu()},
-		and thus executes line~\lnref{kfree}, sending the element to
+		and thus executes \clnref{kfree}, sending the element to
 		the freelist.
 	\item	CPU~1 now attempts to release a lock for an element
 		that has been freed, and, worse yet, possibly
@@ -1143,10 +1142,10 @@ and line~\lnref{ret_0:b} indicates failure to delete the specified key.
 Alert readers will recognize this as only a slight variation on
 the original ``RCU is a way of waiting for things to finish'' theme,
 which is addressed in
-Section~\ref{sec:defer:RCU is a Way of Waiting for Things to Finish}.
+\cref{sec:defer:RCU is a Way of Waiting for Things to Finish}.
 They might also note the deadlock-immunity advantages over the lock-based
 existence guarantees discussed in
-Section~\ref{sec:locking:Lock-Based Existence Guarantees}.
+\cref{sec:locking:Lock-Based Existence Guarantees}.
 
 \subsubsection{RCU Provides Type-Safe Memory}
 \label{sec:defer:RCU Provides Type-Safe Memory}
@@ -1183,7 +1182,7 @@ for the duration of any pre-existing RCU read-side critical sections.
 	during which at least one thread is always in an RCU read-side
 	critical section.
 	However, the key words in the description in
-	Section~\ref{sec:defer:RCU Provides Type-Safe Memory}
+	\cref{sec:defer:RCU Provides Type-Safe Memory}
 	are ``in-use'' and ``pre-existing''.
 	Keep in mind that a given RCU read-side critical section is
 	conceptually only permitted to gain references to data elements
@@ -1234,7 +1233,7 @@ Simpler is after all almost always better!
 \subsubsection{RCU is a Way of Waiting for Things to Finish}
 \label{sec:defer:RCU is a Way of Waiting for Things to Finish}
 
-As noted in Section~\ref{sec:defer:RCU Fundamentals}
+As noted in \cref{sec:defer:RCU Fundamentals},
 an important component
 of RCU is a way of waiting for RCU readers to finish.
 One of
@@ -1273,7 +1272,7 @@ In this example, the \co{timer_stop} function uses
 \co{synchronize_sched()} to ensure that all in-flight NMI
 notifications have completed before freeing the associated resources.
 A simplified version of this code is shown in
-Listing~\ref{lst:defer:Using RCU to Wait for NMIs to Finish}.
+\cref{lst:defer:Using RCU to Wait for NMIs to Finish}.
 
 \begin{listing}
 \begin{fcvlabel}[ln:defer:Using RCU to Wait for NMIs to Finish]
@@ -1314,7 +1313,7 @@ void nmi_stop(void)				\lnlbl@nmi_stop:b$
 \begin{fcvref}[ln:defer:Using RCU to Wait for NMIs to Finish:struct]
 \Clnrefrange{b}{e} define a \co{profile_buffer} structure, containing a
 size and an indefinite array of entries.
-Line~\lnref{buf} defines a pointer to a profile buffer, which is
+\Clnref{buf} defines a pointer to a profile buffer, which is
 presumably initialized elsewhere to point to a dynamically allocated
 region of memory.
 \end{fcvref}
@@ -1326,14 +1325,14 @@ As such, it cannot be preempted, nor can it be interrupted by a normal
 interrupt handler; however, it is still subject to delays due to cache misses,
 ECC errors, and cycle stealing by other hardware threads within the same
 core.
-Line~\lnref{rcu_deref} gets a local pointer to the profile buffer using the
+\Clnref{rcu_deref} gets a local pointer to the profile buffer using the
 \co{rcu_dereference()} primitive to ensure memory ordering on
 DEC Alpha, and
-lines~\lnref{if_NULL} and~\lnref{ret:a} exit from this function if there is no
-profile buffer currently allocated, while lines~\lnref{if_oor} and~\lnref{ret:b}
+\clnref{if_NULL,ret:a} exit from this function if there is no
+profile buffer currently allocated, while \clnref{if_oor,ret:b}
 exit from this function if the \co{pcvalue} argument
 is out of range.
-Otherwise, line~\lnref{inc} increments the profile-buffer entry indexed
+Otherwise, \clnref{inc} increments the profile-buffer entry indexed
 by the \co{pcvalue} argument.
 Note that storing the size with the buffer guarantees that the
 range check matches the buffer, even if a large buffer is suddenly
@@ -1344,15 +1343,15 @@ replaced by a smaller one.
 \Clnrefrange{b}{e} define the \co{nmi_stop()} function,
 where the caller is responsible for mutual exclusion (for example,
 holding the correct lock).
-Line~\lnref{fetch} fetches a pointer to the profile buffer, and
-lines~\lnref{if_NULL} and~\lnref{ret} exit the function if there is no buffer.
-Otherwise, line~\lnref{NULL} \co{NULL}s out the profile-buffer pointer
+\Clnref{fetch} fetches a pointer to the profile buffer, and
+\clnref{if_NULL,ret} exit the function if there is no buffer.
+Otherwise, \clnref{NULL} \co{NULL}s out the profile-buffer pointer
 (using the \co{rcu_assign_pointer()} primitive to maintain
 memory ordering on weakly ordered machines),
-and line~\lnref{sync_sched} waits for an RCU Sched grace period to elapse,
+and \clnref{sync_sched} waits for an RCU Sched grace period to elapse,
 in particular, waiting for all non-preemptible regions of code,
 including NMI handlers, to complete.
-Once execution continues at line~\lnref{kfree}, we are guaranteed that
+Once execution continues at \clnref{kfree}, we are guaranteed that
 any instance of \co{nmi_profile()} that obtained a
 pointer to the old buffer has returned.
 It is therefore safe to free the buffer, in this case using the
@@ -1368,7 +1367,7 @@ It is therefore safe to free the buffer, in this case using the
 	in \co{nmi_profile()}, and to replace the
 	\co{synchronize_sched()} with \co{synchronize_rcu()},
 	perhaps as shown in
-	Listing~\ref{lst:defer:Using RCU to Wait for Mythical Preemptible NMIs to Finish}.
+	\cref{lst:defer:Using RCU to Wait for Mythical Preemptible NMIs to Finish}.
 %
 \begin{listing}
 \begin{VerbatimL}
@@ -1446,7 +1445,7 @@ as well as for any of a number of other synchronization primitives.
 \end{figure}
 
 In the meantime,
-Figure~\ref{fig:defer:RCU Areas of Applicability}
+\cref{fig:defer:RCU Areas of Applicability}
 shows some rough rules of thumb on where RCU is most helpful.
 
 As shown in the blue box at the top of the figure, RCU works best if
@@ -1500,7 +1499,7 @@ update-mostly workloads requiring
 consistent data are rarely good places to use RCU, though there are some
 exceptions~\cite{MathieuDesnoyers2012URCU}.
 In addition, as noted in
-Section~\ref{sec:defer:RCU Provides Type-Safe Memory},
+\cref{sec:defer:RCU Provides Type-Safe Memory},
 within the Linux kernel, the \co{SLAB_TYPESAFE_BY_RCU}
 slab-allocator flag provides type-safe memory to RCU readers, which can
 greatly simplify \IXacrl{nbs} and other lockless
diff --git a/defer/updates.tex b/defer/updates.tex
index 6e9ea0de..8b231011 100644
--- a/defer/updates.tex
+++ b/defer/updates.tex
@@ -17,7 +17,7 @@ scalability for writers.
 
 We have already seen one situation featuring high performance and
 scalability for writers, namely the counting algorithms surveyed in
-Chapter~\ref{chp:Counting}.
+\cref{chp:Counting}.
 These algorithms featured partially partitioned data structures so
 that updates can operate locally, while the more-expensive reads
 must sum across the entire data structure.
@@ -35,7 +35,7 @@ of the garbage collector.
 
 And of course, where feasible, fully partitioned or ``sharded'' systems
 provide excellent performance and scalability, as noted in
-Chapter~\ref{cha:Partitioning and Synchronization Design}.
+\cref{cha:Partitioning and Synchronization Design}.
 
 The next chapter will look at updates in the context of several types
 of data structures.
diff --git a/defer/whichtochoose.tex b/defer/whichtochoose.tex
index 8f2921b4..3940c4bc 100644
--- a/defer/whichtochoose.tex
+++ b/defer/whichtochoose.tex
@@ -9,9 +9,9 @@
 	  may be; custom will soon render it easy and agreeable.}
 	  {\emph{Pythagoras}}
 
-Section~\ref{sec:defer:Which to Choose? (Overview)}
+\Cref{sec:defer:Which to Choose? (Overview)}
 provides a high-level overview and then
-Section~\ref{sec:defer:Which to Choose? (Details)}
+\cref{sec:defer:Which to Choose? (Details)}
 provides a more detailed view
 of the differences between the deferred-processing techniques presented
 in this chapter.
@@ -19,7 +19,7 @@ This discussion assumes a linked data structure that is large enough
 that readers do not hold references from one traversal to another,
 and where elements might be added to and removed from the structure
 at any location and at any time.
-Section~\ref{sec:defer:Which to Choose? (Production Use)}
+\Cref{sec:defer:Which to Choose? (Production Use)}
 then points out a few publicly visible production uses of
 hazard pointers, sequence locking, and RCU\@.
 This discussion should help you to make an informed choice between
@@ -74,12 +74,12 @@ these techniques.
 \label{tab:defer:Which Deferred Technique to Choose? (Overview)}
 \end{table*}
 
-Table~\ref{tab:defer:Which Deferred Technique to Choose? (Overview)}
+\Cref{tab:defer:Which Deferred Technique to Choose? (Overview)}
 shows a few high-level properties that distinguish the deferred-reclamation
 techniques from one another.
 
 The ``Readers'' row summarizes the results presented in
-Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
+\cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
 which shows that all but reference counting enjoy reasonably
 fast and scalable readers.
 
@@ -135,7 +135,7 @@ run concurrently with any update.
 	Yes they do, but the guarantee only applies unconditionally
 	in cases where a reference is already held.
 	With this in mind, please review the paragraph at the beginning of
-	Section~\ref{sec:defer:Which to Choose?}, especially the part
+	\cref{sec:defer:Which to Choose?}, especially the part
 	saying ``large enough that readers do not hold references from
 	one traversal to another''.
 }\QuickQuizEnd
@@ -252,7 +252,7 @@ But those wishing more detail should continue on to the next section.
 \label{tab:defer:Which Deferred Technique to Choose?  (Details)}
 \end{table*}
 
-Table~\ref{tab:defer:Which Deferred Technique to Choose? (Details)}
+\Cref{tab:defer:Which Deferred Technique to Choose? (Details)}
 provides more-detailed rules of thumb that can help you choose among the
 four deferred-processing techniques presented in this chapter.
 
@@ -293,7 +293,7 @@ be reduced by batching, so that each read-side operation covers more data.
 
 \QuickQuiz{
 	But didn't the answer to one of the quick quizzes in
-	Section~\ref{sec:defer:Hazard Pointers}
+	\cref{sec:defer:Hazard Pointers}
 	say that pairwise asymmetric barriers could eliminate the
 	read-side \co{smp_mb()} from hazard pointers?
 }\QuickQuizAnswer{
@@ -380,7 +380,7 @@ counting is used.
 This means that the complexity of reference-acquisition failure only
 needs to be dealt with for those few data elements:  The bulk of
 the reference acquisitions are unconditional, courtesy of RCU\@.
-See Section~\ref{sec:together:Refurbish Reference Counting}
+See \cref{sec:together:Refurbish Reference Counting}
 for more information on combining reference counting with other
 synchronization mechanisms.
 
-- 
2.17.1




* [PATCH -perfbook 4/8] datastruct: Employ \cref{} and its variants
From: Akira Yokosawa @ 2021-05-18 12:21 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 datastruct/datastruct.tex | 348 +++++++++++++++++++-------------------
 1 file changed, 174 insertions(+), 174 deletions(-)

diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 038f3923..a396b688 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -85,18 +85,18 @@ This section focuses on a single data structure, namely the hash table.
 This focused approach allows a much deeper investigation of how concurrency
 interacts with data structures, and also focuses on a data structure
 that is heavily used in practice.
-Section~\ref{sec:datastruct:Hash-Table Design}
+\Cref{sec:datastruct:Hash-Table Design}
 overviews the design, and
-Section~\ref{sec:datastruct:Hash-Table Implementation}
+\cref{sec:datastruct:Hash-Table Implementation}
 presents the implementation.
 Finally,
-Section~\ref{sec:datastruct:Hash-Table Performance}
+\cref{sec:datastruct:Hash-Table Performance}
 discusses the resulting performance and scalability.
 
 \subsection{Hash-Table Design}
 \label{sec:datastruct:Hash-Table Design}
 
-Chapter~\ref{cha:Partitioning and Synchronization Design}
+\Cref{cha:Partitioning and Synchronization Design}
 emphasized the need to apply partitioning in order to attain
 respectable performance and scalability, so partitionability
 must be a first-class criterion when selecting data structures.
@@ -134,17 +134,17 @@ excellent scalability.
 \label{sec:datastruct:Hash-Table Implementation}
 
 \begin{fcvref}[ln:datastruct:hash_bkt:struct]
-Listing~\ref{lst:datastruct:Hash-Table Data Structures}
+\Cref{lst:datastruct:Hash-Table Data Structures}
 (\path{hash_bkt.c})
 shows a set of data structures used in a simple fixed-sized hash
 table using chaining and per-hash-bucket locking, and
-Figure~\ref{fig:datastruct:Hash-Table Data-Structure Diagram}
+\cref{fig:datastruct:Hash-Table Data-Structure Diagram}
 diagrams how they fit together.
 The \co{hashtab} structure (\clnrefrange{tab:b}{tab:e} in
-Listing~\ref{lst:datastruct:Hash-Table Data Structures})
+\cref{lst:datastruct:Hash-Table Data Structures})
 contains four \co{ht_bucket} structures
 (\clnrefrange{bucket:b}{bucket:e} in
-Listing~\ref{lst:datastruct:Hash-Table Data Structures}),
+\cref{lst:datastruct:Hash-Table Data Structures}),
 with the \co{->ht_nbuckets} field controlling the number of buckets
 and the \co{->ht_cmp} field holding the pointer to key-comparison
 function.
@@ -152,7 +152,7 @@ Each such bucket contains a list header \co{->htb_head} and
 a lock \co{->htb_lock}.
 The list headers chain \co{ht_elem} structures
 (\clnrefrange{elem:b}{elem:e} in
-Listing~\ref{lst:datastruct:Hash-Table Data Structures})
+\cref{lst:datastruct:Hash-Table Data Structures})
 through their
 \co{->hte_next} fields, and each \co{ht_elem} structure also caches
 the corresponding element's hash value in the \co{->hte_hash} field.
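In plain C, the three structures just described fit together roughly as follows. This is a simplified sketch: the book's `hash_bkt.c` uses userspace-RCU `cds_list` heads for the chains rather than the bare pointers shown here.

```c
#include <stddef.h>
#include <pthread.h>

/* Simplified sketch of the hash-table structures described above. */
struct ht_elem {
	struct ht_elem *hte_next;	/* links element into a bucket chain */
	unsigned long hte_hash;		/* cached hash of this element's key */
};

struct ht_bucket {
	struct ht_elem *htb_head;	/* list header for the chain */
	pthread_mutex_t htb_lock;	/* per-bucket lock */
};

struct hashtab {
	unsigned long ht_nbuckets;	/* number of buckets */
	int (*ht_cmp)(struct ht_elem *htep, void *key); /* key comparator */
	struct ht_bucket ht_bkt[];	/* buckets, allocated with the table */
};
```

Keeping the bucket array in the same allocation as the header is what lets a single pointer reach both the table's size and its buckets.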
@@ -173,13 +173,13 @@ which might contain a complex key.
 \label{fig:datastruct:Hash-Table Data-Structure Diagram}
 \end{figure}
 
-Figure~\ref{fig:datastruct:Hash-Table Data-Structure Diagram}
+\Cref{fig:datastruct:Hash-Table Data-Structure Diagram}
 shows bucket~0 containing two elements and bucket~2 containing one.
 
 \begin{fcvref}[ln:datastruct:hash_bkt:map_lock:map]
-Listing~\ref{lst:datastruct:Hash-Table Mapping and Locking}
+\Cref{lst:datastruct:Hash-Table Mapping and Locking}
 shows mapping and locking functions.
-Lines~\lnref{b} and~\lnref{e}
+\Clnref{b,e}
 show the macro \co{HASH2BKT()}, which maps from a hash value
 to the corresponding \co{ht_bucket} structure.
 This macro uses a simple modulus: if more aggressive hashing is required,
@@ -195,24 +195,24 @@ corresponding to the specified hash value.
 \end{listing}
 
 \begin{fcvref}[ln:datastruct:hash_bkt:lookup]
-Listing~\ref{lst:datastruct:Hash-Table Lookup}
+\Cref{lst:datastruct:Hash-Table Lookup}
 shows \co{hashtab_lookup()},
 which returns a pointer to the element with the specified hash and key if it
 exists, or \co{NULL} otherwise.
 This function takes both a hash value and a pointer to the key because
 this allows users of this function to use arbitrary keys and
 arbitrary hash functions.
-Line~\lnref{map} maps from the hash value to a pointer to the corresponding
+\Clnref{map} maps from the hash value to a pointer to the corresponding
 hash bucket.
 Each pass through the loop spanning
 \clnrefrange{loop:b}{loop:e} examines one element
 of the bucket's hash chain.
-Line~\lnref{hashmatch} checks to see if the hash values match, and if not,
-line~\lnref{next}
+\Clnref{hashmatch} checks to see if the hash values match, and if not,
+\clnref{next}
 proceeds to the next element.
-Line~\lnref{keymatch} checks to see if the actual key matches, and if so,
-line~\lnref{return} returns a pointer to the matching element.
-If no element matches, line~\lnref{ret_NULL} returns \co{NULL}.
+\Clnref{keymatch} checks to see if the actual key matches, and if so,
+\clnref{return} returns a pointer to the matching element.
+If no element matches, \clnref{ret_NULL} returns \co{NULL}.
 \end{fcvref}
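A minimal, self-contained sketch of this lookup (modulus bucket selection in the style of `HASH2BKT()`, a cheap hash filter, then the full key comparison) might look like the following; names follow the book, but the chain is simplified to a singly linked list with an illustrative inline integer key:

```c
#include <stddef.h>

struct ht_elem {
	struct ht_elem *hte_next;
	unsigned long hte_hash;	/* cached hash, compared first */
	int key;		/* illustrative inline key */
};

struct ht_bucket {
	struct ht_elem *htb_head;
};

/* Modulus mapping in the style of the HASH2BKT() macro. */
static unsigned long hash_to_idx(unsigned long hash, unsigned long nbuckets)
{
	return hash % nbuckets;
}

/* Walk one bucket: the cheap hash comparison filters most mismatches,
 * so the (possibly expensive) key comparator runs only on hash hits. */
static struct ht_elem *bucket_search(struct ht_bucket *htb,
				     unsigned long hash, void *key,
				     int (*cmp)(struct ht_elem *, void *))
{
	struct ht_elem *htep;

	for (htep = htb->htb_head; htep != NULL; htep = htep->hte_next) {
		if (htep->hte_hash != hash)
			continue;		/* hashes differ: skip */
		if (cmp(htep, key))
			return htep;		/* full key match */
	}
	return NULL;				/* no match in this bucket */
}

static int cmp_int(struct ht_elem *htep, void *key)
{
	return htep->key == *(int *)key;
}
```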
 
 \begin{listing}
@@ -225,7 +225,7 @@ If no element matches, line~\lnref{ret_NULL} returns \co{NULL}.
 	\begin{fcvref}[ln:datastruct:hash_bkt:lookup]
 	But isn't the double comparison on
 	\clnrefrange{hashmatch}{return} in
-	Listing~\ref{lst:datastruct:Hash-Table Lookup} inefficient
+	\cref{lst:datastruct:Hash-Table Lookup} inefficient
 	in the case where the key fits into an unsigned long?
 	\end{fcvref}
 }\QuickQuizAnswer{
@@ -244,14 +244,14 @@ If no element matches, line~\lnref{ret_NULL} returns \co{NULL}.
 \label{lst:datastruct:Hash-Table Modification}
 \end{listing}
 
-Listing~\ref{lst:datastruct:Hash-Table Modification}
+\Cref{lst:datastruct:Hash-Table Modification}
 shows the \co{hashtab_add()} and \co{hashtab_del()} functions
 that add and delete elements from the hash table, respectively.
 
 \begin{fcvref}[ln:datastruct:hash_bkt:add_del:add]
 The \co{hashtab_add()} function simply sets the element's hash
-value on line~\lnref{set}, then adds it to the corresponding bucket on
-lines~\lnref{add:b} and~\lnref{add:e}.
+value on \clnref{set}, then adds it to the corresponding bucket on
+\clnref{add:b,add:e}.
 \end{fcvref}
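The `hashtab_add()` logic just described, together with its `hashtab_del()` counterpart, can be sketched as follows. This is a simplification: the real code's `cds_list` primitives carry a sentinel header, so `hashtab_del()` needs only the element, whereas this bare doubly linked chain also needs the bucket for head fixup.

```c
#include <stddef.h>

/* Simplified element and bucket for a bare doubly linked chain. */
struct ht_elem {
	struct ht_elem *hte_next, *hte_prev;
	unsigned long hte_hash;
};

struct ht_bucket {
	struct ht_elem *htb_head;
};

static void hashtab_add(struct ht_bucket *htb, unsigned long hash,
			struct ht_elem *htep)
{
	htep->hte_hash = hash;		/* cache the hash value */
	htep->hte_prev = NULL;
	htep->hte_next = htb->htb_head;	/* push onto the bucket's chain */
	if (htb->htb_head)
		htb->htb_head->hte_prev = htep;
	htb->htb_head = htep;
}

/* Unlink in O(1), courtesy of the doubly linked chain. */
static void hashtab_del(struct ht_bucket *htb, struct ht_elem *htep)
{
	if (htep->hte_prev)
		htep->hte_prev->hte_next = htep->hte_next;
	else
		htb->htb_head = htep->hte_next;
	if (htep->hte_next)
		htep->hte_next->hte_prev = htep->hte_prev;
}
```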
 The \co{hashtab_del()} function simply removes the specified element
 from whatever hash chain it is on, courtesy of the doubly linked
@@ -267,22 +267,22 @@ or modifying this same bucket, for example, by invoking
 \label{lst:datastruct:Hash-Table Allocation and Free}
 \end{listing}
 
-Listing~\ref{lst:datastruct:Hash-Table Allocation and Free}
+\Cref{lst:datastruct:Hash-Table Allocation and Free}
 shows \co{hashtab_alloc()} and \co{hashtab_free()},
 which do hash-table allocation and freeing, respectively.
 \begin{fcvref}[ln:datastruct:hash_bkt:alloc_free:alloc]
 Allocation begins on
 \clnrefrange{alloc:b}{alloc:e} with allocation of the underlying memory.
-If line~\lnref{chk_NULL} detects that memory has been exhausted,
-line~\lnref{ret_NULL} returns
+If \clnref{chk_NULL} detects that memory has been exhausted,
+\clnref{ret_NULL} returns
 \co{NULL} to the caller.
-Otherwise, lines~\lnref{set_nbck} and~\lnref{set_cmp} initialize
+Otherwise, \clnref{set_nbck,set_cmp} initialize
 the number of buckets and the pointer to key-comparison function,
 and the loop
 spanning \clnrefrange{loop:b}{loop:e} initializes the buckets themselves,
 including the chain list header on
-line~\lnref{init_head} and the lock on line~\lnref{init_lock}.
-Finally, line~\lnref{return} returns a pointer to the newly allocated hash table.
+\clnref{init_head} and the lock on \clnref{init_lock}.
+Finally, \clnref{return} returns a pointer to the newly allocated hash table.
 \end{fcvref}
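The allocation path just described reduces to a short sketch: one `malloc()` covers the header plus the flexible bucket array, `NULL` is propagated on memory exhaustion, and each bucket is initialized to an empty chain. Locks are elided here; the real code also initializes each bucket's `htb_lock`.

```c
#include <stdlib.h>

struct ht_bucket {
	void *htb_head;		/* chain header; lock elided in this sketch */
};

struct hashtab {
	unsigned long ht_nbuckets;
	struct ht_bucket ht_bkt[];	/* flexible array member */
};

/* Sketch of hashtab_alloc(): header and bucket array in one allocation. */
struct hashtab *hashtab_alloc(unsigned long nbuckets)
{
	struct hashtab *htp;
	unsigned long i;

	htp = malloc(sizeof(*htp) + nbuckets * sizeof(htp->ht_bkt[0]));
	if (htp == NULL)
		return NULL;		/* memory exhausted */
	htp->ht_nbuckets = nbuckets;
	for (i = 0; i < nbuckets; i++)
		htp->ht_bkt[i].htb_head = NULL;	/* empty chain */
	return htp;
}
```

Freeing is then a single `free()` of the returned pointer, mirroring the book's `hashtab_free()`.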
 \begin{fcvref}[ln:datastruct:hash_bkt:alloc_free:free]
 The \co{hashtab_free()} function on
@@ -302,7 +302,7 @@ The \co{hashtab_free()} function on
 The performance results for a single 28-core socket of a 2.1\,GHz
 Intel Xeon system using a bucket-locked hash table
 with 262,144 buckets are shown in
-Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}.
+\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}.
 The performance does scale nearly linearly, but it falls far short
 of the ideal performance level, even at only 28~CPUs.
 Part of this shortfall is due to the fact that the lock acquisitions and
@@ -317,7 +317,7 @@ on two or more CPUs.
 \end{figure}
 
 And things only get worse with more CPUs, as can be seen in
-Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; 448 CPUs}.
+\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; 448 CPUs}.
 We do not need to show ideal performance: The performance for 29~CPUs
 and beyond is all too clearly worse than abysmal.
 This clearly underscores the dangers of extrapolating performance from a
@@ -349,7 +349,7 @@ We can test this by increasing the number of hash buckets.
 \end{figure}
 
 However, as can be seen in
-Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets},
+\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets},
 changing the number of buckets has almost no effect:
 Scalability is still abysmal.
 In particular, we still see a sharp dropoff at 29~CPUs and beyond.
@@ -357,13 +357,13 @@ Clearly something else is going on.
 
 The problem is that this is a multi-socket system, with CPUs~0--27
 and~225--251 mapped to the first socket as shown in
-Figure~\ref{fig:datastruct:NUMA Topology of System Under Test}.
+\cref{fig:datastruct:NUMA Topology of System Under Test}.
 Test runs confined to the first 28~CPUs therefore perform quite
 well, but tests that involve socket~0's CPUs~0--27 as well as
 socket~1's CPU~28 incur the overhead of passing data across
 socket boundaries.
 This can severely degrade performance, as was discussed in
-Section~\ref{sec:cpu:Hardware System Architecture}.
+\cref{sec:cpu:Hardware System Architecture}.
 In short, large multi-socket systems require good locality of reference
 in addition to full partitioning.
 The remainder of this chapter will discuss ways of providing good
@@ -458,7 +458,7 @@ the need for read-side synchronization can degrade performance in
 read-mostly situations.
 However, we can achieve both performance and scalability by using
 RCU, which was introduced in
-Section~\ref{sec:defer:Read-Copy Update (RCU)}.
+\cref{sec:defer:Read-Copy Update (RCU)}.
 Similar results can be achieved using hazard pointers
 (\path{hazptr.c})~\cite{MagedMichael04a}, which will be included in
 the performance results shown in this
@@ -469,17 +469,17 @@ section~\cite{McKenney:2013:SDS:2483852.2483867}.
 
 For an RCU-protected hash table with per-bucket locking,
 updaters use locking as shown in
-Section~\ref{sec:datastruct:Partitionable Data Structures},
+\cref{sec:datastruct:Partitionable Data Structures},
 but readers use RCU\@.
 The data structures remain as shown in
-Listing~\ref{lst:datastruct:Hash-Table Data Structures},
+\cref{lst:datastruct:Hash-Table Data Structures},
 and the \co{HASH2BKT()}, \co{hashtab_lock()}, and \co{hashtab_unlock()}
 functions remain as shown in
-Listing~\ref{lst:datastruct:Hash-Table Mapping and Locking}.
+\cref{lst:datastruct:Hash-Table Mapping and Locking}.
 However, readers use the lighter-weight concurrency-control embodied
 by \co{hashtab_lock_lookup()} and \co{hashtab_unlock_lookup()}
 shown in
-Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}.
+\cref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}.
 
 \begin{listing}
 \input{CodeSamples/datastruct/hash/hash_bkt_rcu@lock_unlock.fcv}
@@ -487,11 +487,11 @@ Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Contr
 \label{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}
 \end{listing}
 
-Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Lookup}
+\Cref{lst:datastruct:RCU-Protected Hash-Table Lookup}
 shows \co{hashtab_lookup()} for the RCU-protected per-bucket-locked
 hash table.
 This is identical to that in
-Listing~\ref{lst:datastruct:Hash-Table Lookup}
+\cref{lst:datastruct:Hash-Table Lookup}
 except that \co{cds_list_for_each_entry()} is replaced
 by \co{cds_list_for_each_entry_rcu()}.
 Both of these primitives traverse the hash chain referenced
@@ -539,11 +539,11 @@ RCU read-side critical section, for example, the caller must invoke
 \label{lst:datastruct:RCU-Protected Hash-Table Modification}
 \end{listing}
 
-Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Modification}
+\Cref{lst:datastruct:RCU-Protected Hash-Table Modification}
 shows \co{hashtab_add()} and \co{hashtab_del()}, both of which
 are quite similar to their counterparts in the non-RCU hash table
 shown in
-Listing~\ref{lst:datastruct:Hash-Table Modification}.
+\cref{lst:datastruct:Hash-Table Modification}.
 The \co{hashtab_add()} function uses \co{cds_list_add_rcu()} instead
 of \co{cds_list_add()} in order to ensure proper ordering when
 an element is added to the hash table at the same time that it is
@@ -569,7 +569,7 @@ freeing or otherwise reusing the memory for the newly deleted element.
 \label{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
 \end{figure}
 
-Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
+\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
 shows the read-only performance of RCU-protected and hazard-pointer-protected
 hash tables against the previous section's per-bucket-locked implementation.
 As you can see, both RCU and hazard pointers perform and scale
@@ -587,7 +587,7 @@ RCU does slightly better than hazard pointers.
 \label{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo; Linear Scale}
 \end{figure}
 
-Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo; Linear Scale}
+\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo; Linear Scale}
 shows the same data on a linear scale.
 This drops the global-locking trace into the x-axis, but allows the
 non-ideal performance of RCU and hazard pointers to be more readily
@@ -621,7 +621,7 @@ advantage depends on the workload.
 But why is RCU's performance a factor of five less than ideal?
 One possibility is that the per-thread counters manipulated by
 \co{rcu_read_lock()} and \co{rcu_read_unlock()} are slowing things down.
-Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR; Linear Scale}
+\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR; Linear Scale}
 therefore adds the results for the QSBR variant of RCU, whose read-side
 primitives do nothing.
 And although QSBR does perform slightly better than does RCU, it is still
@@ -634,7 +634,7 @@ about a factor of five short of ideal.
 \label{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale}
 \end{figure}
 
-Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale}
+\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale}
 adds completely unsynchronized results, which works because this
 is a read-only benchmark with nothing to synchronize.
 Even with no synchronization whatsoever, performance still falls far
@@ -642,12 +642,12 @@ short of ideal.
 
 The problem is that this system has sockets with 28 cores, which have
 the modest cache sizes shown in
-Figure~\ref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}
-on page~\pageref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}.
+\cref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}
+on \cpageref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}.
 Each hash bucket (\co{struct ht_bucket}) occupies 56~bytes and each
 element (\co{struct zoo_he}) occupies 72~bytes for the RCU and QSBR runs.
 The benchmark generating
-Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale}
+\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale}
 used 262,144 buckets and up to 262,144 elements, for a total of
 33,554,448~bytes, which not only overflows the 1,048,576-byte L2 caches
 by more than a factor of thirty, but is also uncomfortably close to the
@@ -681,8 +681,8 @@ to about half again faster than that of either QSBR or RCU\@.
 \QuickQuiz{
 	How can we be so sure that the hash-table size is at fault here,
 	especially given that
-	Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
-	on page~\pageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
+	\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
+	on \cpageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
 	shows that varying hash-table size has almost
 	no effect?
 	Might the problem instead be something like false sharing?
@@ -703,7 +703,7 @@ to about half again faster than that of either QSBR or RCU\@.
 
 	Still unconvinced?
 	Then look at the log-log plot in
-	Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
+	\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
 	which shows performance for 448 CPUs as a function of the
 	hash-table size, that is, number of buckets and maximum number
 	of elements.
@@ -719,7 +719,7 @@ to about half again faster than that of either QSBR or RCU\@.
 	This near-ideal performance is consistent with that for the
 	pre-BSD routing table shown in
 	\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
-	on page~\pageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
+	on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
 	even at 448 CPUs.
 	However, the performance drops significantly (this is a log-log
 	plot) at about 8,000~elements, which is where the 1,048,576-byte
@@ -734,8 +734,8 @@ to about half again faster than that of either QSBR or RCU\@.
 	a factor of 25.
 
 	The reason that
-	Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
-	on page~\pageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
+	\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
+	on \cpageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}
 	shows little effect is that its data was gathered from
 	bucket-locked hash tables, where locking overhead and contention
 	drowned out cache-capacity effects.
@@ -757,7 +757,7 @@ to about half again faster than that of either QSBR or RCU\@.
 
 What if the memory footprint is reduced still further?
 \Cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR}
-on page~\pageref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head}
+on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head}
 shows that RCU attains very nearly ideal performance on the much smaller
 data structure represented by the pre-BSD routing table.
 
@@ -799,7 +799,7 @@ data structure represented by the pre-BSD routing table.
 As noted earlier, Schr\"odinger is surprised by the popularity of his
 cat~\cite{ErwinSchroedinger1935Cat}, but recognizes the need to reflect
 this popularity in his design.
-Figure~\ref{fig:datastruct:Read-Side Cat-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo at 64 CPUs}
+\Cref{fig:datastruct:Read-Side Cat-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo at 64 CPUs}
 shows the results of 64-CPU runs, varying the number of CPUs that are
 doing nothing but looking up the cat.
 Both RCU and hazard pointers respond well to this challenge, but bucket
@@ -830,7 +830,7 @@ in point.
 
 If we were only ever going to read the data, we would not need any
 concurrency control to begin with.
-Figure~\ref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
+\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
 therefore shows the effect of updates on readers.
 At the extreme left-hand side of this graph, all but one of the CPUs
 are doing lookups, while to the right all 448 CPUs are doing updates.
@@ -854,9 +854,9 @@ execution, greatly reducing memory-barrier overhead in the read-only case.
 \end{figure}
 
 Where
-Figure~\ref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
+\cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
 showed the effect of increasing update rates on lookups,
-Figure~\ref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
+\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
 shows the effect of increasing update rates on the updates themselves.
 Again, at the left-hand side of the figure all but one of the CPUs are
 doing lookups and at the right-hand side of the figure all 448 CPUs are
@@ -885,7 +885,7 @@ not recommended for production use.
 \QuickQuiz{
 	The dangers of extrapolating from 28 CPUs to 448 CPUs were
 	made quite clear in
-	Section~\ref{sec:datastruct:Hash-Table Performance}.
+	\cref{sec:datastruct:Hash-Table Performance}.
 	But why should extrapolating up from 448 CPUs be any safer?
 }\QuickQuizAnswer{
 	In theory, it isn't any safer, and a useful exercise would be
@@ -943,7 +943,7 @@ minute.
 In this case, the two veterinarians would disagree on the state of the
 cat for the second period of thirty seconds following the last heartbeat,
 as fancifully depicted in
-Figure~\ref{fig:datastruct:Even Veterinarians Disagree}.
+\cref{fig:datastruct:Even Veterinarians Disagree}.
 
 \pplsur{Werner}{Heisenberg} taught us to live with this sort of
 uncertainty~\cite{WeinerHeisenberg1927Uncertain}, which is a good
@@ -959,9 +959,9 @@ Furthermore, most computing systems are intended to interact with
 the outside world.
 Consistency with the outside world is therefore of paramount importance.
 However, as we saw in
-Figure~\ref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}
+\cref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}
 on
-page~\pageref{fig:defer:Response Time of RCU vs. Reader-Writer Locking},
+\cpageref{fig:defer:Response Time of RCU vs. Reader-Writer Locking},
 increased internal consistency can come at the expense of degraded
 external consistency.
 Techniques such as RCU and hazard pointers give up some degree of
@@ -990,7 +990,7 @@ or all of the above.
 Fixed-size hash tables are perfectly partitionable, but resizable hash
 tables pose partitioning challenges when growing or shrinking, as
 fancifully depicted in
-Figure~\ref{fig:datastruct:Partitioning Problems}.
+\cref{fig:datastruct:Partitioning Problems}.
 However, it turns out that it is possible to construct high-performance
 scalable RCU-protected hash tables, as described in the following sections.
 
@@ -1004,7 +1004,7 @@ The first (and simplest) was developed for the Linux kernel by
 \ppl{Herbert}{Xu}~\cite{HerbertXu2010RCUResizeHash}, and is described in the
 following sections.
 The other two are covered briefly in
-Section~\ref{sec:datastruct:Other Resizable Hash Tables}.
+\cref{sec:datastruct:Other Resizable Hash Tables}.
 
 The key insight behind the first hash-table implementation is that
 each data element can have two sets of list pointers, with one set
@@ -1073,7 +1073,7 @@ which is the subject of the next section.
 Resizing is accomplished by the classic approach of inserting a level
 of indirection, in this case, the \co{ht} structure shown on
 \clnrefrange{ht:b}{ht:e} of
-Listing~\ref{lst:datastruct:Resizable Hash-Table Data Structures}
+\cref{lst:datastruct:Resizable Hash-Table Data Structures}
 (\path{hash_resize.c}).
 The \co{hashtab} structure shown on
 \clnrefrange{hashtab:b}{hashtab:e} contains only a
@@ -1092,17 +1092,17 @@ we should be able to make good use of RCU\@.
 \end{listing}
 
 The \co{ht} structure represents a specific size of the hash table,
-as specified by the \co{->ht_nbuckets} field on line~\lnref{ht:nbuckets}.
+as specified by the \co{->ht_nbuckets} field on \clnref{ht:nbuckets}.
 The size is stored in the same structure containing the array of
 buckets (\co{->ht_bkt[]} on
-line~\lnref{ht:bkt}) in order to avoid mismatches between
+\clnref{ht:bkt}) in order to avoid mismatches between
 the size and the array.
 The \co{->ht_resize_cur} field on
-line~\lnref{ht:resize_cur} is equal to $-1$ unless a resize
+\clnref{ht:resize_cur} is equal to $-1$ unless a resize
 operation
 is in progress, in which case it indicates the index of the bucket whose
 elements are being inserted into the new hash table, which is referenced
-by the \co{->ht_new} field on line~\lnref{ht:new}.
+by the \co{->ht_new} field on \clnref{ht:new}.
 If there is no resize operation in progress, \co{->ht_new} is \co{NULL}.
 Thus, a resize operation proceeds by allocating a new \co{ht} structure
 and referencing it via the \co{->ht_new} pointer, then advancing
@@ -1113,10 +1113,10 @@ Once all old readers have completed, the old hash table's \co{ht} structure
 may be freed.
 
 The \co{->ht_idx} field on
-line~\lnref{ht:idx} indicates which of the two sets of
+\clnref{ht:idx} indicates which of the two sets of
 list pointers are being used by this instantiation of the hash table,
 and is used to index the \co{->hte_next[]} array in the \co{ht_elem}
-structure on line~\lnref{ht_elem:next}.
+structure on \clnref{ht_elem:next}.
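The fields just described fit together roughly as follows; this sketch elides the comparator, hash-function, and key-extraction pointers, and `htb_head` details, to highlight the resize-related fields.

```c
#include <stddef.h>

struct ht_bucket {
	void *htb_head;			/* chain header; details elided */
};

/* One size-specific version of the table.  The size lives next to the
 * bucket array it describes, avoiding size/array mismatches, and
 * ->ht_new points at the next version while a resize is in flight. */
struct ht {
	long ht_nbuckets;		/* number of buckets in this version */
	struct ht *ht_new;		/* next version, or NULL */
	int ht_idx;			/* which ->hte_next[] set to use */
	long ht_resize_cur;		/* last distributed bucket, or -1 */
	struct ht_bucket ht_bkt[];	/* size and array kept together */
};

/* Top level: an indirection to the current version, switched with RCU
 * so that readers always see a consistent size/array pair. */
struct hashtab {
	struct ht *ht_cur;
};
```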
 
 The \co{->ht_cmp()}, \co{->ht_gethash()}, and \co{->ht_getkey()} fields on
 \clnrefrange{ht:cmp}{ht:getkey}
@@ -1158,7 +1158,7 @@ the old table.
 
 \begin{fcvref}[ln:datastruct:hash_resize:get_bucket]
 Bucket selection is shown in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
+\cref{lst:datastruct:Resizable Hash-Table Bucket Selection},
 which shows \co{ht_get_bucket()} on
 \clnrefrange{single:b}{single:e} and \co{ht_search_bucket()} on
 \clnrefrange{hsb:b}{hsb:e}.
@@ -1167,26 +1167,26 @@ corresponding to the specified key in the specified hash table, without
 making any allowances for resizing.
 It also stores the bucket index corresponding to the key into the location
 referenced by parameter~\co{b} on
-line~\lnref{single:gethash}, and the corresponding
+\clnref{single:gethash}, and the corresponding
 hash value into the location
-referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
-Line~\lnref{single:return} then returns a reference to the corresponding bucket.
+referenced by parameter~\co{h} (if non-\co{NULL}) on \clnref{single:h}.
+\Clnref{single:return} then returns a reference to the corresponding bucket.
 
 The \co{ht_search_bucket()} function searches for the specified key
 within the specified hash-table version.
-Line~\lnref{hsb:get_curbkt} obtains a reference to the bucket corresponding
+\Clnref{hsb:get_curbkt} obtains a reference to the bucket corresponding
 to the specified key.
 The loop spanning \clnrefrange{hsb:loop:b}{hsb:loop:e} searches
-that bucket, so that if line~\lnref{hsb:match} detects a match,
-line~\lnref{hsb:ret_match} returns a pointer to the enclosing data element.
+that bucket, so that if \clnref{hsb:match} detects a match,
+\clnref{hsb:ret_match} returns a pointer to the enclosing data element.
 Otherwise, if there is no match,
-line~\lnref{hsb:ret_NULL} returns \co{NULL} to indicate
+\clnref{hsb:ret_NULL} returns \co{NULL} to indicate
 failure.
 \end{fcvref}
 
 \QuickQuiz{
 	How does the code in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
+	\cref{lst:datastruct:Resizable Hash-Table Bucket Selection}
 	protect against the resizing process progressing past the
 	selected bucket?
 }\QuickQuizAnswer{
@@ -1206,33 +1206,33 @@ operation.
 \end{listing}
 
 Read-side concurrency control is provided by RCU as was shown in
-Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control},
+\cref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control},
 but the update-side concurrency\-/control functions
 \co{hashtab_lock_mod()} and \co{hashtab_unlock_mod()}
 must now deal with the possibility of a
 concurrent resize operation as shown in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}.
+\cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}.
 
 \begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod:l]
 The \co{hashtab_lock_mod()} spans
 \clnrefrange{b}{e} in the listing.
-Line~\lnref{rcu_lock} enters an RCU read-side critical section to prevent
+\Clnref{rcu_lock} enters an RCU read-side critical section to prevent
 the data structures from being freed during the traversal,
-line~\lnref{refhashtbl} acquires a reference to the current hash table, and then
-line~\lnref{refbucket} obtains a reference to the bucket in this hash table
+\clnref{refhashtbl} acquires a reference to the current hash table, and then
+\clnref{refbucket} obtains a reference to the bucket in this hash table
 corresponding to the key.
-Line~\lnref{acq_bucket} acquires that bucket's lock, which will prevent any concurrent
+\Clnref{acq_bucket} acquires that bucket's lock, which will prevent any concurrent
 resizing operation from distributing that bucket, though of course it
 will have no effect if that bucket has already been distributed.
 \Clnrefrange{lsp0b}{lsp0e} store the bucket pointer and
 pointer-set index into their respective fields in the
 \co{ht_lock_state} structure, which communicates the information to
 \co{hashtab_add()}, \co{hashtab_del()}, and \co{hashtab_unlock_mod()}.
-Line~\lnref{ifresized} then checks to see if a concurrent resize
+\Clnref{ifresized} then checks to see if a concurrent resize
 operation has already distributed this bucket across the new hash table,
-and if not, line~\lnref{lsp1_1} indicates that there is no
+and if not, \clnref{lsp1_1} indicates that there is no
 already-resized hash bucket and
-line~\lnref{fastret1} returns with the selected hash bucket's
+\clnref{fastret1} returns with the selected hash bucket's
 lock held (thus preventing a concurrent resize operation from distributing
 this bucket) and also within an RCU read-side critical section.
 \IX{Deadlock} is avoided because the old table's locks are always acquired
@@ -1241,9 +1241,9 @@ than two versions from existing at a given time, thus preventing a
 deadlock cycle.
 
 Otherwise, a concurrent resize operation has already distributed this
-bucket, so line~\lnref{new_hashtbl} proceeds to the new hash table,
-line~\lnref{get_newbkt} selects the bucket corresponding to the key,
-and line~\lnref{acq_newbkt} acquires the bucket's lock.
+bucket, so \clnref{new_hashtbl} proceeds to the new hash table,
+\clnref{get_newbkt} selects the bucket corresponding to the key,
+and \clnref{acq_newbkt} acquires the bucket's lock.
 \Clnrefrange{lsp1b}{lsp1e} store the bucket pointer and
 pointer-set index into their respective fields in the
 \co{ht_lock_state} structure, which again communicates this information to
@@ -1262,11 +1262,11 @@ section.
 \begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod:ul]
 The \co{hashtab_unlock_mod()} function releases the lock(s) acquired by
 \co{hashtab_lock_mod()}.
-Line~\lnref{relbkt0} releases the lock on the old \co{ht_bucket} structure.
-In the unlikely event that line~\lnref{ifbkt1} determines that a resize
-operation is in progress, line~\lnref{relbkt1} releases the lock on the
+\Clnref{relbkt0} releases the lock on the old \co{ht_bucket} structure.
+In the unlikely event that \clnref{ifbkt1} determines that a resize
+operation is in progress, \clnref{relbkt1} releases the lock on the
 new \co{ht_bucket} structure.
-Either way, line~\lnref{rcu_unlock} exits the RCU read-side critical
+Either way, \clnref{rcu_unlock} exits the RCU read-side critical
 section.
 \end{fcvref}
 
@@ -1298,23 +1298,23 @@ Now that we have bucket selection and concurrency control in place,
 we are ready to search and update our resizable hash table.
 The \co{hashtab_lookup()}, \co{hashtab_add()}, and \co{hashtab_del()}
 functions are shown in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}.
+\cref{lst:datastruct:Resizable Hash-Table Access Functions}.
 
 \begin{fcvref}[ln:datastruct:hash_resize:access:lkp]
 The \co{hashtab_lookup()} function on
 \clnrefrange{b}{e} of the listing does
 hash lookups.
-Line~\lnref{get_curtbl} fetches the current hash table and
-line~\lnref{get_curbkt} searches the bucket corresponding to the
+\Clnref{get_curtbl} fetches the current hash table and
+\clnref{get_curbkt} searches the bucket corresponding to the
 specified key.
-Line~\lnref{ret} returns a pointer to the searched-for element
+\Clnref{ret} returns a pointer to the searched-for element
 or \co{NULL} when the search fails.
 The caller must be within an RCU read-side critical section.
 \end{fcvref}
 
 \QuickQuiz{
 	The \co{hashtab_lookup()} function in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
+	\cref{lst:datastruct:Resizable Hash-Table Access Functions}
 	ignores concurrent resize operations.
 	Doesn't this mean that readers might miss an element that was
 	previously added during a resize operation?
@@ -1329,12 +1329,12 @@ The caller must be within an RCU read-side critical section.
 \begin{fcvref}[ln:datastruct:hash_resize:access:add]
 The \co{hashtab_add()} function on \clnrefrange{b}{e} of the listing adds
 new data elements to the hash table.
-Line~\lnref{htbp} picks up the current \co{ht_bucket} structure into which the
-new element is to be added, and line~\lnref{i} picks up the index of
+\Clnref{htbp} picks up the current \co{ht_bucket} structure into which the
+new element is to be added, and \clnref{i} picks up the index of
 the pointer pair.
-Line~\lnref{add} adds the new element to the current hash bucket.
-If line~\lnref{ifnew} determines that this bucket has been distributed
-to a new version of the hash table, then line~\lnref{addnew} also adds the
+\Clnref{add} adds the new element to the current hash bucket.
+If \clnref{ifnew} determines that this bucket has been distributed
+to a new version of the hash table, then \clnref{addnew} also adds the
 new element to the corresponding new bucket.
 The caller is required to handle concurrency, for example, by invoking
 \co{hashtab_lock_mod()} before the call to \co{hashtab_add()} and invoking
@@ -1345,10 +1345,10 @@ The caller is required to handle concurrency, for example, by invoking
 The \co{hashtab_del()} function on
 \clnrefrange{b}{e} of the listing removes
 an existing element from the hash table.
-Line~\lnref{i} picks up the index of the pointer pair
-and line~\lnref{del} removes the specified element from the current table.
-If line~\lnref{ifnew} determines that this bucket has been distributed
-to a new version of the hash table, then line~\lnref{delnew} also removes
+\Clnref{i} picks up the index of the pointer pair
+and \clnref{del} removes the specified element from the current table.
+If \clnref{ifnew} determines that this bucket has been distributed
+to a new version of the hash table, then \clnref{delnew} also removes
 the specified element from the corresponding new bucket.
 As with \co{hashtab_add()}, the caller is responsible for concurrency
 control and this concurrency control suffices for synchronizing with
@@ -1357,7 +1357,7 @@ a concurrent resize operation.
 
 \QuickQuiz{
 	The \co{hashtab_add()} and \co{hashtab_del()} functions in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
+	\cref{lst:datastruct:Resizable Hash-Table Access Functions}
 	can update two hash buckets while a resize operation is progressing.
 	This might cause poor performance if the frequency of resize operations
 	is not negligible.
@@ -1366,8 +1366,8 @@ a concurrent resize operation.
 	Yes, at least assuming that a slight increase in the cost of
 	\co{hashtab_lookup()} is acceptable.
 	One approach is shown in
-	Listings~\ref{lst:datastruct:Resizable Hash-Table Access Functions (Fewer Updates)}
-	and~\ref{lst:datastruct:Resizable Hash-Table Update-Side Locking Function (Fewer Updates)}
+	\cref{lst:datastruct:Resizable Hash-Table Access Functions (Fewer Updates),%
+	lst:datastruct:Resizable Hash-Table Update-Side Locking Function (Fewer Updates)}
 	(\path{hash_resize_s.c}).
 
 \begin{listing}
@@ -1407,42 +1407,42 @@ a concurrent resize operation.
 
 \begin{fcvref}[ln:datastruct:hash_resize:resize]
 The actual resizing itself is carried out by \co{hashtab_resize()}, shown in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing} on
-page~\pageref{lst:datastruct:Resizable Hash-Table Resizing}.
-Line~\lnref{trylock} conditionally acquires the top-level \co{->ht_lock}, and if
-this acquisition fails, line~\lnref{ret_busy} returns \co{-EBUSY} to indicate that
+\cref{lst:datastruct:Resizable Hash-Table Resizing} on
+\cpageref{lst:datastruct:Resizable Hash-Table Resizing}.
+\Clnref{trylock} conditionally acquires the top-level \co{->ht_lock}, and if
+this acquisition fails, \clnref{ret_busy} returns \co{-EBUSY} to indicate that
 a resize is already in progress.
-Otherwise, line~\lnref{get_curtbl} picks up a reference to the current hash table,
+Otherwise, \clnref{get_curtbl} picks up a reference to the current hash table,
 and \clnrefrange{alloc:b}{alloc:e} allocate a new hash table of the desired size.
 If a new set of hash/key functions has been specified, these are
 used for the new table, otherwise those of the old table are preserved.
-If line~\lnref{chk_nomem} detects memory-allocation failure,
-line~\lnref{rel_nomem} releases \co{->ht_lock}
-and line~\lnref{ret_nomem} returns a failure indication.
+If \clnref{chk_nomem} detects memory-allocation failure,
+\clnref{rel_nomem} releases \co{->ht_lock}
+and \clnref{ret_nomem} returns a failure indication.
 
-Line~\lnref{get_curidx} picks up the current table's index and
-line~\lnref{put_curidx} stores its inverse to
+\Clnref{get_curidx} picks up the current table's index and
+\clnref{put_curidx} stores its inverse to
 the new hash table, thus ensuring that the two hash tables avoid overwriting
 each other's linked lists.
-Line~\lnref{set_newtbl} then starts the bucket-distribution process by
+\Clnref{set_newtbl} then starts the bucket-distribution process by
 installing a reference to the new table into the \co{->ht_new} field of
 the old table.
-Line~\lnref{sync_rcu} ensures that all readers who are not aware of the
+\Clnref{sync_rcu} ensures that all readers who are not aware of the
 new table complete before the resize operation continues.
 
 Each pass through the loop spanning \clnrefrange{loop:b}{loop:e} distributes the contents
 of one of the old hash table's buckets into the new hash table.
-Line~\lnref{get_oldcur} picks up a reference to the old table's current bucket
-and line~\lnref{acq_oldcur} acquires that bucket's spinlock.
+\Clnref{get_oldcur} picks up a reference to the old table's current bucket
+and \clnref{acq_oldcur} acquires that bucket's spinlock.
 \end{fcvref}
 
 \QuickQuiz{
 	\begin{fcvref}[ln:datastruct:hash_resize:resize]
 	In the \co{hashtab_resize()} function in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing},
-	what guarantees that the update to \co{->ht_new} on line~\lnref{set_newtbl}
+	\cref{lst:datastruct:Resizable Hash-Table Resizing},
+	what guarantees that the update to \co{->ht_new} on \clnref{set_newtbl}
 	will be seen as happening before the update to \co{->ht_resize_cur}
-	on line~\lnref{update_resize} from the perspective of
+	on \clnref{update_resize} from the perspective of
 	\co{hashtab_add()} and \co{hashtab_del()}?
 	In other words, what prevents \co{hashtab_add()}
 	and \co{hashtab_del()} from dereferencing
@@ -1450,12 +1450,12 @@ and line~\lnref{acq_oldcur} acquires that bucket's spinlock.
 	\end{fcvref}
 }\QuickQuizAnswer{
 	\begin{fcvref}[ln:datastruct:hash_resize:resize]
-	The \co{synchronize_rcu()} on line~\lnref{sync_rcu} of
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing}
+	The \co{synchronize_rcu()} on \clnref{sync_rcu} of
+	\cref{lst:datastruct:Resizable Hash-Table Resizing}
 	ensures that all pre-existing RCU readers have completed between
 	the time that we install the new hash-table reference on
-	line~\lnref{set_newtbl} and the time that we update \co{->ht_resize_cur} on
-	line~\lnref{update_resize}.
+	\clnref{set_newtbl} and the time that we update \co{->ht_resize_cur} on
+	\clnref{update_resize}.
 	This means that any reader that sees a non-negative value
 	of \co{->ht_resize_cur} cannot have started before the
 	assignment to \co{->ht_new}, and thus must be able to see
@@ -1465,7 +1465,7 @@ and line~\lnref{acq_oldcur} acquires that bucket's spinlock.
 	\co{hashtab_del()} functions must be enclosed
 	in RCU read-side critical sections, courtesy of
 	\co{hashtab_lock_mod()} and \co{hashtab_unlock_mod()} in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency
+	\cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency
 	Control}.
 	\end{fcvref}
 }\QuickQuizEnd
@@ -1475,30 +1475,30 @@ Each pass through the loop spanning
 \clnrefrange{loop_list:b}{loop_list:e} adds one data element
 from the current old-table bucket to the corresponding new-table bucket,
 holding the new-table bucket's lock during the add operation.
-Line~\lnref{update_resize} updates
+\Clnref{update_resize} updates
 \co{->ht_resize_cur} to indicate that this bucket has been distributed.
-Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock.
+Finally, \clnref{rel_oldcur} releases the old-table bucket lock.
 
-Execution reaches line~\lnref{rcu_assign} once all old-table buckets have been distributed
+Execution reaches \clnref{rcu_assign} once all old-table buckets have been distributed
 across the new table.
-Line~\lnref{rcu_assign} installs the newly created table as the current one, and
-line~\lnref{sync_rcu_2} waits for all old readers (who might still be referencing
+\Clnref{rcu_assign} installs the newly created table as the current one, and
+\clnref{sync_rcu_2} waits for all old readers (who might still be referencing
 the old table) to complete.
-Then line~\lnref{rel_master} releases the resize-serialization lock,
-line~\lnref{free} frees
-the old hash table, and finally line~\lnref{ret_success} returns success.
+Then \clnref{rel_master} releases the resize-serialization lock,
+\clnref{free} frees
+the old hash table, and finally \clnref{ret_success} returns success.
 \end{fcvref}
 
 \QuickQuiz{
 	\begin{fcvref}[ln:datastruct:hash_resize:resize]
-	Why is there a \co{WRITE_ONCE()} on line~\lnref{update_resize}
-	in Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing}?
+	Why is there a \co{WRITE_ONCE()} on \clnref{update_resize}
+	in \cref{lst:datastruct:Resizable Hash-Table Resizing}?
 	\end{fcvref}
 }\QuickQuizAnswer{
 	\begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod]
 	Together with the \co{READ_ONCE()}
-	on line~\lnref{l:ifresized} in \co{hashtab_lock_mod()}
-	of Listing~\ref{lst:datastruct:Resizable Hash-Table Update-Side
+	on \clnref{l:ifresized} in \co{hashtab_lock_mod()}
+	of \cref{lst:datastruct:Resizable Hash-Table Update-Side
 	Concurrency Control},
 	it tells the compiler that the non-initialization accesses
 	to \co{->ht_resize_cur} must remain because reads
@@ -1518,7 +1518,7 @@ the old hash table, and finally line~\lnref{ret_success} returns success.
 \end{figure}
 % Data from CodeSamples/datastruct/hash/data/hps.resize.2020.09.05a
 
-Figure~\ref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}
+\Cref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}
 compares resizing hash tables to their fixed-sized counterparts
 for 262,144 and 2,097,152 elements in the hash table.
 The figure shows three traces for each element count, one
@@ -1549,7 +1549,7 @@ can only be expected to produce a sharp decrease in performance,
 as in fact is shown in the graph.
 But worse yet, the hash-table elements occupy 128\,MB, which overflows
 each socket's 39\,MB L3 cache, with performance consequences analogous
-to those described in Section~\ref{sec:cpu:Costs of Operations}.
+to those described in \cref{sec:cpu:Costs of Operations}.
 The resulting cache overflow means that the memory system is involved
 even for a read-only benchmark, and as you can see from the sublinear
 portions of the lower three traces, the memory system can be a serious
@@ -1558,7 +1558,7 @@ bottleneck.
 \QuickQuiz{
 	How much of the difference in performance between the large and
 	small hash tables shown in
-	Figure~\ref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}
+	\cref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}
 	was due to long hash chains and how much was due to
 	memory-system bottlenecks?
 }\QuickQuizAnswer{
@@ -1579,8 +1579,8 @@ bottleneck.
 	the middle of
 	\cref{fig:datastruct:Effect of Memory-System Bottlenecks on Hash Tables}.
 	The other six traces are identical to their counterparts in
-	Figure~\ref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}
-	on page~\pageref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}.
+	\cref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}
+	on \cpageref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}.
 	The gap between this new trace and the lower set of three
 	traces is a rough measure of how much of the difference in
 	performance was due to hash-chain length, and the gap between
@@ -1621,7 +1621,7 @@ bottleneck.
 }\QuickQuizEnd
 
 Referring to the last column of
-Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz},
+\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz},
 we recall that the first 28~CPUs are in the first socket, on a
 one-CPU-per-core basis, which explains the sharp decrease in performance
 of the resizable hash table beyond 28~CPUs.
@@ -1690,7 +1690,7 @@ extraneous data element due to key mismatches.
 
 The process of shrinking a relativistic hash table by a factor of two
 is shown in
-Figure~\ref{fig:datastruct:Shrinking a Relativistic Hash Table},
+\cref{fig:datastruct:Shrinking a Relativistic Hash Table},
 in this case shrinking a two-bucket hash table into a one-bucket
 hash table, otherwise known as a linear list.
 This process works by coalescing pairs of buckets in the old larger hash
@@ -1746,7 +1746,7 @@ state~(f).
 
 Growing a relativistic hash table reverses the shrinking process,
 but requires more grace-period steps, as shown in
-Figure~\ref{fig:datastruct:Growing a Relativistic Hash Table}.
+\cref{fig:datastruct:Growing a Relativistic Hash Table}.
 The initial state~(a) is at the top of this figure, with time advancing
 from top to bottom.
 
@@ -1814,12 +1814,12 @@ library~\cite{MathieuDesnoyers2009URCU}.
 
 The preceding sections have focused on data structures that enhance
 concurrency due to partitionability
-(Section~\ref{sec:datastruct:Partitionable Data Structures}),
+(\cref{sec:datastruct:Partitionable Data Structures}),
 efficient handling of read-mostly access patterns
-(Section~\ref{sec:datastruct:Read-Mostly Data Structures}),
+(\cref{sec:datastruct:Read-Mostly Data Structures}),
 or application of read-mostly techniques to avoid
 non-partitionability
-(Section~\ref{sec:datastruct:Non-Partitionable Data Structures}).
+(\cref{sec:datastruct:Non-Partitionable Data Structures}).
 This section gives a brief review of other data structures.
 
 One of the hash table's greatest advantages for parallel use is that it
@@ -1870,7 +1870,7 @@ represents an early academic use of a technique resembling
 RCU~\cite{Pugh90}.
 
 Concurrent double-ended queues were discussed in
-Section~\ref{sec:SMPdesign:Double-Ended Queue},
+\cref{sec:SMPdesign:Double-Ended Queue},
 and concurrent stacks and queues have a long history~\cite{Treiber86},
 though not normally the most impressive performance or scalability.
 They are nevertheless a common feature of concurrent
@@ -1907,7 +1907,7 @@ alone to the set of CPU families in common use today.
 \label{sec:datastruct:Specialization}
 
 The resizable hash table presented in
-Section~\ref{sec:datastruct:Non-Partitionable Data Structures}
+\cref{sec:datastruct:Non-Partitionable Data Structures}
 used an opaque type for the key.
 This allows great flexibility, permitting any sort of key to be
 used, but it also incurs significant overhead due to the calls via
@@ -1924,8 +1924,8 @@ This overhead can be eliminated by specializing a hash-table implementation
 to a given key type and hash function, for example, by using C++ templates.
 Doing so eliminates the \co{->ht_cmp()}, \co{->ht_gethash()}, and
 \co{->ht_getkey()} function pointers in the \co{ht} structure shown in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Data Structures} on
-page~\pageref{lst:datastruct:Resizable Hash-Table Data Structures}.
+\cref{lst:datastruct:Resizable Hash-Table Data Structures} on
+\cpageref{lst:datastruct:Resizable Hash-Table Data Structures}.
 It also eliminates the corresponding calls through these pointers,
 which could allow the compiler to inline the resulting fixed functions,
 eliminating not only the overhead of the call instruction, but the
@@ -1980,8 +1980,8 @@ days of four-kilobyte address spaces.
 The hash tables discussed in this chapter made almost no attempt to conserve
 memory.
 For example, the \co{->ht_idx} field in the \co{ht} structure in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Data Structures} on
-page~\pageref{lst:datastruct:Resizable Hash-Table Data Structures}
+\cref{lst:datastruct:Resizable Hash-Table Data Structures} on
+\cpageref{lst:datastruct:Resizable Hash-Table Data Structures}
 always has a value of either zero or one, yet takes up a full 32 bits
 of memory.
 It could be eliminated, for example, by stealing a bit from the
@@ -2023,11 +2023,11 @@ Despite these disadvantages, bit-spinlocks are extremely useful when
 memory is at a premium.
 
 One aspect of the second opportunity was covered in
-Section~\ref{sec:datastruct:Other Resizable Hash Tables},
+\cref{sec:datastruct:Other Resizable Hash Tables},
 which presented resizable hash tables that require only one
 set of bucket-list pointers in place of the pair of sets required
 by the resizable hash table presented in
-Section~\ref{sec:datastruct:Non-Partitionable Data Structures}.
+\cref{sec:datastruct:Non-Partitionable Data Structures}.
 Another approach would be to use singly linked bucket lists in
 place of the doubly linked lists used in this chapter.
 One downside of this approach is that deletion would then require
@@ -2052,7 +2052,7 @@ Modern computers typically move data between CPUs and main memory in
 fixed-sized blocks that range in size from 32 bytes to 256 bytes.
 These blocks are called \emph{\IXpl{cache line}}, and are extremely important
 to high performance and scalability, as was discussed in
-Section~\ref{sec:cpu:Overheads}.
+\cref{sec:cpu:Overheads}.
 One timeworn way to kill both performance and scalability is to
 place incompatible variables into the same cache line.
 For example, suppose that a resizable hash table data element had
@@ -2076,7 +2076,7 @@ struct hash_elem {
 \end{listing}
 
 One way to solve this problem on systems with 64-byte cache lines is shown in
-Listing~\ref{lst:datastruct:Alignment for 64-Byte Cache Lines}.
+\cref{lst:datastruct:Alignment for 64-Byte Cache Lines}.
 Here \GCC's \co{aligned} attribute is used to force the \co{->counter}
 and the \co{ht_elem} structure into separate cache lines.
 This would allow CPUs to traverse the hash bucket list at full speed
@@ -2120,10 +2120,10 @@ cache geometry:
 	or task.
 	We saw several very effective examples of this rule of thumb
 	in the counter implementations in
-	Chapter~\ref{chp:Counting}.
+	\cref{chp:Counting}.
 \item	Going one step further, partition your data on a per-CPU,
 	per-thread, or per-task basis, as was discussed in
-	Chapter~\ref{chp:Data Ownership}.
+	\cref{chp:Data Ownership}.
 \end{enumerate}
 
 There has been some work towards automated trace-based rearrangement
@@ -2167,7 +2167,7 @@ to a given situation.
 
 This chapter has focused primarily on hash tables, including resizable
 hash tables, which are not fully partitionable.
-Section~\ref{sec:datastruct:Other Data Structures} gave a quick
+\Cref{sec:datastruct:Other Data Structures} gave a quick
 overview of a few non-hash-table data structures.
 Nevertheless, this exposition of hash tables is an excellent introduction
 to the many issues surrounding high-performance scalable data access,
-- 
2.17.1




* [PATCH -perfbook 5/8] debugging: Employ \cref{} and its variants
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
  2021-05-18 12:21 ` [PATCH -perfbook 4/8] datastruct: Employ \cref{} and its variants Akira Yokosawa
@ 2021-05-18 12:22 ` Akira Yokosawa
  2021-05-18 12:23 ` [PATCH -perfbook 6/8] formal: " Akira Yokosawa
From: Akira Yokosawa @ 2021-05-18 12:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 debugging/debugging.tex | 140 ++++++++++++++++++++--------------------
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index 7d739a62..3a110c8f 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -32,25 +32,25 @@ of software, and is worth intensive study in its own right.
 However, this book is primarily about concurrency, so this chapter will do
 little more than scratch the surface of this critically important topic.
 
-Section~\ref{sec:debugging:Introduction}
+\Cref{sec:debugging:Introduction}
 introduces the philosophy of debugging.
-Section~\ref{sec:debugging:Tracing}
+\Cref{sec:debugging:Tracing}
 discusses tracing,
-Section~\ref{sec:debugging:Assertions}
+\cref{sec:debugging:Assertions}
 discusses assertions, and
-Section~\ref{sec:debugging:Static Analysis}
+\cref{sec:debugging:Static Analysis}
 discusses static analysis.
-Section~\ref{sec:debugging:Code Review}
+\Cref{sec:debugging:Code Review}
 describes some unconventional approaches to code review that can
 be helpful when the fabled 10,000 eyes happen not to be looking at your code.
-Section~\ref{sec:debugging:Probability and Heisenbugs}
+\Cref{sec:debugging:Probability and Heisenbugs}
 overviews the use of probability for validating parallel software.
 Because performance and scalability are first-class requirements
 for parallel programming,
-Section~\ref{sec:debugging:Performance Estimation} covers these
+\cref{sec:debugging:Performance Estimation} covers these
 topics.
 Finally,
-Section~\ref{sec:debugging:Summary}
+\cref{sec:debugging:Summary}
 gives a fanciful summary and a short list of statistical traps to avoid.
 
 But never forget that the three best debugging tools are a thorough
@@ -64,13 +64,13 @@ sleep!
 	  programming must be the process of putting them in.}
 	 {\emph{Edsger W.~Dijkstra}}
 
-Section~\ref{sec:debugging:Where Do Bugs Come From?}
+\Cref{sec:debugging:Where Do Bugs Come From?}
 discusses the sources of bugs, and
-Section~\ref{sec:debugging:Required Mindset}
+\cref{sec:debugging:Required Mindset}
 overviews the mindset required when validating software.
-Section~\ref{sec:debugging:When Should Validation Start?}
+\Cref{sec:debugging:When Should Validation Start?}
 discusses when you should start validation, and
-Section~\ref{sec:debugging:The Open Source Way} describes the
+\cref{sec:debugging:The Open Source Way} describes the
 surprisingly effective open-source regimen of code review and
 community testing.
 
@@ -383,13 +383,13 @@ This can work well, if properly organized.
 
 Some people might see vigorous validation as a form of torture, as
 depicted in
-Figure~\ref{fig:debugging:Validation and the Geneva Convention}.\footnote{
+\cref{fig:debugging:Validation and the Geneva Convention}.\footnote{
 	The cynics among us might question whether these people are
 	afraid that validation will find bugs that they will then be
 	required to fix.}
 Such people might do well to remind themselves that, Tux cartoons aside,
 they are really torturing an inanimate object, as shown in
-Figure~\ref{fig:debugging:Rationalizing Validation}.
+\cref{fig:debugging:Rationalizing Validation}.
 Rest assured that those who fail to torture their code are doomed to be
 tortured by it!
 
@@ -414,8 +414,8 @@ by design.
 But why wait until you have code before validating your design?\footnote{
 	The old saying ``First we must code, then we have incentive to
 	think'' notwithstanding.}
-Hopefully reading Chapters~\ref{chp:Hardware and its Habits}
-and~\ref{chp:Tools of the Trade} provided you with the information
+Hopefully reading \cref{chp:Hardware and its Habits,%
+chp:Tools of the Trade} provided you with the information
 required to avoid some regrettably common design flaws,
 but discussing your design with a colleague or even simply writing it
 down can help flush out additional flaws.
@@ -483,7 +483,7 @@ you will waste coding those design bugs.
 	than the word ``testing''.
 	The word ``validation'' includes formal methods as well as
 	testing, for more on which please see
-	Chapter~\ref{chp:Formal Verification}.
+	\cref{chp:Formal Verification}.
 
 	But as long as we are bringing up things that everyone should
 	know, let's remind ourselves that Darwinian evolution is
@@ -662,7 +662,7 @@ fastpath to tell you what is going wrong, namely, these tools often have
 excessive overheads.
 There are special tracing technologies for this purpose, which typically
 leverage data ownership techniques
-(see Chapter~\ref{chp:Data Ownership})
+(see \cref{chp:Data Ownership})
 to minimize the overhead of runtime data collection.
 One example within the Linux kernel is
 ``trace events''~\cite{StevenRostedt2010perfTraceEventP1,StevenRostedt2010perfTraceEventP2,StevenRostedt2010perfTraceEventP3,StevenRostedt2010perfHP+DeathlyMacros},
@@ -670,8 +670,8 @@ which uses per-CPU buffers to allow data to be collected with
 extremely low overhead.
 Even so, enabling tracing can sometimes change timing enough to
 hide bugs, resulting in \emph{heisenbugs}, which are discussed in
-Section~\ref{sec:debugging:Probability and Heisenbugs}
-and especially Section~\ref{sec:debugging:Hunting Heisenbugs}.
+\cref{sec:debugging:Probability and Heisenbugs}
+and especially \cref{sec:debugging:Hunting Heisenbugs}.
 In the kernel, BPF can do data reduction in the kernel, reducing
 the overhead of transmitting the needed information from the kernel
 to userspace~\cite{BrendanGregg2019BPFperftools}.
@@ -1167,7 +1167,7 @@ of it occurring on the one hand, having fixed only one of several
 related bugs on the other hand, or made some ineffectual unrelated
 change on yet a third hand.
 In short, what is the answer to the eternal question posed by
-Figure~\ref{fig:cpu:Passed-the-stress-test}?
+\cref{fig:cpu:Passed-the-stress-test}?
 
 Unfortunately, the honest answer is that an infinite amount of testing
 is required to attain absolute certainty.
@@ -1205,7 +1205,7 @@ is required to attain absolute certainty.
 	\end{enumerate}
 	Of course, if your code is small enough, formal validation
 	may be helpful, as discussed in
-	Chapter~\ref{chp:Formal Verification}.
+	\cref{chp:Formal Verification}.
 	But beware: formal validation of your code will not find
 	errors in your assumptions, misunderstanding of the
 	requirements, misunderstanding of the software or hardware
@@ -1309,7 +1309,7 @@ After all, if we were to run the test enough times that the probability
 of seeing at least one failure becomes 99\,\%, if there are no failures,
 there is only 1\,\% probability of this ``success'' being due to dumb luck.
 And if we plug $f=0.1$ into
-Equation~\ref{eq:debugging:Binomial Failure Rate} and vary $n$,
+\cref{eq:debugging:Binomial Failure Rate} and vary $n$,
 we find that 43 runs gives us a 98.92\,\% chance of at least one test failing
 given the original 10\,\% per-test failure rate,
 while 44 runs gives us a 99.03\,\% chance of at least one test failing.
@@ -1317,7 +1317,7 @@ So if we run the test on our fix 44 times and see no failures, there
 is a 99\,\% probability that our fix really did help.
 
 But repeatedly plugging numbers into
-Equation~\ref{eq:debugging:Binomial Failure Rate}
+\cref{eq:debugging:Binomial Failure Rate}
 can get tedious, so let's solve for $n$:
 
 \begin{eqnarray}
@@ -1334,14 +1334,14 @@ Finally the number of tests required is given by:
 \end{equation}
 
 Plugging $f=0.1$ and $F_n=0.99$ into
-Equation~\ref{eq:debugging:Binomial Number of Tests Required}
+\cref{eq:debugging:Binomial Number of Tests Required}
 gives 43.7, meaning that we need 44 consecutive successful test
 runs to be 99\,\% certain that our fix was a real improvement.
 This matches the number obtained by the previous method, which
 is reassuring.
 
 \QuickQuiz{
-	In Equation~\ref{eq:debugging:Binomial Number of Tests Required},
+	In \cref{eq:debugging:Binomial Number of Tests Required},
 	are the logarithms base-10, base-2, or base-$\euler$?
 }\QuickQuizAnswer{
 	It does not matter.
@@ -1358,7 +1358,7 @@ is reassuring.
 \label{fig:debugging:Number of Tests Required for 99 Percent Confidence Given Failure Rate}
 \end{figure}
 
-Figure~\ref{fig:debugging:Number of Tests Required for 99 Percent Confidence Given Failure Rate}
+\Cref{fig:debugging:Number of Tests Required for 99 Percent Confidence Given Failure Rate}
 shows a plot of this function.
 Not surprisingly, the less frequently each test run fails, the more
 test runs are required to be 99\,\% confident that the bug has been
@@ -1384,7 +1384,7 @@ How many failure-free test runs are required?
 An order of magnitude improvement from a 30\,\% failure rate would be
 a 3\,\% failure rate.
 Plugging these numbers into
-Equation~\ref{eq:debugging:Binomial Number of Tests Required} yields:
+\cref{eq:debugging:Binomial Number of Tests Required} yields:
 
 \begin{equation}
 	n = \frac{\log\left(1 - 0.99\right)}{\log\left(1 - 0.03\right)} = 151.2
@@ -1397,7 +1397,7 @@ This is why making tests run more quickly and making failures more
 probable are essential skills in the development of highly reliable
 software.
 These skills will be covered in
-Section~\ref{sec:debugging:Hunting Heisenbugs}.
+\cref{sec:debugging:Hunting Heisenbugs}.
 
 \subsection{Statistics Abuse for Discrete Testing}
 \label{sec:debugging:Statistics Abuse for Discrete Testing}
@@ -1440,7 +1440,7 @@ intuitive derivation may be found in the first edition of
 this book~\cite[Equations 11.8--11.26]{McKenney2014ParallelProgramming-e1}.
 
 Let's try reworking the example from
-Section~\ref{sec:debugging:Statistics Abuse for Discrete Testing}
+\cref{sec:debugging:Statistics Abuse for Discrete Testing}
 using the Poisson distribution.
 Recall that this example involved a test with a 30\,\% failure rate per
 hour, and that the question was how long the test would need to run
@@ -1448,7 +1448,7 @@ error-free
on an alleged fix to be 99\,\% certain that the fix actually reduced the
 failure rate.
 In this case, $m$ is zero, so that
-Equation~\ref{eq:debugging:Poisson Probability} reduces to:
+\cref{eq:debugging:Poisson Probability} reduces to:
 
 \begin{equation}
 	F_0 =  \euler^{-\lambda}
@@ -1464,10 +1464,10 @@ to 0.01 and solving for $\lambda$, resulting in:
 Because we get $0.3$ failures per hour, the number of hours required
is $4.6/0.3 = 15.3$, which is within 20\,\% of the 13 hours
 calculated using the method in
-Section~\ref{sec:debugging:Statistics Abuse for Discrete Testing}.
+\cref{sec:debugging:Statistics Abuse for Discrete Testing}.
 Given that you normally won't know your failure rate to anywhere near
 10\,\%, the simpler method described in
-Section~\ref{sec:debugging:Statistics Abuse for Discrete Testing}
+\cref{sec:debugging:Statistics Abuse for Discrete Testing}
 is almost always good and sufficient.
 
 More generally, if we have $n$ failures per unit time, and we want to
@@ -1487,7 +1487,7 @@ following formula:
 	of failure?
 }\QuickQuizAnswer{
 	We set $n$ to $3$ and $P$ to $99.9$ in
-	Equation~\ref{eq:debugging:Error-Free Test Duration}, resulting in:
+	\cref{eq:debugging:Error-Free Test Duration}, resulting in:
 
 	\begin{equation}
 		T = - \frac{1}{3} \ln \frac{100 - 99.9}{100} = 2.3
@@ -1508,7 +1508,7 @@ failures in the second run was due to random chance?
 In other words, how confident should we be that the fix actually
 had some effect on the bug?
 This probability may be calculated by summing
-Equation~\ref{eq:debugging:Poisson Probability} as follows:
+\cref{eq:debugging:Poisson Probability} as follows:
 
 \begin{equation}
 	F_0 + F_1 + \dots + F_{m - 1} + F_m =
@@ -1551,7 +1551,7 @@ that the fix actually had some relationship to the bug.\footnote{
 
 	In particular, the \co{bfloat(cdf_poisson(2,24));} command
 	results in \co{1.181617112359357b-8}, which matches the value
-	given by Equation~\ref{eq:debugging:Possion CDF}.
+	given by \cref{eq:debugging:Possion CDF}.
 
 \begin{table}
 \renewcommand*{\arraystretch}{1.25}
@@ -1597,14 +1597,14 @@ that the fix actually had some relationship to the bug.\footnote{
 	you need a 30-hour error-free run.
 
 	Alternatively, you can use the rough-and-ready method described in
-	Section~\ref{sec:debugging:Statistics Abuse for Discrete Testing}.
+	\cref{sec:debugging:Statistics Abuse for Discrete Testing}.
 }\QuickQuizEndB
 %
 \QuickQuizE{
 	But wait!!!
 	Given that there has to be \emph{some} number of failures
 	(including the possibility of zero failures), shouldn't
-	Equation~\ref{eq:debugging:Possion CDF}
+	\cref{eq:debugging:Possion CDF}
 	approach the value $1$ as $m$ goes to infinity?
 }\QuickQuizAnswerE{
 	Indeed it should.
@@ -1686,7 +1686,7 @@ These are followed by discussion in
 \label{sec:debugging:Add Delay}
 
 Consider the count-lossy code in
-Section~\ref{sec:count:Why Isn't Concurrent Counting Trivial?}.
+\cref{sec:count:Why Isn't Concurrent Counting Trivial?}.
 Adding \co{printf()} statements will likely greatly reduce or even
 eliminate the lost counts.
 However, converting the load-add-store sequence to a load-add-delay-store
@@ -1906,7 +1906,7 @@ and a subsequent wait for an RCU callback to be
 invoked after completion of the RCU grace period.
 This distinction between an \co{rcutorture} error and near miss is
 shown in
-Figure~\ref{fig:debugging:RCU Errors and Near Misses}.
+\cref{fig:debugging:RCU Errors and Near Misses}.
 To qualify as a full-fledged error, an RCU read-side critical section
 must extend from the \co{call_rcu()} that initiated a grace period,
 through the remainder of the previous grace period, through the
@@ -1957,9 +1957,9 @@ changes you make to add debugging code.
 
 The alert reader might have noticed that this section was fuzzy and
 qualitative, in stark contrast to the precise mathematics of
-Sections~\ref{sec:debugging:Statistics for Discrete Testing},
-~\ref{sec:debugging:Statistics Abuse for Discrete Testing},
-and~\ref{sec:debuggingStatistics for Continuous Testing}.
+\cref{sec:debugging:Statistics for Discrete Testing,%
+sec:debugging:Statistics Abuse for Discrete Testing,%
+sec:debugging:Statistics for Continuous Testing}.
 If you love precision and mathematics, you may be disappointed to
 learn that the situations to which this section applies are far
 more common than those to which the preceding sections apply.
@@ -1972,7 +1972,7 @@ In this all-too-common case, statistics cannot help you.\footnote{
 	Although if you know what your program is supposed to do and
	if your program is small enough (both less likely than you
 	might think), then the formal-verification tools described in
-	Chapter~\ref{chp:Formal Verification}
+	\cref{chp:Formal Verification}
 	can be helpful.}
 That is to say, statistics cannot help you \emph{directly}.
 But statistics can be of great indirect help---\emph{if} you have the
@@ -2285,11 +2285,11 @@ The remainder of this section looks at ways of resolving this conflict.
 
 The following sections discuss ways of dealing with these measurement
 errors, with
-Section~\ref{sec:debugging:Isolation}
+\cref{sec:debugging:Isolation}
 covering isolation techniques that may be used to prevent some forms of
 interference,
 and with
-Section~\ref{sec:debugging:Detecting Interference}
+\cref{sec:debugging:Detecting Interference}
 covering methods for detecting interference so as to reject measurement
 data that might have been corrupted by that interference.
 
@@ -2381,7 +2381,7 @@ interference.
 	Nevertheless, if for some reason you must keep the code under
 	test within the application, you will very likely need to use
 	the techniques discussed in
-	Section~\ref{sec:debugging:Detecting Interference}.
+	\cref{sec:debugging:Detecting Interference}.
 }\QuickQuizEnd
 
 Of course, if it is in fact the interference that is producing the
@@ -2396,9 +2396,9 @@ as described in the next section.
 
 If you cannot prevent interference, perhaps you can detect it
 and reject results from any affected test runs.
-Section~\ref{sec:debugging:Detecting Interference Via Measurement}
+\Cref{sec:debugging:Detecting Interference Via Measurement}
 describes methods of rejection involving additional measurements,
-while Section~\ref{sec:debugging:Detecting Interference Via Statistics}
+while \cref{sec:debugging:Detecting Interference Via Statistics}
 describes statistics-based rejection.
 
 \subsubsection{Detecting Interference Via Measurement}
@@ -2450,7 +2450,7 @@ int runtest(void)
 Opening and reading files is not the way to low overhead, and it is
 possible to get the count of context switches for a given thread
 by using the \co{getrusage()} system call, as shown in
-Listing~\ref{lst:debugging:Using getrusage() to Detect Context Switches}.
+\cref{lst:debugging:Using getrusage() to Detect Context Switches}.
 This same system call can be used to detect minor page faults (\co{ru_minflt})
 and major page faults (\co{ru_majflt}).
 
@@ -2504,7 +2504,7 @@ Otherwise, the remainder of the list is rejected.
 \label{lst:debugging:Statistical Elimination of Interference}
 \end{listing}
 
-Listing~\ref{lst:debugging:Statistical Elimination of Interference}
+\Cref{lst:debugging:Statistical Elimination of Interference}
 shows a simple \co{sh}/\co{awk} script implementing this notion.
 Input consists of an x-value followed by an arbitrarily long list of y-values,
 and output consists of one line for each input line, with fields as follows:
@@ -2541,50 +2541,50 @@ This script takes three optional arguments as follows:
 
 \begin{fcvref}[ln:debugging:datablows:whole]
 \Clnrefrange{param:b}{param:e} of
-Listing~\ref{lst:debugging:Statistical Elimination of Interference}
+\cref{lst:debugging:Statistical Elimination of Interference}
 set the default values for the parameters, and
 \clnrefrange{parse:b}{parse:e} parse
 any command-line overriding of these parameters.
 \end{fcvref}
 \begin{fcvref}[ln:debugging:datablows:whole:awk]
-The \co{awk} invocation on line~\lnref{invoke} sets the values of the
+The \co{awk} invocation on \clnref{invoke} sets the values of the
 \co{divisor}, \co{relerr}, and \co{trendbreak} variables to their
 \co{sh} counterparts.
 In the usual \co{awk} manner,
 \clnrefrange{copy:b}{end} are executed on each input
 line.
-The loop spanning lines~\lnref{copy:b} and~\lnref{copy:e} copies
+The loop spanning \clnref{copy:b,copy:e} copies
 the input y-values to the
-\co{d} array, which line~\lnref{asort} sorts into increasing order.
-Line~\lnref{comp_i} computes the number of trustworthy y-values
+\co{d} array, which \clnref{asort} sorts into increasing order.
+\Clnref{comp_i} computes the number of trustworthy y-values
 by applying \co{divisor} and rounding up.
 
 \Clnrefrange{delta}{comp_max:e} compute the \co{maxdelta}
 lower bound on the upper bound of y-values.
-To this end, line~\lnref{maxdelta} multiplies the difference in values over
+To this end, \clnref{maxdelta} multiplies the difference in values over
the trusted region of data by the \co{divisor}, which projects the
difference in values across the trusted region onto the entire
set of y-values.
 However, this value might well be much smaller than the relative error,
-so line~\lnref{maxdelta1} computes the absolute error (\co{d[i] * relerr})
+so \clnref{maxdelta1} computes the absolute error (\co{d[i] * relerr})
 and adds
 that to the difference \co{delta} across the trusted portion of the data.
-Lines~\lnref{comp_max:b} and~\lnref{comp_max:e} then compute the maximum of
+\Clnref{comp_max:b,comp_max:e} then compute the maximum of
 these two values.
 
 Each pass through the loop spanning \clnrefrange{add:b}{add:e}
 attempts to add another
 data value to the set of good data.
 \Clnrefrange{chk_engh}{break} compute the trend-break delta,
-with line~\lnref{chk_engh} disabling this
+with \clnref{chk_engh} disabling this
 limit if we don't yet have enough values to compute a trend,
-and with line~\lnref{mul_avr} multiplying \co{trendbreak} by the average
+and with \clnref{mul_avr} multiplying \co{trendbreak} by the average
 difference between pairs of data values in the good set.
-If line~\lnref{chk_max} determines that the candidate data value would exceed the
+If \clnref{chk_max} determines that the candidate data value would exceed the
 lower bound on the upper bound (\co{maxdelta}) \emph{and}
 that the difference between the candidate data value
 and its predecessor exceeds the trend-break difference (\co{maxdiff}),
-then line~\lnref{break} exits the loop: We have the full good set of data.
+then \clnref{break} exits the loop: We have the full good set of data.
 
 \Clnrefrange{comp_stat:b}{comp_stat:e} then compute and print
 statistics.
@@ -2615,7 +2615,7 @@ statistics.
 
 	Of course, it is possible to create a script similar to
 	that in
-	Listing~\ref{lst:debugging:Statistical Elimination of Interference}
+	\cref{lst:debugging:Statistical Elimination of Interference}
 	that uses standard deviation rather than absolute difference
 	to get a similar effect,
 	and this is left as an exercise for the interested reader.
@@ -2641,9 +2641,9 @@ statistics.
 Although statistical interference detection can be quite useful, it should
 be used only as a last resort.
 It is far better to avoid interference in the first place
-(Section~\ref{sec:debugging:Isolation}), or, failing that,
+(\cref{sec:debugging:Isolation}), or, failing that,
 detecting interference via measurement
-(Section~\ref{sec:debugging:Detecting Interference Via Measurement}).
+(\cref{sec:debugging:Detecting Interference Via Measurement}).
 
 \section{Summary}
 \label{sec:debugging:Summary}
@@ -2655,7 +2655,7 @@ Although validation never will be an exact science, much can be gained
 by taking an organized approach to it, as an organized approach will
 help you choose the right validation tools for your job, avoiding
 situations like the one fancifully depicted in
-Figure~\ref{fig:debugging:Choose Validation Methods Wisely}.
+\cref{fig:debugging:Choose Validation Methods Wisely}.
 
 \begin{figure}
 \centering
@@ -2671,11 +2671,11 @@ Problem~\cite{AlanMTuring1937HaltingProblem,GeoffreyKPullum2000HaltingProblem}.
 Fortunately for us, there is a huge number of special cases in which
 we can not only work out whether a program will halt, but also
 estimate how long it will run before halting, as discussed in
-Section~\ref{sec:debugging:Performance Estimation}.
+\cref{sec:debugging:Performance Estimation}.
 Furthermore, in cases where a given program might or might not work
 correctly, we can often establish estimates for what fraction of the
 time it will work correctly, as discussed in
-Section~\ref{sec:debugging:Probability and Heisenbugs}.
+\cref{sec:debugging:Probability and Heisenbugs}.
 
 Nevertheless, unthinking reliance on these estimates is brave to the
 point of foolhardiness.
@@ -2735,7 +2735,7 @@ ten orders of magnitude, which poses a severe challenge to
 today's testing methodologies.
 One important tool that can sometimes be applied with good effect to
 such situations is formal verification, the subject of the next chapter,
-and, more speculatively, Section~\ref{sec:future:Formal Regression Testing?}.
+and, more speculatively, \cref{sec:future:Formal Regression Testing?}.
 
 The topic of choosing a validation plan, be it testing, formal
 verification, or both, is taken up by
-- 
2.17.1




* [PATCH -perfbook 6/8] formal: Employ \cref{} and its variants
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
                   ` (4 preceding siblings ...)
  2021-05-18 12:22 ` [PATCH -perfbook 5/8] debugging: " Akira Yokosawa
@ 2021-05-18 12:23 ` Akira Yokosawa
  2021-05-18 12:24 ` [PATCH -perfbook 7/8] together, advsync, memorder: " Akira Yokosawa
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 formal/axiomatic.tex  |  90 +++++++++++++++----------------
 formal/dyntickrcu.tex |   2 +-
 formal/formal.tex     |   6 +--
 formal/sat.tex        |   2 +-
 formal/spinhint.tex   | 122 +++++++++++++++++++++---------------------
 formal/stateless.tex  |   2 +-
 6 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/formal/axiomatic.tex b/formal/axiomatic.tex
index c7a22343..12bba26c 100644
--- a/formal/axiomatic.tex
+++ b/formal/axiomatic.tex
@@ -36,7 +36,7 @@ exists
 
 Although the PPCMEM tool can solve the famous ``independent reads of
 independent writes'' (IRIW) litmus test shown in
-Listing~\ref{lst:formal:IRIW Litmus Test}, doing so requires no less than
+\cref{lst:formal:IRIW Litmus Test}, doing so requires no less than
 fourteen CPU hours and generates no less than ten gigabytes of state space.
 That said, this situation is a great improvement over that before the advent
 of PPCMEM, where solving this problem required perusing volumes of
@@ -72,7 +72,7 @@ converts litmus tests to theorems that might be proven or disproven
 over this set of axioms.
 The resulting tool, called ``herd'',  conveniently takes as input the
 same litmus tests as PPCMEM, including the IRIW litmus test shown in
-Listing~\ref{lst:formal:IRIW Litmus Test}.
+\cref{lst:formal:IRIW Litmus Test}.
 
 \begin{listing}
 \begin{fcvlabel}[ln:formal:Expanded IRIW Litmus Test]
@@ -113,7 +113,7 @@ That said, the problem is exponential in nature, so we should expect
 herd to exhibit exponential slowdowns for larger problems.
 And this is exactly what happens, for example, if we add four more writes
 per writing CPU as shown in
-Listing~\ref{lst:formal:Expanded IRIW Litmus Test},
+\cref{lst:formal:Expanded IRIW Litmus Test},
 herd slows down by a factor of more than 50,000, requiring more than
 15 \emph{minutes} of CPU time.
 Adding threads also results in exponential
@@ -124,15 +124,15 @@ useful for checking key parallel algorithms, including the queued-lock
 handoff on x86 systems.
 The weaknesses of the herd tool are similar to those of PPCMEM, which
 were described in
-Section~\ref{sec:formal:PPCMEM Discussion}.
+\cref{sec:formal:PPCMEM Discussion}.
 There are some obscure (but very real) cases for which the PPCMEM and
 herd tools disagree, and as of 2021 many but not all of these disagreements
 had been resolved.
 
 It would be helpful if the litmus tests could be written in C
-(as in Listing~\ref{lst:formal:Meaning of PPCMEM Litmus Test})
+(as in \cref{lst:formal:Meaning of PPCMEM Litmus Test})
 rather than assembly
-(as in Listing~\ref{lst:formal:PPCMEM Litmus Test}).
+(as in \cref{lst:formal:PPCMEM Litmus Test}).
 This is now possible, as will be described in the following sections.
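For instance, a C-language litmus test takes roughly the following form. This is a hypothetical message-passing sketch in the format accepted by LKMM's \co{herd7}, not one of the listings referenced above:

```
C C-MP-sketch

{}

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_store_release(y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(y);
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)
```

Under LKMM, \co{herd7} should report \co{Never} for this \co{exists} clause, because the release--acquire pair forbids \co{P1()} from seeing the flag set but the buffer unwritten.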
 
 \subsection{Axiomatic Approaches and Locking}
@@ -141,7 +141,7 @@ This is now possible, as will be described in the following sections.
 Axiomatic approaches may also be applied to higher-level
 languages and also to higher-level synchronization primitives, as
 exemplified by the lock-based litmus test shown in
-Listing~\ref{lst:formal:Locking Example} (\path{C-Lock1.litmus}).
+\cref{lst:formal:Locking Example} (\path{C-Lock1.litmus}).
 This litmus test can be modeled by the Linux kernel memory model
 (LKMM)~\cite{Alglave:2018:FSC:3173162.3177156,LucMaranget2018lock.cat}.
 As expected, the \co{herd} tool's output features the string \co{Never},
@@ -149,8 +149,8 @@ correctly indicating that \co{P1()} cannot see \co{x} having a value
 of one.\footnote{
 	The output of the \co{herd} tool is compatible with that
 	of PPCMEM, so feel free to look at
-	Listings~\ref{lst:formal:PPCMEM Detects an Error}
-	and~\ref{lst:formal:PPCMEM on Repaired Litmus Test}
+	\cref{lst:formal:PPCMEM Detects an Error,%
+	lst:formal:PPCMEM on Repaired Litmus Test}
 	for examples showing the output format.}
 
 \begin{listing}
@@ -161,7 +161,7 @@ of one.\footnote{
 
 \QuickQuiz{
 	What do you have to do to run \co{herd} on litmus tests like
-	that shown in Listing~\ref{lst:formal:Locking Example}?
+	that shown in \cref{lst:formal:Locking Example}?
 }\QuickQuizAnswer{
 	Get version v4.17 (or later) of the Linux-kernel source code,
 	then follow the instructions in \path{tools/memory-model/README}
@@ -177,7 +177,7 @@ of one.\footnote{
 \end{listing}
 
 Of course, if \co{P0()} and \co{P1()} use different locks, as shown in
-Listing~\ref{lst:formal:Broken Locking Example} (\path{C-Lock2.litmus}),
+\cref{lst:formal:Broken Locking Example} (\path{C-Lock2.litmus}),
 then all bets are off.
 And in this case, the \co{herd} tool's output features the string
 \co{Sometimes}, correctly indicating that use of different locks allows
@@ -188,7 +188,7 @@ And in this case, the \co{herd} tool's output features the string
 	Why not simply emulate locking with atomic operations?
 }\QuickQuizAnswer{
 	In a word, performance, as can be seen in
-	Table~\ref{tab:formal:Locking: Modeling vs. Emulation Time (s)}.
+	\cref{tab:formal:Locking: Modeling vs. Emulation Time (s)}.
 	The first column shows the number of herd processes modeled.
 	The second column shows the herd runtime when modeling
 	\co{spin_lock()} and \co{spin_unlock()} directly in herd's
@@ -264,16 +264,16 @@ The next section looks at RCU\@.
 Axiomatic approaches can also analyze litmus tests involving
 RCU~\cite{Alglave:2018:FSC:3173162.3177156}.
 To that end,
-Listing~\ref{lst:formal:Canonical RCU Removal Litmus Test}
+\cref{lst:formal:Canonical RCU Removal Litmus Test}
 (\path{C-RCU-remove.litmus})
 shows a litmus test corresponding to the canonical RCU-mediated
 removal from a linked list.
 As with the locking litmus test, this RCU litmus test can be
 modeled by LKMM, with similar performance advantages compared
 to modeling emulations of RCU\@.
-Line~\lnref{head} shows \co{x} as the list head, initially
+\Clnref{head} shows \co{x} as the list head, initially
 referencing \co{y}, which in turn is initialized to the value
-\co{2} on line~\lnref{tail:1}.
+\co{2} on \clnref{tail:1}.
 
 \begin{listing}
 \input{CodeSamples/formal/herd/C-RCU-remove@whole.fcv}
@@ -283,23 +283,23 @@ referencing \co{y}, which in turn is initialized to the value
 
 \co{P0()} on \clnrefrange{P0start}{P0end}
 removes element \co{y} from the list by replacing it with element \co{z}
-(line~\lnref{assignnewtail}),
-waits for a grace period (line~\lnref{sync}),
-and finally zeroes \co{y} to emulate \co{free()} (line~\lnref{free}).
+(\clnref{assignnewtail}),
+waits for a grace period (\clnref{sync}),
+and finally zeroes \co{y} to emulate \co{free()} (\clnref{free}).
 \co{P1()} on \clnrefrange{P1start}{P1end}
 executes within an RCU read-side critical section
 (\clnrefrange{rl}{rul}),
-picking up the list head (line~\lnref{rderef}) and then
-loading the next element (line~\lnref{read}).
+picking up the list head (\clnref{rderef}) and then
+loading the next element (\clnref{read}).
 The next element should be non-zero, that is, not yet freed
-(line~\lnref{exists_}).
+(\clnref{exists_}).
 Several other variables are output for debugging purposes
-(line~\lnref{locations_}).
+(\clnref{locations_}).
 
 The output of the \co{herd} tool when running this litmus test features
 \co{Never}, indicating that \co{P0()} never accesses a freed element,
 as expected.
-Also as expected, removing line~\lnref{sync} results in \co{P0()}
+Also as expected, removing \clnref{sync} results in \co{P0()}
 accessing a freed element, as indicated by the \co{Sometimes} in
 the \co{herd} output.
 \end{fcvref}
@@ -313,7 +313,7 @@ the \co{herd} output.
 \begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
 A litmus test for a more complex example proposed by
 \ppl{Roman}{Penyaev}~\cite{RomanPenyaev2018rrRCU} is shown in
-Listing~\ref{lst:formal:Complex RCU Litmus Test}
+\cref{lst:formal:Complex RCU Litmus Test}
 (\path{C-RomanPenyaev-list-rcu-rr.litmus}).
 In this example, readers (modeled by \co{P0()} on
 \clnrefrange{P0start}{P0end}) access a linked list
@@ -342,7 +342,7 @@ In the Linux kernel, this would be a doubly linked circular list,
 but \co{herd} is currently incapable of modeling such a beast.
 The strategy is instead to use a singly linked linear list that
 is long enough that the end is never reached.
-Line~\lnref{rrcache} defines variable \co{c}, which is used to
+\Clnref{rrcache} defines variable \co{c}, which is used to
 cache the list pointer between successive RCU read-side critical
 sections.
 
@@ -350,46 +350,46 @@ Again, \co{P0()} on \clnrefrange{P0start}{P0end} models readers.
 This process models a pair of successive readers traversing round-robin
 through the list, with the first reader on \clnrefrange{rl1}{rul1}
 and the second reader on \clnrefrange{rl2}{rul2}.
-Line~\lnref{rdcache} fetches the pointer cached in \co{c}, and if
-line~\lnref{rdckcache} sees that the pointer was \co{NULL},
-line~\lnref{rdinitcache} restarts at the beginning of the list.
-In either case, line~\lnref{rdnext} advances to the next list element,
-and line~\lnref{rdupdcache} stores a pointer to this element back into
+\Clnref{rdcache} fetches the pointer cached in \co{c}, and if
+\clnref{rdckcache} sees that the pointer was \co{NULL},
+\clnref{rdinitcache} restarts at the beginning of the list.
+In either case, \clnref{rdnext} advances to the next list element,
+and \clnref{rdupdcache} stores a pointer to this element back into
 variable \co{c}.
 \Clnrefrange{rl2}{rul2} repeat this process, but using
 registers \co{r3} and \co{r4} instead of \co{r1} and \co{r2}.
 As with
-Listing~\ref{lst:formal:Canonical RCU Removal Litmus Test},
+\cref{lst:formal:Canonical RCU Removal Litmus Test},
 this litmus test stores zero to emulate \co{free()}, so
-line~\lnref{exists_} checks for any of these four registers being
+\clnref{exists_} checks for any of these four registers being
 \co{NULL}, also known as zero.
 
 Because \co{P0()} leaks an RCU-protected pointer from its first
 RCU read-side critical section to its second, \co{P1()} must carry
 out its update (removing \co{x}) very carefully.
-Line~\lnref{updremove} removes \co{x} by linking \co{w} to \co{y}.
-Line~\lnref{updsync1} waits for readers, after which no subsequent reader
+\Clnref{updremove} removes \co{x} by linking \co{w} to \co{y}.
+\Clnref{updsync1} waits for readers, after which no subsequent reader
 has a path to \co{x} via the linked list.
-Line~\lnref{updrdcache} fetches \co{c}, and if line~\lnref{updckcache}
+\Clnref{updrdcache} fetches \co{c}, and if \clnref{updckcache}
 determines that \co{c} references the newly removed \co{x},
-line~\lnref{updinitcache} sets \co{c} to \co{NULL}
-and line~\lnref{updsync2} again waits for readers, after which no
+\clnref{updinitcache} sets \co{c} to \co{NULL}
+and \clnref{updsync2} again waits for readers, after which no
 subsequent reader can fetch \co{x} from \co{c}.
-In either case, line~\lnref{updfree} emulates \co{free()} by storing
+In either case, \clnref{updfree} emulates \co{free()} by storing
 zero to \co{x}.
 
 \QuickQuiz{
 	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
-	In Listing~\ref{lst:formal:Complex RCU Litmus Test},
+	In \cref{lst:formal:Complex RCU Litmus Test},
 	why couldn't a reader fetch \co{c} just before \co{P1()}
-	zeroed it on line~\lnref{updinitcache}, and then later
+	zeroed it on \clnref{updinitcache}, and then later
 	store this same value back into \co{c} just after it was
 	zeroed, thus defeating the zeroing operation?
 	\end{fcvref}
 }\QuickQuizAnswer{
 	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
 	Because the reader advances to the next element on
-	line~\lnref{rdnext}, thus avoiding storing a pointer to the
+	\clnref{rdnext}, thus avoiding storing a pointer to the
 	same element as was fetched.
 	\end{fcvref}
 }\QuickQuizEnd
@@ -405,9 +405,9 @@ in the \co{herd} output.
 \QuickQuizSeries{%
 \QuickQuizB{
 	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
-	In Listing~\ref{lst:formal:Complex RCU Litmus Test},
+	In \cref{lst:formal:Complex RCU Litmus Test},
 	why not have just one call to \co{synchronize_rcu()}
-	immediately before line~\lnref{updfree}?
+	immediately before \clnref{updfree}?
 	\end{fcvref}
 }\QuickQuizAnswerB{
 	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
@@ -418,8 +418,8 @@ in the \co{herd} output.
 %
 \QuickQuizE{
 	\begin{fcvref}[ln:formal:C-RomanPenyaev-list-rcu-rr:whole]
-	Also in Listing~\ref{lst:formal:Complex RCU Litmus Test},
-	can't line~\lnref{updfree} be \co{WRITE_ONCE()} instead
+	Also in \cref{lst:formal:Complex RCU Litmus Test},
+	can't \clnref{updfree} be \co{WRITE_ONCE()} instead
 	of \co{smp_store_release()}?
 	\end{fcvref}
 }\QuickQuizAnswerE{
diff --git a/formal/dyntickrcu.tex b/formal/dyntickrcu.tex
index ea534b10..f1494f20 100644
--- a/formal/dyntickrcu.tex
+++ b/formal/dyntickrcu.tex
@@ -503,7 +503,7 @@ re-entering dynticks-idle mode (for example, that same task blocking).
 	In fact, one must instead explicitly model lack of memory barriers,
 	for example, as shown in
 	\cref{lst:formal:QRCU Unordered Summation} on
-	page~\pageref{lst:formal:QRCU Unordered Summation}.
+	\cpageref{lst:formal:QRCU Unordered Summation}.
 }\QuickQuizEndB
 %
 \QuickQuizE{
diff --git a/formal/formal.tex b/formal/formal.tex
index d0ac6d4f..0f954861 100644
--- a/formal/formal.tex
+++ b/formal/formal.tex
@@ -174,7 +174,7 @@ The larger overarching software construct is of course validated by testing.
 
 	Perhaps someday formal verification will be used heavily for
 	validation, including for what is now known as regression testing.
-	Section~\ref{sec:future:Formal Regression Testing?} looks at
+	\Cref{sec:future:Formal Regression Testing?} looks at
 	what would be required to make this possibility a reality.
 }\QuickQuizEnd
 
@@ -203,10 +203,10 @@ formal-verification tools require your code to be hand-translated
 to a special-purpose language.
 For example, a complex implementation of the dynticks interface for
 preemptible RCU that was presented in
-Section~\ref{sec:formal:Promela Parable: dynticks and Preemptible RCU}
+\cref{sec:formal:Promela Parable: dynticks and Preemptible RCU}
 turned out to
 have a much simpler alternative implementation, as discussed in
-Section~\ref{sec:formal:Simplicity Avoids Formal Verification}.
+\cref{sec:formal:Simplicity Avoids Formal Verification}.
 All else being equal, a simpler implementation is much better than
 a proof of correctness for a complex implementation.
 
diff --git a/formal/sat.tex b/formal/sat.tex
index 85b67ed0..cc85e4fc 100644
--- a/formal/sat.tex
+++ b/formal/sat.tex
@@ -36,7 +36,7 @@ One example is the C bounded model checker, or \co{cbmc}, which is
 available as part of many Linux distributions.
 This tool is quite easy to use, with \co{cbmc test.c} sufficing to
 validate \path{test.c}, resulting in the processing flow shown in
-Figure~\ref{fig:formal:CBMC Processing Flow}.
+\cref{fig:formal:CBMC Processing Flow}.
 This ease of use is exceedingly important because it opens the door
 to formal verification being incorporated into regression-testing
 frameworks.
diff --git a/formal/spinhint.tex b/formal/spinhint.tex
index d05bab16..98784801 100644
--- a/formal/spinhint.tex
+++ b/formal/spinhint.tex
@@ -12,18 +12,18 @@ This section features the general-purpose Promela and Spin tools,
 which may be used to carry out a full
 state-space search of many types of multi-threaded code.
They are used to verify data communication protocols.
-Section~\ref{sec:formal:Promela and Spin}
+\Cref{sec:formal:Promela and Spin}
 introduces Promela and Spin, including a couple of warm-up exercises
 verifying both non-atomic and atomic increment.
-Section~\ref{sec:formal:How to Use Promela}
+\Cref{sec:formal:How to Use Promela}
 describes use of Promela, including example command lines and a
 comparison of Promela syntax to that of C\@.
-Section~\ref{sec:formal:Promela Example: Locking}
+\Cref{sec:formal:Promela Example: Locking}
 shows how Promela may be used to verify locking,
-\ref{sec:formal:Promela Example: QRCU}
+\cref{sec:formal:Promela Example: QRCU}
 uses Promela to verify an unusual implementation of RCU named ``QRCU'',
 and finally
-Section~\ref{sec:formal:Promela Parable: dynticks and Preemptible RCU}
+\cref{sec:formal:Promela Parable: dynticks and Preemptible RCU}
 applies Promela to early versions of RCU's dyntick-idle implementation.
 
 \subsection{Promela and Spin}
@@ -64,12 +64,12 @@ more complex uses.
 \label{sec:formal:Warm-Up: Non-Atomic Increment}
 
 \begin{fcvref}[ln:formal:promela:increment:whole]
-Listing~\ref{lst:formal:Promela Code for Non-Atomic Increment}
+\Cref{lst:formal:Promela Code for Non-Atomic Increment}
 demonstrates the textbook race condition
 resulting from non-atomic increment.
-Line~\lnref{nprocs} defines the number of processes to run (we will vary this
-to see the effect on state space), line~\lnref{count} defines the counter,
-and line~\lnref{prog} is used to implement the assertion that appears on
+\Clnref{nprocs} defines the number of processes to run (we will vary this
+to see the effect on state space), \clnref{count} defines the counter,
+and \clnref{prog} is used to implement the assertion that appears on
 \clnrefrange{assert:b}{assert:e}.
 
 \begin{listing}
@@ -85,7 +85,7 @@ block later in the code.
 Because simple Promela statements are each assumed atomic, we must
 break the increment into the two statements on
 \clnrefrange{incr:b}{incr:e}.
-The assignment on line~\lnref{setprog} marks the process's completion.
+The assignment on \clnref{setprog} marks the process's completion.
 Because the Spin system will fully search the state space, including
 all possible sequences of states, there is no need for the loop
 that would be used for conventional stress testing.
@@ -113,13 +113,13 @@ initializes the i-th
 incrementer's progress cell, runs the i-th incrementer's process, and
 then increments the variable \co{i}.
 The second block of the \co{do-od} on
-line~\lnref{block2} exits the loop once
+\clnref{block2} exits the loop once
 these processes have been started.
 
 The atomic block on \clnrefrange{assert:b}{assert:e} also contains
 a similar \co{do-od}
 loop that sums up the progress counters.
-The \co{assert()} statement on line~\lnref{assert} verifies that
+The \co{assert()} statement on \clnref{assert} verifies that
 if all processes
 have been completed, then all counts have been correctly recorded.
 \end{fcvref}
@@ -140,7 +140,7 @@ cc -DSAFETY -o pan pan.c    # Compile the model
 \end{listing}
 
 This will produce output as shown in
-Listing~\ref{lst:formal:Non-Atomic Increment Spin Output}.
+\cref{lst:formal:Non-Atomic Increment Spin Output}.
 The first line tells us that our assertion was violated (as expected
 given the non-atomic increment!).
 The second line says that a \co{trail} file was written describing how the
@@ -167,7 +167,7 @@ spin -t -p increment.spin
 \end{listing*}
 
 This gives the output shown in
-Listing~\ref{lst:formal:Non-Atomic Increment Error Trail}.
+\cref{lst:formal:Non-Atomic Increment Error Trail}.
 As can be seen, the first portion of the init block created both
 incrementer processes, both of which first fetched the counter,
 then both incremented and stored it, losing a count.
@@ -178,13 +178,13 @@ The assertion then triggered, after which the global state is displayed.
 
 It is easy to fix this example by placing the body of the incrementer
 processes in an atomic block as shown in
-Listing~\ref{lst:formal:Promela Code for Atomic Increment}.
+\cref{lst:formal:Promela Code for Atomic Increment}.
 One could also have simply replaced the pair of statements with
 \co{counter = counter + 1}, because Promela statements are
 atomic.
 Either way, running this modified model gives us an error-free traversal
 of the state space, as shown in
-Listing~\ref{lst:formal:Atomic Increment Spin Output}.
+\cref{lst:formal:Atomic Increment Spin Output}.
 
 \begin{listing}
 \input{CodeSamples/formal/promela/atomicincrement@incrementer.fcv}
@@ -199,7 +199,7 @@ Listing~\ref{lst:formal:Atomic Increment Spin Output}.
 \label{lst:formal:Atomic Increment Spin Output}
 \end{listing}
 
-Table~\ref{tab:advsync:Memory Usage of Increment Model}
+\Cref{tab:advsync:Memory Usage of Increment Model}
 shows the number of states and memory consumed
 as a function of number of incrementers modeled
 (by redefining \co{NUMPROCS}):
@@ -422,14 +422,14 @@ fi
 	After all, they are not part of the algorithm.
 	One example of a complex assertion (to be discussed in more
 	detail later) is as shown in
-	Listing~\ref{lst:formal:Complex Promela Assertion}.
+	\cref{lst:formal:Complex Promela Assertion}.
 
 	There is no reason to evaluate this assertion
 	non-atomically, since it is not actually part of the algorithm.
 	Because each statement contributes to state, we can reduce
 	the number of useless states by enclosing it in an \co{atomic}
 	block as shown in
-	Listing~\ref{lst:formal:Atomic Block for Complex Promela Assertion}.
+	\cref{lst:formal:Atomic Block for Complex Promela Assertion}.
 
 \item	Promela does not provide functions.
 	You must instead use C preprocessor macros.
@@ -485,19 +485,19 @@ Since locks are generally useful, \co{spin_lock()} and
 \co{spin_unlock()}
 macros are provided in \path{lock.h}, which may be included from
 multiple Promela models, as shown in
-Listing~\ref{lst:formal:Promela Code for Spinlock}.
+\cref{lst:formal:Promela Code for Spinlock}.
 The \co{spin_lock()} macro contains an infinite \co{do-od} loop
 spanning \clnrefrange{dood:b}{dood:e},
-courtesy of the single guard expression of ``1'' on line~\lnref{one}.
+courtesy of the single guard expression of ``1'' on \clnref{one}.
 The body of this loop is a single atomic block that contains
 an \co{if-fi} statement.
 The \co{if-fi} construct is similar to the \co{do-od} construct, except
 that it takes a single pass rather than looping.
-If the lock is not held on line~\lnref{notheld}, then
-line~\lnref{acq} acquires it and
-line~\lnref{break} breaks out of the enclosing \co{do-od} loop (and also exits
+If the lock is not held on \clnref{notheld}, then
+\clnref{acq} acquires it and
+\clnref{break} breaks out of the enclosing \co{do-od} loop (and also exits
 the atomic block).
-On the other hand, if the lock is already held on line~\lnref{held},
+On the other hand, if the lock is already held on \clnref{held},
 we do nothing (\co{skip}), and fall out of the \co{if-fi} and the
 atomic block so as to take another pass through the outer
 loop, repeating until the lock is available.
@@ -530,36 +530,36 @@ weak memory ordering must be explicitly coded.
 
 \begin{fcvref}[ln:formal:promela:lock:spin]
 These macros are tested by the Promela code shown in
-Listing~\ref{lst:formal:Promela Code to Test Spinlocks}.
+\cref{lst:formal:Promela Code to Test Spinlocks}.
 This code is similar to that used to test the increments,
 with the number of locking processes defined by the \co{N_LOCKERS}
-macro definition on line~\lnref{nlockers}.
-The mutex itself is defined on line~\lnref{mutex},
+macro definition on \clnref{nlockers}.
+The mutex itself is defined on \clnref{mutex},
 an array to track the lock owner
-on line~\lnref{array}, and line~\lnref{sum} is used by assertion
+on \clnref{array}, and \clnref{sum} is used by assertion
 code to verify that only one process holds the lock.
 \end{fcvref}
 
 \begin{fcvref}[ln:formal:promela:lock:spin:locker]
 The locker process is on \clnrefrange{b}{e}, and simply loops forever
-acquiring the lock on line~\lnref{lock}, claiming it on line~\lnref{claim},
-unclaiming it on line~\lnref{unclaim}, and releasing it on line~\lnref{unlock}.
+acquiring the lock on \clnref{lock}, claiming it on \clnref{claim},
+unclaiming it on \clnref{unclaim}, and releasing it on \clnref{unlock}.
 \end{fcvref}
 
 \begin{fcvref}[ln:formal:promela:lock:spin:init]
 The init block on \clnrefrange{b}{e} initializes the current locker's
-havelock array entry on line~\lnref{array}, starts the current locker on
-line~\lnref{start}, and advances to the next locker on line~\lnref{next}.
+havelock array entry on \clnref{array}, starts the current locker on
+\clnref{start}, and advances to the next locker on \clnref{next}.
 Once all locker processes are spawned, the \co{do-od} loop
-moves to line~\lnref{chkassert}, which checks the assertion.
-Lines~\lnref{sum} and~\lnref{j} initialize the control variables,
+moves to \clnref{chkassert}, which checks the assertion.
+\Clnref{sum,j} initialize the control variables,
 \clnrefrange{atm:b}{atm:e} atomically sum the havelock array entries,
-line~\lnref{assert} is the assertion, and line~\lnref{break} exits the loop.
+\clnref{assert} is the assertion, and \clnref{break} exits the loop.
 \end{fcvref}
 
 We can run this model by placing the two code fragments of
-Listings~\ref{lst:formal:Promela Code for Spinlock}
-and~\ref{lst:formal:Promela Code to Test Spinlocks} into
+\cref{lst:formal:Promela Code for Spinlock,%
+lst:formal:Promela Code to Test Spinlocks} into
 files named \path{lock.h} and \path{lock.spin}, respectively, and then running
 the following commands:
 
@@ -577,7 +577,7 @@ cc -DSAFETY -o pan pan.c
 \end{listing}
 
 The output will look something like that shown in
-Listing~\ref{lst:formal:Output for Spinlock Test}.
+\cref{lst:formal:Output for Spinlock Test}.
 As expected, this run has no assertion failures (``errors: 0'').
 
 \QuickQuizSeries{%
@@ -676,7 +676,7 @@ but is unlikely to ever be included in the Linux kernel.
 \end{listing}
 
 Returning to the Promela code for QRCU, the global variables are as shown in
-Listing~\ref{lst:formal:QRCU Global Variables}.
+\cref{lst:formal:QRCU Global Variables}.
 This example uses locking and includes \path{lock.h}.
 Both the number of readers and writers can be varied using the
 two \co{#define} statements, giving us not one but two ways to create
@@ -706,19 +706,19 @@ Finally, the \co{mutex} variable is used to serialize updaters' slowpaths.
 
 \begin{fcvref}[ln:formal:promela:qrcu:reader]
 QRCU readers are modeled by the \co{qrcu_reader()} process shown in
-Listing~\ref{lst:formal:QRCU Reader Process}.
+\cref{lst:formal:QRCU Reader Process}.
 A \co{do-od} loop spans \clnrefrange{do}{od},
 with a single guard of ``1''
-on line~\lnref{one} that makes it an infinite loop.
-Line~\lnref{curidx} captures the current value of the global index,
+on \clnref{one} that makes it an infinite loop.
+\Clnref{curidx} captures the current value of the global index,
 and \clnrefrange{atm:b}{atm:e}
 atomically increment it (and break from the infinite loop)
 if its value was non-zero (\co{atomic_inc_not_zero()}).
-Line~\lnref{cs:entry} marks entry into the RCU read-side critical section, and
-line~\lnref{cs:exit} marks exit from this critical section,
+\Clnref{cs:entry} marks entry into the RCU read-side critical section, and
+\clnref{cs:exit} marks exit from this critical section,
 both lines for the benefit of
 the \co{assert()} statement that we shall encounter later.
-Line~\lnref{atm:dec} atomically decrements the same counter that we incremented,
+\Clnref{atm:dec} atomically decrements the same counter that we incremented,
 thereby exiting the RCU read-side critical section.
 \end{fcvref}
 
@@ -730,22 +730,22 @@ thereby exiting the RCU read-side critical section.
 
 \begin{fcvref}[ln:formal:promela:qrcu:sum_unordered]
 The C-preprocessor macro shown in
-Listing~\ref{lst:formal:QRCU Unordered Summation}
+\cref{lst:formal:QRCU Unordered Summation}
 sums the pair of counters so as to emulate weak memory ordering.
 \Clnrefrange{fetch:b}{fetch:e} fetch one of the counters,
-and line~\lnref{sum_other} fetches the other
+and \clnref{sum_other} fetches the other
 of the pair and sums them.
 The atomic block consists of a single \co{do-od} statement.
 This \co{do-od} statement (spanning \clnrefrange{do}{od}) is unusual in that
 it contains two unconditional
-branches with guards on lines~\lnref{g1} and~\lnref{g2}, which causes Promela to
+branches with guards on \clnref{g1,g2}, which causes Promela to
 non-deterministically choose one of the two (but again, the full
 state-space search causes Promela to eventually make all possible
 choices in each applicable situation).
 The first branch fetches the zero-th counter and sets \co{i} to 1 (so
-that line~\lnref{sum_other} will fetch the first counter), while the second
+that \clnref{sum_other} will fetch the first counter), while the second
 branch does the opposite, fetching the first counter and setting \co{i}
-to 0 (so that line~\lnref{sum_other} will fetch the second counter).
+to 0 (so that \clnref{sum_other} will fetch the second counter).
 \end{fcvref}
 
 \QuickQuiz{
@@ -764,20 +764,20 @@ to 0 (so that line~\lnref{sum_other} will fetch the second counter).
 \begin{fcvref}[ln:formal:promela:qrcu:updater]
 With the \co{sum_unordered} macro in place, we can now proceed
 to the update-side process shown in
-Listing~\ref{lst:formal:QRCU Updater Process}.
+\cref{lst:formal:QRCU Updater Process}.
 The update-side process repeats indefinitely, with the corresponding
 \co{do-od} loop ranging over \clnrefrange{do}{od}.
 Each pass through the loop first snapshots the global \co{readerprogress}
 array into the local \co{readerstart} array on
 \clnrefrange{atm1:b}{atm1:e}.
-This snapshot will be used for the assertion on line~\lnref{assert}.
-Line~\lnref{sum_unord} invokes \co{sum_unordered}, and then
+This snapshot will be used for the assertion on \clnref{assert}.
+\Clnref{sum_unord} invokes \co{sum_unordered}, and then
 \clnrefrange{reinvoke:b}{reinvoke:e}
 re-invoke \co{sum_unordered} if the fastpath is potentially
 usable.
 
 \Clnrefrange{slow:b}{slow:e} execute the slowpath code if need be, with
-lines~\lnref{acq} and~\lnref{rel} acquiring and releasing the update-side lock,
+\clnref{acq,rel} acquiring and releasing the update-side lock,
 \clnrefrange{flip_idx:b}{flip_idx:e} flipping the index, and
 \clnrefrange{wait:b}{wait:e} waiting for
 all pre-existing readers to complete.
@@ -849,7 +849,7 @@ this update still be in progress.
 
 \begin{fcvref}[ln:formal:promela:qrcu:init]
 All that remains is the initialization block shown in
-Listing~\ref{lst:formal:QRCU Initialization Process}.
+\cref{lst:formal:QRCU Initialization Process}.
 This block simply initializes the counter pair on
 \clnrefrange{i_ctr:b}{i_ctr:e},
 spawns the reader processes on
@@ -908,7 +908,7 @@ cc -DSAFETY [-DCOLLAPSE] -o pan pan.c
 \end{table}
 
 The output shows that this model passes all of the cases shown in
-Table~\ref{tab:advsync:Memory Usage of QRCU Model}.
+\cref{tab:advsync:Memory Usage of QRCU Model}.
 It would be nice to run three readers and three
 updaters; however, simple extrapolation indicates that this will
 require about half a terabyte of memory.
@@ -943,7 +943,7 @@ a long run due to incomplete search resulting from a too-tight
 depth limit.
 This run took a little more than 3~days on a \Power{9} server.
 The result is shown in
-Listing~\ref{lst:formal:spinhint:3 Readers 3 Updaters QRCU Spin Output with -DMA=96}.
+\cref{lst:formal:spinhint:3 Readers 3 Updaters QRCU Spin Output with -DMA=96}.
 This Spin run completed successfully with a total memory
 usage of only 6.5\,GB, which is almost two orders of magnitude
 lower than the \co{-DCOLLAPSE} usage of about half a terabyte.
@@ -973,7 +973,7 @@ lower than the \co{-DCOLLAPSE} usage of about half a terabyte.
 	runs with \co{-DCOLLAPSE} and with \co{-DMA=88}
 	(two readers and three updaters).
 	The diff of outputs from those runs is shown in
-	Listing~\ref{lst:formal:promela:Spin Output Diff of -DCOLLAPSE and -DMA=88}.
+	\cref{lst:formal:promela:Spin Output Diff of -DCOLLAPSE and -DMA=88}.
 	As you can see, they agree on the numbers of states
 	(stored and matched).
 }\QuickQuizEnd
@@ -1029,7 +1029,7 @@ lower than the \co{-DCOLLAPSE} usage of about half a terabyte.
 \label{tab:formal:promela:QRCU Spin Result Summary}
 \end{table*}
 
-For reference, Table~\ref{tab:formal:promela:QRCU Spin Result Summary}
+For reference, \cref{tab:formal:promela:QRCU Spin Result Summary}
 summarizes the Spin results with \co{-DCOLLAPSE} and \co{-DMA=N}
 compiler flags.
 The memory usage is obtained with minimal sufficient
@@ -1038,7 +1038,7 @@ Hashtable sizes for \co{-DCOLLAPSE} runs are tweaked by
 the \co{-wN} option of \co{./pan} to avoid using too much
 memory hashing small state spaces.
 Hence the memory usage is smaller than what is shown in
-Table~\ref{tab:advsync:Memory Usage of QRCU Model}, where the
+\cref{tab:advsync:Memory Usage of QRCU Model}, where the
 hashtable size starts from the default of \co{-w24}.
 The runtime is from a \Power{9} server, which shows that \co{-DMA=N}
 suffers up to about an order of magnitude higher CPU overhead
diff --git a/formal/stateless.tex b/formal/stateless.tex
index 0622a5d6..ef393a60 100644
--- a/formal/stateless.tex
+++ b/formal/stateless.tex
@@ -26,7 +26,7 @@ Although the jury is still out on this question, stateless model
 checkers such as Nidhugg~\cite{CarlLeonardsson2014Nidhugg} have in
 some cases handled larger programs~\cite{SMC-TreeRCU}, and with
 similar ease of use, as illustrated by
-Figure~\ref{fig:formal:Nidhugg Processing Flow}.
+\cref{fig:formal:Nidhugg Processing Flow}.
 In addition, Nidhugg was more than an order of magnitude faster than
 was \co{cbmc} for some Linux-kernel RCU verification scenarios.
 Of course, Nidhugg's speed and scalability advantages are tied to
-- 
2.17.1
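For reviewers unfamiliar with cleveref, the substitution these patches
perform can be sketched as below. This is a minimal illustration only:
the labels are copied from the diff above, but the sentences around them
are invented, and the preamble is assumed to load the package
(\usepackage{cleveref}).

```latex
% Before: the reference type is spelled out by hand, so a change of
% float type (listing -> figure, say) silently desynchronizes the text:
%   ... as shown in Listing~\ref{lst:formal:Promela Code for Spinlock}.
% After: cleveref derives "Listing" (or "Section", "Figure", ...) from
% the label itself:
... as shown in \cref{lst:formal:Promela Code for Spinlock}.

% \Cref{} capitalizes at sentence start, \cpageref{} yields the page
% number, and several same-type labels collapse into one reference:
\Cref{lst:formal:Promela Code for Spinlock,%
lst:formal:Promela Code to Test Spinlocks}
are shown on \cpageref{lst:formal:Promela Code for Spinlock}.
```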



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -perfbook 7/8] together, advsync, memorder: Employ \cref{} and its variants
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
                   ` (5 preceding siblings ...)
  2021-05-18 12:23 ` [PATCH -perfbook 6/8] formal: " Akira Yokosawa
@ 2021-05-18 12:24 ` Akira Yokosawa
  2021-05-18 12:25 ` [PATCH -perfbook 8/8] easy, future, appendix: " Akira Yokosawa
  2021-05-18 18:30 ` [PATCH -perfbook 0/8] Employ cleveref macros, take four Paul E. McKenney
  8 siblings, 0 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:24 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 advsync/advsync.tex   |  2 +-
 advsync/rt.tex        |  4 ++--
 memorder/memorder.tex | 24 ++++++++++++------------
 together/applyrcu.tex |  4 ++--
 together/refcnt.tex   |  4 ++--
 5 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/advsync/advsync.tex b/advsync/advsync.tex
index c6fd6490..48a71b74 100644
--- a/advsync/advsync.tex
+++ b/advsync/advsync.tex
@@ -247,7 +247,7 @@ saw his first computer, which had but one CPU\@.
 \Clnrefrange{struct:b}{struct:e} show the \co{node_t} structure,
 which contains an arbitrary value and a pointer to the next structure
 on the stack and
-\Clnref{top} shows the top-of-stack pointer.
+\clnref{top} shows the top-of-stack pointer.
 
 The \co{list_push()} function spans \clnrefrange{push:b}{push:e}.
 \Clnref{push:alloc} allocates a new node and
diff --git a/advsync/rt.tex b/advsync/rt.tex
index e939a029..62fcf123 100644
--- a/advsync/rt.tex
+++ b/advsync/rt.tex
@@ -835,7 +835,7 @@ As usual, the answer seems to be ``It depends,'' as discussed in the
 following sections.
 \Cref{sec:advsync:Event-Driven Real-Time Support}
 considers event-driven real-time systems, and
-\Cref{sec:advsync:Polling-Loop Real-Time Support}
+\cref{sec:advsync:Polling-Loop Real-Time Support}
 considers real-time systems that use a CPU-bound polling loop.
 
 \subsubsection{Event-Driven Real-Time Support}
@@ -1297,7 +1297,7 @@ has become zero, in other words, if this corresponds to the outermost
 If so, \clnref{bar2} prevents the compiler from reordering this nesting
 update with \clnref{chks}'s check for special handling.
 If special handling is required, then the call to
-\co{rcu_read_unlock_special()} on \lnref{unls} carries it out.
+\co{rcu_read_unlock_special()} on \clnref{unls} carries it out.
 
 There are several types of special handling that can be required, but
 we will focus on that required when the RCU read-side critical section
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 8c9be547..e9ad119d 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -774,7 +774,7 @@ Adding ordering usually slows things down.
 Of course, there are situations where adding instructions speeds things
 up, as was shown by
 \cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR} on
-page~\pageref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
+\cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
 but careful benchmarking is required in such cases.
 And even then, it is quite possible that although you sped things up
 a little bit on \emph{your} system, you might well have slowed things
@@ -1537,7 +1537,7 @@ interchangeably.
 \end{listing}
 
 \begin{fcvref}[ln:formal:C-CCIRIW+o+o+o-o+o-o:whole]
-\cref{lst:memorder:Cache-Coherent IRIW Litmus Test}
+\Cref{lst:memorder:Cache-Coherent IRIW Litmus Test}
 (\path{C-CCIRIW+o+o+o-o+o-o.litmus})
 shows a litmus test that tests for cache coherence,
 where ``IRIW'' stands
@@ -1889,7 +1889,7 @@ Dependencies do not provide cumulativity,
 which is why the ``C'' column is blank for the \co{READ_ONCE()} row
 of \cref{tab:memorder:Linux-Kernel Memory-Ordering Cheat Sheet}
 on
-page~\pageref{tab:memorder:Linux-Kernel Memory-Ordering Cheat Sheet}.
+\cpageref{tab:memorder:Linux-Kernel Memory-Ordering Cheat Sheet}.
 However, as indicated by the ``C'' in their ``C'' column,
 release operations do provide cumulativity.
 Therefore,
@@ -2090,7 +2090,7 @@ same variable is not necessarily the store that started last.
 This should not come as a surprise to anyone who carefully examined
 \cref{fig:memorder:A Variable With More Simultaneous Values}
 on
-page~\pageref{fig:memorder:A Variable With More Simultaneous Values}.
+\cpageref{fig:memorder:A Variable With More Simultaneous Values}.
 
 \begin{listing}
 \input{CodeSamples/formal/litmus/C-2+2W+o-wmb-o+o-wmb-o@whole.fcv}
@@ -2168,7 +2168,7 @@ Of course, just the passage of time by itself is not enough, as
 was seen in
 \cref{lst:memorder:Load-Buffering Litmus Test (No Ordering)}
 on
-page~\pageref{lst:memorder:Load-Buffering Litmus Test (No Ordering)},
+\cpageref{lst:memorder:Load-Buffering Litmus Test (No Ordering)},
 which has nothing but store-to-load links and, because it provides
 absolutely no ordering, still can trigger its \co{exists} clause.
 However, as long as each thread provides even the weakest possible
@@ -2208,7 +2208,7 @@ next section.
 A minimal release-acquire chain was shown in
 \cref{lst:memorder:Enforcing Ordering of Load-Buffering Litmus Test}
 on
-page~\pageref{lst:memorder:Enforcing Ordering of Load-Buffering Litmus Test},
+\cpageref{lst:memorder:Enforcing Ordering of Load-Buffering Litmus Test},
 but these chains can be much longer, as shown in
 \cref{lst:memorder:Long LB Release-Acquire Chain}
 (\path{C-LB+a-r+a-r+a-r+a-r.litmus}).
@@ -2227,7 +2227,7 @@ it turns out that they can tolerate one load-to-store step, despite
 such steps being counter-temporal, as shown in
 \cref{fig:memorder:Load-to-Store is Counter-Temporal}
 on
-page~\pageref{fig:memorder:Load-to-Store is Counter-Temporal}.
+\cpageref{fig:memorder:Load-to-Store is Counter-Temporal}.
 For example,
 \cref{lst:memorder:Long ISA2 Release-Acquire Chain}
 (\path{C-ISA2+o-r+a-r+a-r+a-o.litmus})
@@ -2450,7 +2450,7 @@ In short, use of \co{READ_ONCE()}, \co{WRITE_ONCE()}, \co{barrier()},
 \co{volatile}, and other primitives called out in
 \cref{tab:memorder:Linux-Kernel Memory-Ordering Cheat Sheet}
 on
-page~\pageref{tab:memorder:Linux-Kernel Memory-Ordering Cheat Sheet}
+\cpageref{tab:memorder:Linux-Kernel Memory-Ordering Cheat Sheet}
 are valuable tools in preventing the compiler from
 optimizing your parallel algorithm out of existence.
 Compilers are starting to provide other mechanisms for avoiding
@@ -2571,7 +2571,7 @@ if (p == &reserve_int) {
 	For example, if \co{a} and \co{b} are equal, \co{cp+a-b}
 	is an identity function, including preserving the dependency.
 \item	Comparisons can break dependencies.
-	\cref{lst:memorder:Breakable Dependencies With Comparisons}
+	\Cref{lst:memorder:Breakable Dependencies With Comparisons}
 	shows how this can happen.
 	Here global pointer \co{gp} points to a dynamically allocated
 	integer, but if memory is low, it might instead point to
@@ -3299,7 +3299,7 @@ RCU callback invocation in the lower left.\footnote{
 	For more detail, please see
 	\crefrange{fig:defer:RCU Reader and Later Grace Period}{fig:defer:RCU Reader Within Grace Period}
 	starting on
-	page~\pageref{fig:defer:RCU Reader and Later Grace Period}.}
+	\cpageref{fig:defer:RCU Reader and Later Grace Period}.}
 
 \begin{figure}
 \centering
@@ -4469,7 +4469,7 @@ void synchronize_rcu(void)
 		(Recall that \clnref{if,store} have been reordered by the
 		compiler to follow \clnref{acqmutex}).
 	\item	CPU~0 invokes \co{update_counter_and_wait()} from
-		\lnref{call1}.
+		\clnref{call1}.
 	\item	CPU~0 invokes \co{rcu_gp_ongoing()} on itself at
 		\clnref{call2}, and \clnref{load} sees that CPU~0 is
 		in a quiescent state.
@@ -4479,7 +4479,7 @@ void synchronize_rcu(void)
 		already holds the lock, CPU~1 blocks waiting for this
 		lock to become available.
 		Because the compiler reordered \clnref{if,store} to follow
-		\lnref{acqmutex}, CPU~1 does not clear its own counter,
+		\clnref{acqmutex}, CPU~1 does not clear its own counter,
 		despite having been online.
 	\item	CPU~0 invokes \co{rcu_gp_ongoing()} on CPU~1 at
 		\clnref{call2}, and \clnref{load} sees that CPU~1 is
diff --git a/together/applyrcu.tex b/together/applyrcu.tex
index 8173ae12..4f4de79f 100644
--- a/together/applyrcu.tex
+++ b/together/applyrcu.tex
@@ -28,7 +28,7 @@ were required to acquire a global
 lock, and thus incurred high overhead and suffered poor scalability.
 The code for the lock-based implementation is shown in
 \cref{lst:count:Per-Thread Statistical Counters} on
-Page~\pageref{lst:count:Per-Thread Statistical Counters}.
+\cpageref{lst:count:Per-Thread Statistical Counters}.
 
 \QuickQuiz{
 	Why on earth did we need that global lock in the first place?
@@ -198,7 +198,7 @@ references.
 
 \QuickQuiz{
 	Wow!
-	\cref{lst:together:RCU and Per-Thread Statistical Counters}
+	\Cref{lst:together:RCU and Per-Thread Statistical Counters}
 	contains 70 lines of code, compared to only 42 in
 	\cref{lst:count:Per-Thread Statistical Counters}.
 	Is this extra complexity really worth it?
diff --git a/together/refcnt.tex b/together/refcnt.tex
index ae8644e4..19ca6bb4 100644
--- a/together/refcnt.tex
+++ b/together/refcnt.tex
@@ -158,7 +158,7 @@ atomic counting with check and release memory barrier (``CAM'') is described in
 \cref{sec:together:Atomic Counting With Check and Release Memory Barrier}.
 Use of hazard pointers is described in
 \cref{sec:defer:Hazard Pointers}
-on page~\ref{sec:defer:Hazard Pointers}
+on \cpageref{sec:defer:Hazard Pointers}
 and in
 \cref{sec:together:Hazard-Pointer Helpers}.
 
@@ -175,7 +175,7 @@ of compiler optimizations.
 This is the method of choice when the lock is required to protect
 other operations in addition to the reference count, but where
 a reference to the object must be held after the lock is released.
-\cref{lst:together:Simple Reference-Count API} shows a simple
+\Cref{lst:together:Simple Reference-Count API} shows a simple
 API that might be used to implement simple non-atomic reference
 counting---although simple reference counting is almost always
 open-coded instead.
-- 
2.17.1




* [PATCH -perfbook 8/8] easy, future, appendix: Employ \cref{} and its variants
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
                   ` (6 preceding siblings ...)
  2021-05-18 12:24 ` [PATCH -perfbook 7/8] together, advsync, memorder: " Akira Yokosawa
@ 2021-05-18 12:25 ` Akira Yokosawa
  2021-05-18 18:30 ` [PATCH -perfbook 0/8] Employ cleveref macros, take four Paul E. McKenney
  8 siblings, 0 replies; 10+ messages in thread
From: Akira Yokosawa @ 2021-05-18 12:25 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

Note:

In toyrcu.tex, one "enumerate" list is converted to a "sequence"
list to allow \cref{} to print "step n".
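Roughly, the conversion looks like the following sketch. The
``sequence'' environment name and the label are assumptions inferred
from this note; the actual hunk is in toyrcu.tex.

```latex
% perfbook-local "sequence" list type (environment name assumed);
% unlike plain enumerate items, its items are picked up by cleveref
% with a "step" name:
\begin{sequence}
\item	Reader fetches the pointer.
\item	Updater removes the element.
	\label{stp:toyrcu:remove}
\item	Updater waits for a grace period.
\end{sequence}

% Elsewhere, "\cref{stp:toyrcu:remove}" now prints "step 2" rather
% than the bare "2" that an enumerate item label would give.
```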

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 ack.tex                              | 40 ++++++++++++++--------------
 appendix/questions/after.tex         |  8 +++---
 appendix/questions/removelocking.tex |  2 +-
 appendix/toyrcu/toyrcu.tex           | 10 +++----
 appendix/whymb/whymemorybarriers.tex |  2 +-
 easy/easy.tex                        |  6 ++---
 future/cpu.tex                       | 10 +++----
 future/formalregress.tex             |  2 +-
 future/htm.tex                       |  2 +-
 future/tm.tex                        |  2 +-
 glossary.tex                         |  2 +-
 summary.tex                          |  2 +-
 12 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/ack.tex b/ack.tex
index ce64d325..5ca46498 100644
--- a/ack.tex
+++ b/ack.tex
@@ -17,7 +17,7 @@
 
 Akira Yokosawa is this book's \LaTeX\ advisor, which perhaps most
 notably includes the care and feeding of the style guide laid out
-in Appendix~\ref{chp:app:styleguide:Style Guide}.
+in \cref{chp:app:styleguide:Style Guide}.
 This work includes table layout, listings, fonts, rendering of math,
 acronyms, bibliography formatting, epigraphs, hyperlinks, paper size.
 Akira also perfected the cross-referencing of quick quizzes, allowing
@@ -35,27 +35,27 @@ hyperlinked line-number references) from the source files.
 \section{Reviewers}
 
 \begin{itemize}
-\item	Alan Stern (\Cref{chp:Advanced Synchronization: Memory Ordering}).
-\item	Andy Whitcroft (\Cref{sec:defer:RCU Fundamentals},
-	\Cref{sec:defer:RCU Linux-Kernel API}).
-\item	Artem Bityutskiy (\Cref{chp:Advanced Synchronization: Memory Ordering},
-	\Cref{chp:app:whymb:Why Memory Barriers?}).
-\item	Dave Keck (\Cref{chp:app:whymb:Why Memory Barriers?}).
+\item	Alan Stern (\cref{chp:Advanced Synchronization: Memory Ordering}).
+\item	Andy Whitcroft (\cref{sec:defer:RCU Fundamentals},
+	\cref{sec:defer:RCU Linux-Kernel API}).
+\item	Artem Bityutskiy (\cref{chp:Advanced Synchronization: Memory Ordering},
+	\cref{chp:app:whymb:Why Memory Barriers?}).
+\item	Dave Keck (\cref{chp:app:whymb:Why Memory Barriers?}).
 \item	David S. Horner
-	(\Cref{sec:formal:Promela Parable: dynticks and Preemptible RCU}).
-\item	Gautham Shenoy (\Cref{sec:defer:RCU Fundamentals},
-	\Cref{sec:defer:RCU Linux-Kernel API}).
-\item	``jarkao2'', AKA LWN guest \#41960 (\Cref{sec:defer:RCU Linux-Kernel API}).
-\item	Jonathan Walpole (\Cref{sec:defer:RCU Linux-Kernel API}).
-\item	Josh Triplett (\Cref{chp:Formal Verification}).
-\item	Michael Factor (\Cref{sec:future:Transactional Memory}).
-\item	Mike Fulton (\Cref{sec:defer:RCU Fundamentals}).
+	(\cref{sec:formal:Promela Parable: dynticks and Preemptible RCU}).
+\item	Gautham Shenoy (\cref{sec:defer:RCU Fundamentals},
+	\cref{sec:defer:RCU Linux-Kernel API}).
+\item	``jarkao2'', AKA LWN guest \#41960 (\cref{sec:defer:RCU Linux-Kernel API}).
+\item	Jonathan Walpole (\cref{sec:defer:RCU Linux-Kernel API}).
+\item	Josh Triplett (\cref{chp:Formal Verification}).
+\item	Michael Factor (\cref{sec:future:Transactional Memory}).
+\item	Mike Fulton (\cref{sec:defer:RCU Fundamentals}).
 \item	Peter Zijlstra
-	(\Cref{sec:defer:RCU Usage}). % Lanin and Shasha citation.
-\item	Richard Woodruff (\Cref{chp:app:whymb:Why Memory Barriers?}).
-\item	Suparna Bhattacharya (\Cref{chp:Formal Verification}).
+	(\cref{sec:defer:RCU Usage}). % Lanin and Shasha citation.
+\item	Richard Woodruff (\cref{chp:app:whymb:Why Memory Barriers?}).
+\item	Suparna Bhattacharya (\cref{chp:Formal Verification}).
 \item	Vara Prasad
-	(\Cref{sec:formal:Promela Parable: dynticks and Preemptible RCU}).
+	(\cref{sec:formal:Promela Parable: dynticks and Preemptible RCU}).
 \end{itemize}
 
 Reviewers whose feedback took the extremely welcome form of a patch
@@ -97,7 +97,7 @@ Tony Breeds.
 
 \ListContributions
 
-Figure~\ref{fig:defer:RCU Areas of Applicability} was adapted from
+\Cref{fig:defer:RCU Areas of Applicability} was adapted from
 \ppl{Fedor}{Pikus}'s ``When to use RCU'' slide~\cite{FedorPikus2017RCUthenWhat}.
 The discussion of mechanical reference counters in
 \cref{sec:defer:Reference Counting}
diff --git a/appendix/questions/after.tex b/appendix/questions/after.tex
index b641417d..195179d9 100644
--- a/appendix/questions/after.tex
+++ b/appendix/questions/after.tex
@@ -92,16 +92,16 @@ instructions in that time.
 One possible reason is given by the following sequence of events:
 \begin{enumerate}
 \item	Consumer obtains timestamp
-	(\Cref{lst:app:questions:After Consumer Function},
+	(\cref{lst:app:questions:After Consumer Function},
 	\clnref{consumer:tod}).
 \item	Consumer is preempted.
 \item	An arbitrary amount of time passes.
 \item	Producer obtains timestamp
-	(\Cref{lst:app:questions:After Producer Function},
+	(\cref{lst:app:questions:After Producer Function},
 	\clnref{producer:tod}).
 \item	Consumer starts running again, and picks up the producer's
 	timestamp
-	(\Cref{lst:app:questions:After Consumer Function},
+	(\cref{lst:app:questions:After Consumer Function},
 	\clnref{consumer:prodtod}).
 \end{enumerate}
 
@@ -189,7 +189,7 @@ In summary, if you acquire an \IXh{exclusive}{lock}, you {\em know} that
 anything you do while holding that lock will appear to happen after
 anything done by any prior holder of that lock, at least give or
 take \IXacrl{tle}
-(see Section~\ref{sec:future:Semantic Differences}).
+(see \cref{sec:future:Semantic Differences}).
 No need to worry about which CPU did or did not execute a memory
 barrier, no need to worry about the CPU or compiler reordering
 operations---life is simple.
diff --git a/appendix/questions/removelocking.tex b/appendix/questions/removelocking.tex
index dae8355a..842ee28c 100644
--- a/appendix/questions/removelocking.tex
+++ b/appendix/questions/removelocking.tex
@@ -14,7 +14,7 @@ than its locked counterpart, a few of which are discussed in
 However, lockless algorithms are not guaranteed to perform and scale
 well, as shown by
 \cref{fig:count:Atomic Increment Scalability on x86} on
-page~\pageref{fig:count:Atomic Increment Scalability on x86}.
+\cpageref{fig:count:Atomic Increment Scalability on x86}.
 Furthermore, as a general rule, the more complex the algorithm,
 the greater the advantage of combining locking with selected
 lockless techniques, even with significant hardware support,
diff --git a/appendix/toyrcu/toyrcu.tex b/appendix/toyrcu/toyrcu.tex
index be77c045..9e5c01d5 100644
--- a/appendix/toyrcu/toyrcu.tex
+++ b/appendix/toyrcu/toyrcu.tex
@@ -161,7 +161,7 @@ in the next section.
 \section{Per-Thread Lock-Based RCU}
 \label{sec:app:toyrcu:Per-Thread Lock-Based RCU}
 
-\cref{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
+\Cref{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
 (\path{rcu_lock_percpu.h} and \path{rcu_lock_percpu.c})
 shows an implementation based on one lock per thread.
 The \co{rcu_read_lock()} and \co{rcu_read_unlock()} functions
@@ -550,7 +550,7 @@ checking of \co{rcu_refcnt}.
 	\begin{fcvref}[ln:defer:rcu_rcpg]
 	Both flips are absolutely required.
 	To see this, consider the following sequence of events:
-	\begin{enumerate}
+	\begin{sequence}
 	\item	\Clnref{r:lock:cur:b} of \co{rcu_read_lock()} in
 		\cref{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
 		picks up \co{rcu_idx}, finding its value to be zero.
@@ -579,16 +579,16 @@ checking of \co{rcu_refcnt}.
 		\co{rcu_refcnt[0]}, not \co{rcu_refcnt[1]}!)
 		\label{sec:app:toyrcu:rcu_rcgp:RCU Grace Period Start}
 	\item	The grace period that started in
-		step~\ref{sec:app:toyrcu:rcu_rcgp:RCU Grace Period Start}
+		\cref{sec:app:toyrcu:rcu_rcgp:RCU Grace Period Start}
 		has been allowed to end, despite
 		the fact that the RCU read-side critical section
 		that started beforehand in
-		step~\ref{sec:app:toyrcu:rcu_rcgp:RCU Read Side Start}
+		\cref{sec:app:toyrcu:rcu_rcgp:RCU Read Side Start}
 		has not completed.
 		This violates RCU semantics, and could allow the update
 		to free a data element that the RCU read-side critical
 		section was still referencing.
-	\end{enumerate}
+	\end{sequence}
 
 	Exercise for the reader: What happens if \co{rcu_read_lock()}
 	is preempted for a very long time (hours!) just after
diff --git a/appendix/whymb/whymemorybarriers.tex b/appendix/whymb/whymemorybarriers.tex
index ea9fd14b..19fe4b4f 100644
--- a/appendix/whymb/whymemorybarriers.tex
+++ b/appendix/whymb/whymemorybarriers.tex
@@ -926,7 +926,7 @@ With this latter approach the sequence of operations might be as follows:
 \QuickQuiz{
 	After \cref{seq:app:whymb:Store buffers: All copies shared}
 	in \cref{sec:app:whymb:Store Buffers and Memory Barriers} on
-	page~\pageref{seq:app:whymb:Store buffers: All copies shared},
+	\cpageref{seq:app:whymb:Store buffers: All copies shared},
 	both CPUs might drop the cache line containing the new value of
 	``b''.
 	Wouldn't that cause this new value to be lost?
diff --git a/easy/easy.tex b/easy/easy.tex
index 1ac5b419..008db374 100644
--- a/easy/easy.tex
+++ b/easy/easy.tex
@@ -133,7 +133,7 @@ Linux kernel:
 	this point on the scale.
 	Many developers assume that this function has much
 	stronger ordering semantics than it actually possesses.
-	Chapter~\ref{chp:Advanced Synchronization: Memory Ordering} contains the
+	\Cref{chp:Advanced Synchronization: Memory Ordering} contains the
 	information needed to avoid this mistake, as does the
 	Linux-kernel source tree's \path{Documentation} and
 	\path{tools/memory-model} directories.
@@ -153,7 +153,7 @@ Linux kernel:
 	 {\emph{Alan J.~Perlis}}
 
 The set of useful programs resembles the Mandelbrot set
-(shown in Figure~\ref{fig:easy:Mandelbrot Set})
+(shown in \cref{fig:easy:Mandelbrot Set})
 in that it does
 not have a clear-cut smooth boundary---if it did, the halting problem
 would be solvable.
@@ -316,6 +316,6 @@ be worthwhile.
 
 Exceptions aside, we must continue to shave the software ``Mandelbrot
 set'' so that our programs remain maintainable, as shown in
-Figure~\ref{fig:easy:Shaving the Mandelbrot Set}.
+\cref{fig:easy:Shaving the Mandelbrot Set}.
 
 \QuickQuizAnswersChp{qqzeasy}
diff --git a/future/cpu.tex b/future/cpu.tex
index 205bf248..02d383da 100644
--- a/future/cpu.tex
+++ b/future/cpu.tex
@@ -53,15 +53,15 @@ With that in mind, consider the following scenarios:
 
 \begin{enumerate}
 \item	Uniprocessor \"Uber Alles
-	(\Cref{fig:future:Uniprocessor \"Uber Alles}),
+	(\cref{fig:future:Uniprocessor \"Uber Alles}),
 \item	Multithreaded Mania
-	(\Cref{fig:future:Multithreaded Mania}),
+	(\cref{fig:future:Multithreaded Mania}),
 \item	More of the Same
-	(\Cref{fig:future:More of the Same}), and
+	(\cref{fig:future:More of the Same}), and
 \item	Crash Dummies Slamming into the Memory Wall
-	(\Cref{fig:future:Crash Dummies Slamming into the Memory Wall}).
+	(\cref{fig:future:Crash Dummies Slamming into the Memory Wall}).
 \item	Astounding Accelerators
-	(\Cref{fig:future:Astounding Accelerators}).
+	(\cref{fig:future:Astounding Accelerators}).
 \end{enumerate}
 
 Each of these scenarios is covered in the following sections.
diff --git a/future/formalregress.tex b/future/formalregress.tex
index 91eee602..f7de68bd 100644
--- a/future/formalregress.tex
+++ b/future/formalregress.tex
@@ -254,7 +254,7 @@ is more than two orders of magnitude faster than emulation!
 
 \QuickQuiz{
 \begin{fcvref}[ln:future:formalregress:C-SB+l-o-o-u+l-o-o-u-C:whole]
-	Why bother with a separate \co{filter} command on line~\lnref{filter_} of
+	Why bother with a separate \co{filter} command on \clnref{filter_} of
 	\cref{lst:future:Emulating Locking with cmpxchg}
 	instead of just adding the condition to the \co{exists} clause?
 	And wouldn't it be simpler to use \co{xchg_acquire()} instead
diff --git a/future/htm.tex b/future/htm.tex
index 22e40c19..51ed3ca3 100644
--- a/future/htm.tex
+++ b/future/htm.tex
@@ -925,7 +925,7 @@ as discussed in the next section.
 Although it will likely be some time before HTM's area of applicability
 can be as crisply delineated as that shown for RCU in
 \cref{fig:defer:RCU Areas of Applicability} on
-page~\pageref{fig:defer:RCU Areas of Applicability}, that is no reason not to
+\cpageref{fig:defer:RCU Areas of Applicability}, that is no reason not to
 start moving in that direction.
 
 HTM seems best suited to update-heavy workloads involving relatively
diff --git a/future/tm.tex b/future/tm.tex
index 0755fafb..f9a4017a 100644
--- a/future/tm.tex
+++ b/future/tm.tex
@@ -994,7 +994,7 @@ internally~\cite{UCAM-CL-TR-579,KeirFraser2007withoutLocks,Gu:2019:PSE:3358807.3
 		needlessly throttle updates, as noted in their
 		Section~6.2.1.
 		See \cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
-		on Page~\pageref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
+		on \cpageref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
 		of this book to see that the venerable asynchronous
 		\co{call_rcu()} primitive enables RCU to perform and
 		scale quite well with large numbers of updaters.
diff --git a/glossary.tex b/glossary.tex
index 9c566369..60993be2 100644
--- a/glossary.tex
+++ b/glossary.tex
@@ -73,7 +73,7 @@
 	In contrast, the memory consistency model for a given machine
 	describes the order in which loads and stores to groups of
 	variables will appear to occur.
-	See Section~\ref{sec:memorder:Cache Coherence}
+	See \cref{sec:memorder:Cache Coherence}
 	for more information.
 \item[\IX{Cache-Coherence Protocol}:]
 	A communications protocol, normally implemented in hardware,
diff --git a/summary.tex b/summary.tex
index ef2f903e..63187998 100644
--- a/summary.tex
+++ b/summary.tex
@@ -92,7 +92,7 @@ and parallel real-time computing.
 critically important topic of memory ordering, presenting techniques
 and tools to help you not only solve memory-ordering problems, but
 also to avoid them completely.
-\cref{chp:Ease of Use} presented a brief overview of the surprisingly
+\Cref{chp:Ease of Use} presented a brief overview of the surprisingly
 important topic of ease of use.
 
 Last, but definitely not least, \cref{chp:Conflicting Visions of the Future}
-- 
2.17.1
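For reference, the conversions throughout this series follow a consistent pattern, roughly summarized below (the macros come from the cleveref package plus perfbook's own wrappers such as \clnref{}; this summary is illustrative, not part of the patch):

```latex
% Before: hard-coded prefixes with \ref{} and \pageref{}
See Section~\ref{sec:foo} on page~\pageref{sec:foo}.

% After: cleveref supplies the "Section"/"page" prefix automatically
See \cref{sec:foo} on \cpageref{sec:foo}.

% \Cref{} capitalizes the prefix, so it is reserved for sentence starts:
\Cref{sec:foo} describes the details.
```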




* Re: [PATCH -perfbook 0/8] Employ cleveref macros, take four
  2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
                   ` (7 preceding siblings ...)
  2021-05-18 12:25 ` [PATCH -perfbook 8/8] easy, future, appendix: " Akira Yokosawa
@ 2021-05-18 18:30 ` Paul E. McKenney
  8 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2021-05-18 18:30 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Tue, May 18, 2021 at 09:13:41PM +0900, Akira Yokosawa wrote:
> Hi Paul,
> 
> This is (hopefully) the final round of \cref{} related updates.
> 
> Patches 1/8 and 2/8 are unrelated updates in build scripts.
> Patch 1/8 adds patterns to catch font specifiers in .svg
> figures recently added/updated.
> Patch 2/8 gets rid of noindentafter.sty in our repository.
> Recent distros have this package on their own.
> 
> Patches 3/8--8/8 are \cref{} related updates.

Applied and pushed, thank you!  And I have hereby run out of excuses
for avoiding doing several updates on my todo list.  ;-)

> I'm thinking of adding a few patterns to cleverefcheck.pl
> to catch indents by white spaces.

That sounds good -- automated error-checking can be quite helpful.
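Such a check might look roughly like the following Python sketch (the real cleverefcheck.pl is a Perl script; the patterns, messages, and function names here are illustrative assumptions, not the script's actual contents):

```python
import re

# Illustrative patterns a cleveref-style checker might flag:
# hard-coded prefixes before \ref{}, page~\pageref{} instead of
# \cpageref{}, and lines indented by spaces rather than tabs.
PATTERNS = [
    (re.compile(r'(Section|Chapter|Figure|Table|Listing)~\\ref\{'),
     'manual prefix before \\ref{}; use \\cref{} instead'),
    (re.compile(r'page~\\pageref\{'),
     'page~\\pageref{}; use \\cpageref{} instead'),
    (re.compile(r'^ +\S'),
     'indentation by spaces; use tabs'),
]

def check_line(line):
    """Return complaint messages for one LaTeX source line."""
    return [msg for pat, msg in PATTERNS if pat.search(line)]

def check_file(lines):
    """Yield (lineno, message) pairs for every flagged line."""
    for n, line in enumerate(lines, 1):
        for msg in check_line(line):
            yield n, msg
```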

> At least, "sh ./utilities/cleverefcheck.sh" should pass
> after this patch set is applied.

And it does!

> Note:
> 
>     styleguide.tex is not checked by the shell script.
>     I'll update it as far as I can, but it won't be
>     warning-free due to its nature.

Understood, and it makes sense for it to be unchecked.  It would be a
bit annoying for the script to complain about "don't do it this way"
examples in that file.  ;-)

							Thanx, Paul

>         Thanks, Akira
> 
> --
> Akira Yokosawa (8):
>   fixsvgfonts: Add pattern for 'sans-serif'
>   Omit noindentafter.sty
>   defer: Employ \cref{} and its variants, take three
>   datastruct: Employ \cref{} and its variants
>   debugging: Employ \cref{} and its variants
>   formal: Employ \cref{} and its variants
>   together, advsync, memorder: Employ \cref{} and its variants
>   easy, future, appendix: Employ \cref{} and its variants
> 
>  Makefile                             |   1 -
>  ack.tex                              |  40 +--
>  advsync/advsync.tex                  |   2 +-
>  advsync/rt.tex                       |   4 +-
>  appendix/questions/after.tex         |   8 +-
>  appendix/questions/removelocking.tex |   2 +-
>  appendix/toyrcu/toyrcu.tex           |  10 +-
>  appendix/whymb/whymemorybarriers.tex |   2 +-
>  datastruct/datastruct.tex            | 348 +++++++++++++--------------
>  debugging/debugging.tex              | 140 +++++------
>  defer/rcuapi.tex                     |  90 +++----
>  defer/rcuexercises.tex               |  12 +-
>  defer/rcuintro.tex                   |   2 +-
>  defer/rcurelated.tex                 |  10 +-
>  defer/rcuusage.tex                   | 181 +++++++-------
>  defer/updates.tex                    |   4 +-
>  defer/whichtochoose.tex              |  18 +-
>  easy/easy.tex                        |   6 +-
>  formal/axiomatic.tex                 |  90 +++----
>  formal/dyntickrcu.tex                |   2 +-
>  formal/formal.tex                    |   6 +-
>  formal/sat.tex                       |   2 +-
>  formal/spinhint.tex                  | 122 +++++-----
>  formal/stateless.tex                 |   2 +-
>  future/cpu.tex                       |  10 +-
>  future/formalregress.tex             |   2 +-
>  future/htm.tex                       |   2 +-
>  future/tm.tex                        |   2 +-
>  glossary.tex                         |   2 +-
>  memorder/memorder.tex                |  24 +-
>  noindentafter.sty                    | 194 ---------------
>  summary.tex                          |   2 +-
>  together/applyrcu.tex                |   4 +-
>  together/refcnt.tex                  |   4 +-
>  utilities/fixsvgfonts-urwps.sh       |   2 +
>  utilities/fixsvgfonts.sh             |   2 +
>  36 files changed, 581 insertions(+), 773 deletions(-)
>  delete mode 100644 noindentafter.sty
> 
> -- 
> 2.17.1
> 


Thread overview: 10+ messages
2021-05-18 12:13 [PATCH -perfbook 0/8] Employ cleveref macros, take four Akira Yokosawa
2021-05-18 12:15 ` [PATCH -perfbook 1/8] fixsvgfonts: Add pattern for 'sans-serif' Akira Yokosawa
2021-05-18 12:19 ` [PATCH -perfbook 2/8] Omit noindentafter.sty Akira Yokosawa
2021-05-18 12:20 ` [PATCH -perfbook 3/8] defer: Employ \cref{} and its variants, take three Akira Yokosawa
2021-05-18 12:21 ` [PATCH -perfbook 4/8] datastruct: Employ \cref{} and its variants Akira Yokosawa
2021-05-18 12:22 ` [PATCH -perfbook 5/8] debugging: " Akira Yokosawa
2021-05-18 12:23 ` [PATCH -perfbook 6/8] formal: " Akira Yokosawa
2021-05-18 12:24 ` [PATCH -perfbook 7/8] together, advsync, memorder: " Akira Yokosawa
2021-05-18 12:25 ` [PATCH -perfbook 8/8] easy, future, appendix: " Akira Yokosawa
2021-05-18 18:30 ` [PATCH -perfbook 0/8] Employ cleveref macros, take four Paul E. McKenney