[PATCH 0/2] RFC: Automatic :c:func: annotation in the sphinx build

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] RFC: Automatic :c:func: annotation in the sphinx build
@ 2019-04-25 20:01 Jonathan Corbet
  2019-04-25 20:01 ` [PATCH 1/2] Docs: An initial automarkup extension for sphinx Jonathan Corbet
  2019-04-25 20:01 ` [PATCH 2/2] docs: remove :c:func: annotations from xarray.rst Jonathan Corbet
  0 siblings, 2 replies; 13+ messages in thread
From: Jonathan Corbet @ 2019-04-25 20:01 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab, Jonathan Corbet

There have been (justified) complaints that writing function() as
:c:func:`function()` in RST documents is obnoxious for both writers and
readers.  The following patch adds an extension to the sphinx build to
add this annotation automatically instead.  As is always the case,
recognizing "function()" is easy; recognizing the places where we should
*not* mangle things is harder.

Patch #2 removes all of the :c:func: annotations from xarray.rst - over
150 of them - to highlight the difference.  The only change in the output
is cross references added for a couple of functions that were not annotated
in the original version.

This adds some processing during the docs build, but my attempts to
measure the slowdown showed that it is less than the noise between
builds.

The output from a full docs build with these patches applied can be
seen at https://static.lwn.net/kerneldoc/.

Jonathan Corbet (2):
  Docs: An initial automarkup extension for sphinx
  docs: remove :c:func: annotations from xarray.rst

 Documentation/conf.py              |   3 +-
 Documentation/core-api/xarray.rst  | 270 ++++++++++++++---------------
 Documentation/sphinx/automarkup.py |  90 ++++++++++
 3 files changed, 227 insertions(+), 136 deletions(-)
 create mode 100644 Documentation/sphinx/automarkup.py

-- 
2.21.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-25 20:01 [PATCH 0/2] RFC: Automatic :c:func: annotation in the sphinx build Jonathan Corbet
@ 2019-04-25 20:01 ` Jonathan Corbet
  2019-04-26  9:06   ` Jani Nikula
  2019-04-26 18:32   ` Mauro Carvalho Chehab
  2019-04-25 20:01 ` [PATCH 2/2] docs: remove :c:func: annotations from xarray.rst Jonathan Corbet
  1 sibling, 2 replies; 13+ messages in thread
From: Jonathan Corbet @ 2019-04-25 20:01 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab, Jonathan Corbet

Rather than fill our text files with :c:func:`function()` syntax, just do
the markup via a hook into the sphinx build process.  As is always the
case, the real problem is detecting the situations where this markup should
*not* be done.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/conf.py              |  3 +-
 Documentation/sphinx/automarkup.py | 90 ++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/sphinx/automarkup.py

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 72647a38b5c2..ba7b2846b1c5 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -34,7 +34,8 @@ needs_sphinx = '1.3'
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
+extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
+              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
 
 # The name of the math extension changed on Sphinx 1.4
 if major == 1 and minor > 3:
diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
new file mode 100644
index 000000000000..c47469372bae
--- /dev/null
+++ b/Documentation/sphinx/automarkup.py
@@ -0,0 +1,90 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# This is a little Sphinx extension that tries to apply certain kinds
+# of markup automatically so we can keep it out of the text files
+# themselves.
+#
+# It's possible that this could be done better by hooking into the build
+# much later and traversing through the doctree.  That would eliminate the
+# need to duplicate some RST parsing and perhaps be less fragile, at the
+# cost of some more complexity and the need to generate the cross-reference
+# links ourselves.
+#
+# Copyright 2019 Jonathan Corbet <corbet@lwn.net>
+#
+from __future__ import print_function
+import re
+import sphinx
+
+#
+# Regex nastiness.  Of course.
+# Try to identify "function()" that's not already marked up some
+# other way.  Sphinx doesn't like a lot of stuff right after a
+# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
+# bit tries to restrict matches to things that won't create trouble.
+#
+RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')
+#
+# Lines consisting of a single underline character.
+#
+RE_underline = re.compile(r'^([-=~])\1+$')
+#
+# Starting a literal block.
+#
+RE_literal = re.compile(r'^(\s*)(.*::\s*|\.\.\s+code-block::.*)$')
+#
+# Just get the white space beginning a line.
+#
+RE_whitesp = re.compile(r'^(\s*)')
+
+def MangleFile(app, docname, text):
+    ret = [ ]
+    previous = ''
+    literal = False
+    for line in text[0].split('\n'):
+        #
+        # See if we might be ending a literal block, as denoted by
+        # an indent no greater than when we started.
+        #
+        if literal and len(line) > 0:
+            m = RE_whitesp.match(line)  # Should always match
+            if len(m.group(1).expandtabs()) <= lit_indent:
+                literal = False
+        #
+        # Blank lines, directives, and lines within literal blocks
+        # should not be messed with.
+        #
+        if literal or len(line) == 0 or line[0] == '.':
+            ret.append(line)
+        #
+        # Is this an underline line?  If so, and it is the same length
+        # as the previous line, we may have mangled a heading line in
+        # error, so undo it.
+        #
+        elif RE_underline.match(line):
+            if len(line) == len(previous):
+                ret[-1] = previous
+            ret.append(line)
+        #
+        # Normal line - perform substitutions.
+        #
+        else:
+            ret.append(RE_function.sub(r'\1:c:func:`\2`\3', line))
+        #
+        # Might we be starting a literal block?  If so make note of
+        # the fact.
+        #
+        m = RE_literal.match(line)
+        if m:
+            literal = True
+            lit_indent = len(m.group(1).expandtabs())
+        previous = line
+    text[0] = '\n'.join(ret)
+
+def setup(app):
+    app.connect('source-read', MangleFile)
+
+    return dict(
+        parallel_read_safe = True,
+        parallel_write_safe = True
+    )
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] docs: remove :c:func: annotations from xarray.rst
  2019-04-25 20:01 [PATCH 0/2] RFC: Automatic :c:func: annotation in the sphinx build Jonathan Corbet
  2019-04-25 20:01 ` [PATCH 1/2] Docs: An initial automarkup extension for sphinx Jonathan Corbet
@ 2019-04-25 20:01 ` Jonathan Corbet
  2019-04-25 20:43   ` Matthew Wilcox
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Corbet @ 2019-04-25 20:01 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab, Jonathan Corbet

Now that the build system automatically marks up function references, we
don't have to clutter the source files, so take it out.

[Some paragraphs could now benefit from refilling, but that was left out to
avoid obscuring the real changes.]

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/xarray.rst | 270 +++++++++++++++---------------
 1 file changed, 135 insertions(+), 135 deletions(-)

diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
index ef6f9f98f595..fcedc5349ace 100644
--- a/Documentation/core-api/xarray.rst
+++ b/Documentation/core-api/xarray.rst
@@ -30,27 +30,27 @@ it called marks.  Each mark may be set or cleared independently of
 the others.  You can iterate over entries which are marked.
 
 Normal pointers may be stored in the XArray directly.  They must be 4-byte
-aligned, which is true for any pointer returned from :c:func:`kmalloc` and
-:c:func:`alloc_page`.  It isn't true for arbitrary user-space pointers,
+aligned, which is true for any pointer returned from kmalloc() and
+alloc_page().  It isn't true for arbitrary user-space pointers,
 nor for function pointers.  You can store pointers to statically allocated
 objects, as long as those objects have an alignment of at least 4.
 
 You can also store integers between 0 and ``LONG_MAX`` in the XArray.
-You must first convert it into an entry using :c:func:`xa_mk_value`.
+You must first convert it into an entry using xa_mk_value().
 When you retrieve an entry from the XArray, you can check whether it is
-a value entry by calling :c:func:`xa_is_value`, and convert it back to
-an integer by calling :c:func:`xa_to_value`.
+a value entry by calling xa_is_value(), and convert it back to
+an integer by calling xa_to_value().
 
 Some users want to store tagged pointers instead of using the marks
-described above.  They can call :c:func:`xa_tag_pointer` to create an
-entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
-back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
+described above.  They can call xa_tag_pointer() to create an
+entry with a tag, xa_untag_pointer() to turn a tagged entry
+back into an untagged pointer and xa_pointer_tag() to retrieve
 the tag of an entry.  Tagged pointers use the same bits that are used
 to distinguish value entries from normal pointers, so each user must
 decide whether they want to store value entries or tagged pointers in
 any particular XArray.
 
-The XArray does not support storing :c:func:`IS_ERR` pointers as some
+The XArray does not support storing IS_ERR() pointers as some
 conflict with value entries or internal entries.
 
 An unusual feature of the XArray is the ability to create entries which
@@ -64,89 +64,89 @@ entry will cause the XArray to forget about the range.
 Normal API
 ==========
 
-Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
-for statically allocated XArrays or :c:func:`xa_init` for dynamically
+Start by initialising an XArray, either with DEFINE_XARRAY()
+for statically allocated XArrays or xa_init() for dynamically
 allocated ones.  A freshly-initialised XArray contains a ``NULL``
 pointer at every index.
 
-You can then set entries using :c:func:`xa_store` and get entries
-using :c:func:`xa_load`.  xa_store will overwrite any entry with the
+You can then set entries using xa_store() and get entries
+using xa_load().  xa_store will overwrite any entry with the
 new entry and return the previous entry stored at that index.  You can
-use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
+use xa_erase() instead of calling xa_store() with a
 ``NULL`` entry.  There is no difference between an entry that has never
 been stored to, one that has been erased and one that has most recently
 had ``NULL`` stored to it.
 
 You can conditionally replace an entry at an index by using
-:c:func:`xa_cmpxchg`.  Like :c:func:`cmpxchg`, it will only succeed if
+xa_cmpxchg().  Like cmpxchg(), it will only succeed if
 the entry at that index has the 'old' value.  It also returns the entry
 which was at that index; if it returns the same entry which was passed as
-'old', then :c:func:`xa_cmpxchg` succeeded.
+'old', then xa_cmpxchg() succeeded.
 
 If you want to only store a new entry to an index if the current entry
-at that index is ``NULL``, you can use :c:func:`xa_insert` which
+at that index is ``NULL``, you can use xa_insert() which
 returns ``-EBUSY`` if the entry is not empty.
 
 You can enquire whether a mark is set on an entry by using
-:c:func:`xa_get_mark`.  If the entry is not ``NULL``, you can set a mark
-on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
-calling :c:func:`xa_clear_mark`.  You can ask whether any entry in the
-XArray has a particular mark set by calling :c:func:`xa_marked`.
+xa_get_mark().  If the entry is not ``NULL``, you can set a mark
+on it by using xa_set_mark() and remove the mark from an entry by
+calling xa_clear_mark().  You can ask whether any entry in the
+XArray has a particular mark set by calling xa_marked().
 
 You can copy entries out of the XArray into a plain array by calling
-:c:func:`xa_extract`.  Or you can iterate over the present entries in
-the XArray by calling :c:func:`xa_for_each`.  You may prefer to use
-:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
+xa_extract().  Or you can iterate over the present entries in
+the XArray by calling xa_for_each().  You may prefer to use
+xa_find() or xa_find_after() to move to the next present
 entry in the XArray.
 
-Calling :c:func:`xa_store_range` stores the same entry in a range
+Calling xa_store_range() stores the same entry in a range
 of indices.  If you do this, some of the other operations will behave
 in a slightly odd way.  For example, marking the entry at one index
 may result in the entry being marked at some, but not all of the other
 indices.  Storing into one index may result in the entry retrieved by
 some, but not all of the other indices changing.
 
-Sometimes you need to ensure that a subsequent call to :c:func:`xa_store`
-will not need to allocate memory.  The :c:func:`xa_reserve` function
+Sometimes you need to ensure that a subsequent call to xa_store()
+will not need to allocate memory.  The xa_reserve() function
 will store a reserved entry at the indicated index.  Users of the
 normal API will see this entry as containing ``NULL``.  If you do
-not need to use the reserved entry, you can call :c:func:`xa_release`
+not need to use the reserved entry, you can call xa_release()
 to remove the unused entry.  If another user has stored to the entry
-in the meantime, :c:func:`xa_release` will do nothing; if instead you
-want the entry to become ``NULL``, you should use :c:func:`xa_erase`.
-Using :c:func:`xa_insert` on a reserved entry will fail.
+in the meantime, xa_release() will do nothing; if instead you
+want the entry to become ``NULL``, you should use xa_erase().
+Using xa_insert() on a reserved entry will fail.
 
-If all entries in the array are ``NULL``, the :c:func:`xa_empty` function
+If all entries in the array are ``NULL``, the xa_empty() function
 will return ``true``.
 
 Finally, you can remove all entries from an XArray by calling
-:c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
+xa_destroy().  If the XArray entries are pointers, you may wish
 to free the entries first.  You can do this by iterating over all present
-entries in the XArray using the :c:func:`xa_for_each` iterator.
+entries in the XArray using the xa_for_each() iterator.
 
 Allocating XArrays
 ------------------
 
-If you use :c:func:`DEFINE_XARRAY_ALLOC` to define the XArray, or
-initialise it by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`,
+If you use DEFINE_XARRAY_ALLOC() to define the XArray, or
+initialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
 the XArray changes to track whether entries are in use or not.
 
-You can call :c:func:`xa_alloc` to store the entry at an unused index
+You can call xa_alloc() to store the entry at an unused index
 in the XArray.  If you need to modify the array from interrupt context,
-you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable
+you can use xa_alloc_bh() or xa_alloc_irq() to disable
 interrupts while allocating the ID.
 
-Using :c:func:`xa_store`, :c:func:`xa_cmpxchg` or :c:func:`xa_insert` will
+Using xa_store(), xa_cmpxchg() or xa_insert() will
 also mark the entry as being allocated.  Unlike a normal XArray, storing
-``NULL`` will mark the entry as being in use, like :c:func:`xa_reserve`.
-To free an entry, use :c:func:`xa_erase` (or :c:func:`xa_release` if
+``NULL`` will mark the entry as being in use, like xa_reserve().
+To free an entry, use xa_erase() (or xa_release() if
 you only want to free the entry if it's ``NULL``).
 
 By default, the lowest free entry is allocated starting from 0.  If you
 want to allocate entries starting at 1, it is more efficient to use
-:c:func:`DEFINE_XARRAY_ALLOC1` or ``XA_FLAGS_ALLOC1``.  If you want to
+DEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``.  If you want to
 allocate IDs up to a maximum, then wrap back around to the lowest free
-ID, you can use :c:func:`xa_alloc_cyclic`.
+ID, you can use xa_alloc_cyclic().
 
 You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
 is used to track whether an entry is free or not.  The other marks are
@@ -155,17 +155,17 @@ available for your use.
 Memory allocation
 -----------------
 
-The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
-:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t
+The xa_store(), xa_cmpxchg(), xa_alloc(),
+xa_reserve() and xa_insert() functions take a gfp_t
 parameter in case the XArray needs to allocate memory to store this entry.
 If the entry is being deleted, no memory allocation needs to be performed,
 and the GFP flags specified will be ignored.
 
 It is possible for no memory to be allocatable, particularly if you pass
 a restrictive set of GFP flags.  In that case, the functions return a
-special value which can be turned into an errno using :c:func:`xa_err`.
+special value which can be turned into an errno using xa_err().
 If you don't need to know exactly which error occurred, using
-:c:func:`xa_is_err` is slightly more efficient.
+xa_is_err() is slightly more efficient.
 
 Locking
 -------
@@ -174,54 +174,54 @@ When using the Normal API, you do not have to worry about locking.
 The XArray uses RCU and an internal spinlock to synchronise access:
 
 No lock needed:
- * :c:func:`xa_empty`
- * :c:func:`xa_marked`
+ * xa_empty()
+ * xa_marked()
 
 Takes RCU read lock:
- * :c:func:`xa_load`
- * :c:func:`xa_for_each`
- * :c:func:`xa_find`
- * :c:func:`xa_find_after`
- * :c:func:`xa_extract`
- * :c:func:`xa_get_mark`
+ * xa_load()
+ * xa_for_each()
+ * xa_find()
+ * xa_find_after()
+ * xa_extract()
+ * xa_get_mark()
 
 Takes xa_lock internally:
- * :c:func:`xa_store`
- * :c:func:`xa_store_bh`
- * :c:func:`xa_store_irq`
- * :c:func:`xa_insert`
- * :c:func:`xa_insert_bh`
- * :c:func:`xa_insert_irq`
- * :c:func:`xa_erase`
- * :c:func:`xa_erase_bh`
- * :c:func:`xa_erase_irq`
- * :c:func:`xa_cmpxchg`
- * :c:func:`xa_cmpxchg_bh`
- * :c:func:`xa_cmpxchg_irq`
- * :c:func:`xa_store_range`
- * :c:func:`xa_alloc`
- * :c:func:`xa_alloc_bh`
- * :c:func:`xa_alloc_irq`
- * :c:func:`xa_reserve`
- * :c:func:`xa_reserve_bh`
- * :c:func:`xa_reserve_irq`
- * :c:func:`xa_destroy`
- * :c:func:`xa_set_mark`
- * :c:func:`xa_clear_mark`
+ * xa_store()
+ * xa_store_bh()
+ * xa_store_irq()
+ * xa_insert()
+ * xa_insert_bh()
+ * xa_insert_irq()
+ * xa_erase()
+ * xa_erase_bh()
+ * xa_erase_irq()
+ * xa_cmpxchg()
+ * xa_cmpxchg_bh()
+ * xa_cmpxchg_irq()
+ * xa_store_range()
+ * xa_alloc()
+ * xa_alloc_bh()
+ * xa_alloc_irq()
+ * xa_reserve()
+ * xa_reserve_bh()
+ * xa_reserve_irq()
+ * xa_destroy()
+ * xa_set_mark()
+ * xa_clear_mark()
 
 Assumes xa_lock held on entry:
- * :c:func:`__xa_store`
- * :c:func:`__xa_insert`
- * :c:func:`__xa_erase`
- * :c:func:`__xa_cmpxchg`
- * :c:func:`__xa_alloc`
- * :c:func:`__xa_set_mark`
- * :c:func:`__xa_clear_mark`
+ * __xa_store()
+ * __xa_insert()
+ * __xa_erase()
+ * __xa_cmpxchg()
+ * __xa_alloc()
+ * __xa_set_mark()
+ * __xa_clear_mark()
 
 If you want to take advantage of the lock to protect the data structures
-that you are storing in the XArray, you can call :c:func:`xa_lock`
-before calling :c:func:`xa_load`, then take a reference count on the
-object you have found before calling :c:func:`xa_unlock`.  This will
+that you are storing in the XArray, you can call xa_lock()
+before calling xa_load(), then take a reference count on the
+object you have found before calling xa_unlock().  This will
 prevent stores from removing the object from the array between looking
 up the object and incrementing the refcount.  You can also use RCU to
 avoid dereferencing freed memory, but an explanation of that is beyond
@@ -261,7 +261,7 @@ context and then erase them in softirq context, you can do that this way::
     }
 
 If you are going to modify the XArray from interrupt or softirq context,
-you need to initialise the array using :c:func:`xa_init_flags`, passing
+you need to initialise the array using xa_init_flags(), passing
 ``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
 
 The above example also shows a common pattern of wanting to extend the
@@ -269,20 +269,20 @@ coverage of the xa_lock on the store side to protect some statistics
 associated with the array.
 
 Sharing the XArray with interrupt context is also possible, either
-using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
-context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
+using xa_lock_irqsave() in both the interrupt handler and process
+context, or xa_lock_irq() in process context and xa_lock()
 in the interrupt handler.  Some of the more common patterns have helper
-functions such as :c:func:`xa_store_bh`, :c:func:`xa_store_irq`,
-:c:func:`xa_erase_bh`, :c:func:`xa_erase_irq`, :c:func:`xa_cmpxchg_bh`
-and :c:func:`xa_cmpxchg_irq`.
+functions such as xa_store_bh(), xa_store_irq(),
+xa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
+and xa_cmpxchg_irq().
 
 Sometimes you need to protect access to the XArray with a mutex because
 that lock sits above another mutex in the locking hierarchy.  That does
-not entitle you to use functions like :c:func:`__xa_erase` without taking
+not entitle you to use functions like __xa_erase() without taking
 the xa_lock; the xa_lock is used for lockdep validation and will be used
 for other purposes in the future.
 
-The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
+The __xa_set_mark() and __xa_clear_mark() functions are also
 available for situations where you look up an entry and want to atomically
 set or clear a mark.  It may be more efficient to use the advanced API
 in this case, as it will save you from walking the tree twice.
@@ -300,27 +300,27 @@ indeed the normal API is implemented in terms of the advanced API.  The
 advanced API is only available to modules with a GPL-compatible license.
 
 The advanced API is based around the xa_state.  This is an opaque data
-structure which you declare on the stack using the :c:func:`XA_STATE`
+structure which you declare on the stack using the XA_STATE()
 macro.  This macro initialises the xa_state ready to start walking
 around the XArray.  It is used as a cursor to maintain the position
 in the XArray and let you compose various operations together without
 having to restart from the top every time.
 
 The xa_state is also used to store errors.  You can call
-:c:func:`xas_error` to retrieve the error.  All operations check whether
+xas_error() to retrieve the error.  All operations check whether
 the xa_state is in an error state before proceeding, so there's no need
 for you to check for an error after each call; you can make multiple
 calls in succession and only check at a convenient point.  The only
 errors currently generated by the XArray code itself are ``ENOMEM`` and
 ``EINVAL``, but it supports arbitrary errors in case you want to call
-:c:func:`xas_set_err` yourself.
+xas_set_err() yourself.
 
-If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
+If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
 will attempt to allocate more memory using the specified gfp flags and
 cache it in the xa_state for the next attempt.  The idea is that you take
 the xa_lock, attempt the operation and drop the lock.  The operation
 attempts to allocate memory while holding the lock, but it is more
-likely to fail.  Once you have dropped the lock, :c:func:`xas_nomem`
+likely to fail.  Once you have dropped the lock, xas_nomem()
 can try harder to allocate more memory.  It will return ``true`` if it
 is worth retrying the operation (i.e. that there was a memory error *and*
 more memory was allocated).  If it has previously allocated memory, and
@@ -333,7 +333,7 @@ Internal Entries
 The XArray reserves some entries for its own purposes.  These are never
 exposed through the normal API, but when using the advanced API, it's
 possible to see them.  Usually the best way to handle them is to pass them
-to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
+to xas_retry(), and retry the operation if it returns ``true``.
 
 .. flat-table::
    :widths: 1 1 6
@@ -343,89 +343,89 @@ to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
      - Usage
 
    * - Node
-     - :c:func:`xa_is_node`
+     - xa_is_node()
      - An XArray node.  May be visible when using a multi-index xa_state.
 
    * - Sibling
-     - :c:func:`xa_is_sibling`
+     - xa_is_sibling()
      - A non-canonical entry for a multi-index entry.  The value indicates
        which slot in this node has the canonical entry.
 
    * - Retry
-     - :c:func:`xa_is_retry`
+     - xa_is_retry()
      - This entry is currently being modified by a thread which has the
        xa_lock.  The node containing this entry may be freed at the end
        of this RCU period.  You should restart the lookup from the head
        of the array.
 
    * - Zero
-     - :c:func:`xa_is_zero`
+     - xa_is_zero()
      - Zero entries appear as ``NULL`` through the Normal API, but occupy
        an entry in the XArray which can be used to reserve the index for
        future use.  This is used by allocating XArrays for allocated entries
        which are ``NULL``.
 
 Other internal entries may be added in the future.  As far as possible, they
-will be handled by :c:func:`xas_retry`.
+will be handled by xas_retry().
 
 Additional functionality
 ------------------------
 
-The :c:func:`xas_create_range` function allocates all the necessary memory
+The xas_create_range() function allocates all the necessary memory
 to store every entry in a range.  It will set ENOMEM in the xa_state if
 it cannot allocate memory.
 
-You can use :c:func:`xas_init_marks` to reset the marks on an entry
+You can use xas_init_marks() to reset the marks on an entry
 to their default state.  This is usually all marks clear, unless the
 XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
 and all other marks are clear.  Replacing one entry with another using
-:c:func:`xas_store` will not reset the marks on that entry; if you want
+xas_store() will not reset the marks on that entry; if you want
 the marks reset, you should do that explicitly.
 
-The :c:func:`xas_load` will walk the xa_state as close to the entry
+The xas_load() will walk the xa_state as close to the entry
 as it can.  If you know the xa_state has already been walked to the
 entry and need to check that the entry hasn't changed, you can use
-:c:func:`xas_reload` to save a function call.
+xas_reload() to save a function call.
 
 If you need to move to a different index in the XArray, call
-:c:func:`xas_set`.  This resets the cursor to the top of the tree, which
+xas_set().  This resets the cursor to the top of the tree, which
 will generally make the next operation walk the cursor to the desired
 spot in the tree.  If you want to move to the next or previous index,
-call :c:func:`xas_next` or :c:func:`xas_prev`.  Setting the index does
+call xas_next() or xas_prev().  Setting the index does
 not walk the cursor around the array so does not require a lock to be
 held, while moving to the next or previous index does.
 
-You can search for the next present entry using :c:func:`xas_find`.  This
-is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
+You can search for the next present entry using xas_find().  This
+is the equivalent of both xa_find() and xa_find_after();
 if the cursor has been walked to an entry, then it will find the next
 entry after the one currently referenced.  If not, it will return the
-entry at the index of the xa_state.  Using :c:func:`xas_next_entry` to
-move to the next present entry instead of :c:func:`xas_find` will save
+entry at the index of the xa_state.  Using xas_next_entry() to
+move to the next present entry instead of xas_find() will save
 a function call in the majority of cases at the expense of emitting more
 inline code.
 
-The :c:func:`xas_find_marked` function is similar.  If the xa_state has
+The xas_find_marked() function is similar.  If the xa_state has
 not been walked, it will return the entry at the index of the xa_state,
 if it is marked.  Otherwise, it will return the first marked entry after
-the entry referenced by the xa_state.  The :c:func:`xas_next_marked`
-function is the equivalent of :c:func:`xas_next_entry`.
+the entry referenced by the xa_state.  The xas_next_marked()
+function is the equivalent of xas_next_entry().
 
-When iterating over a range of the XArray using :c:func:`xas_for_each`
-or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
-the iteration.  The :c:func:`xas_pause` function exists for this purpose.
+When iterating over a range of the XArray using xas_for_each()
+or xas_for_each_marked(), it may be necessary to temporarily stop
+the iteration.  The xas_pause() function exists for this purpose.
 After you have done the necessary work and wish to resume, the xa_state
 is in an appropriate state to continue the iteration after the entry
 you last processed.  If you have interrupts disabled while iterating,
 then it is good manners to pause the iteration and reenable interrupts
 every ``XA_CHECK_SCHED`` entries.
 
-The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
-:c:func:`xas_clear_mark` functions require the xa_state cursor to have
+The xas_get_mark(), xas_set_mark() and
+xas_clear_mark() functions require the xa_state cursor to have
 been moved to the appropriate location in the xarray; they will do
-nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
+nothing if you have called xas_pause() or xas_set()
 immediately before.
 
-You can call :c:func:`xas_set_update` to have a callback function
+You can call xas_set_update() to have a callback function
 called each time the XArray updates a node.  This is used by the page
 cache workingset code to maintain its list of nodes which contain only
 shadow entries.
@@ -443,25 +443,25 @@ eg indices 64-127 may be tied together, but 2-6 may not be.  This may
 save substantial quantities of memory; for example tying 512 entries
 together will save over 4kB.
 
-You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
-or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
-Calling :c:func:`xas_load` with a multi-index xa_state will walk the
+You can create a multi-index entry by using XA_STATE_ORDER()
+or xas_set_order() followed by a call to xas_store().
+Calling xas_load() with a multi-index xa_state will walk the
 xa_state to the right location in the tree, but the return value is not
 meaningful, potentially being an internal entry or ``NULL`` even when there
-is an entry stored within the range.  Calling :c:func:`xas_find_conflict`
+is an entry stored within the range.  Calling xas_find_conflict()
 will return the first entry within the range or ``NULL`` if there are no
-entries in the range.  The :c:func:`xas_for_each_conflict` iterator will
+entries in the range.  The xas_for_each_conflict() iterator will
 iterate over every entry which overlaps the specified range.
 
-If :c:func:`xas_load` encounters a multi-index entry, the xa_index
+If xas_load() encounters a multi-index entry, the xa_index
 in the xa_state will not be changed.  When iterating over an XArray
-or calling :c:func:`xas_find`, if the initial index is in the middle
+or calling xas_find(), if the initial index is in the middle
 of a multi-index entry, it will not be altered.  Subsequent calls
 or iterations will move the index to the first index in the range.
 Each entry will only be returned once, no matter how many indices it
 occupies.
 
-Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
+Using xas_next() or xas_prev() with a multi-index xa_state
 is not supported.  Using either of these functions on a multi-index entry
 will reveal sibling entries; these should be skipped over by the caller.
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] docs: remove :c:func: annotations from xarray.rst
  2019-04-25 20:01 ` [PATCH 2/2] docs: remove :c:func: annotations from xarray.rst Jonathan Corbet
@ 2019-04-25 20:43   ` Matthew Wilcox
  0 siblings, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2019-04-25 20:43 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Mauro Carvalho Chehab

On Thu, Apr 25, 2019 at 02:01:25PM -0600, Jonathan Corbet wrote:
> Now that the build system automatically marks up function references, we
> don't have to clutter the source files, so take it out.

Well, that reads a lot more nicely and makes it much easier to write!

Acked-by: Matthew Wilcox <willy@infradead.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-25 20:01 ` [PATCH 1/2] Docs: An initial automarkup extension for sphinx Jonathan Corbet
@ 2019-04-26  9:06   ` Jani Nikula
  2019-04-26 11:31     ` Markus Heiser
  2019-04-26 16:52     ` Jonathan Corbet
  2019-04-26 18:32   ` Mauro Carvalho Chehab
  1 sibling, 2 replies; 13+ messages in thread
From: Jani Nikula @ 2019-04-26  9:06 UTC (permalink / raw)
  To: Jonathan Corbet, linux-doc
  Cc: linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab, Jonathan Corbet

On Thu, 25 Apr 2019, Jonathan Corbet <corbet@lwn.net> wrote:
> Rather than fill our text files with :c:func:`function()` syntax, just do
> the markup via a hook into the sphinx build process.  As is always the
> case, the real problem is detecting the situations where this markup should
> *not* be done.

This is basically a regex based pre-processing step in front of Sphinx,
but it's not independent as it embeds a limited understanding/parsing of
reStructuredText syntax. This is similar to what we do in kernel-doc the
Perl monster, except slightly different.

I understand the motivation, and I sympathize with the idea of a quick
regex hack to silence the mob. But I fear this will lead to hard to
solve corner cases and the same style of "impedance mismatches" we had
with the kernel-doc/docproc/docbook Rube Goldberg machine of the past.

It's more involved, but I think the better place to do this (as well as
the kernel-doc transformations) would be in the doctree-read event,
after the rst parsing is done. You can traverse the doctree and find the
places which weren't special for Sphinx, and replace the plain text
nodes in-place. I've toyed with this in the past, but alas I didn't have
(and still don't) have the time to finish the job. There were some
unresolved issues with e.g. replacing nodes that had syntax highlighting
(because I wanted to make the references work also within preformatted
blocks).

If you decide to go with regex anyway, I'd at least consider pulling the
transformations/highlights from kernel-doc the script to the Sphinx
extension, and use the exact same transformations for stuff in source
code comments and rst files.

BR,
Jani.

>
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/conf.py              |  3 +-
>  Documentation/sphinx/automarkup.py | 90 ++++++++++++++++++++++++++++++
>  2 files changed, 92 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/sphinx/automarkup.py
>
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 72647a38b5c2..ba7b2846b1c5 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -34,7 +34,8 @@ needs_sphinx = '1.3'
>  # Add any Sphinx extension module names here, as strings. They can be
>  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
>  # ones.
> -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
> +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
> +              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
>  
>  # The name of the math extension changed on Sphinx 1.4
>  if major == 1 and minor > 3:
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> new file mode 100644
> index 000000000000..c47469372bae
> --- /dev/null
> +++ b/Documentation/sphinx/automarkup.py
> @@ -0,0 +1,90 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# This is a little Sphinx extension that tries to apply certain kinds
> +# of markup automatically so we can keep it out of the text files
> +# themselves.
> +#
> +# It's possible that this could be done better by hooking into the build
> +# much later and traversing through the doctree.  That would eliminate the
> +# need to duplicate some RST parsing and perhaps be less fragile, at the
> +# cost of some more complexity and the need to generate the cross-reference
> +# links ourselves.
> +#
> +# Copyright 2019 Jonathan Corbet <corbet@lwn.net>
> +#
> +from __future__ import print_function
> +import re
> +import sphinx
> +
> +#
> +# Regex nastiness.  Of course.
> +# Try to identify "function()" that's not already marked up some
> +# other way.  Sphinx doesn't like a lot of stuff right after a
> +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> +# bit tries to restrict matches to things that won't create trouble.
> +#
> +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')
> +#
> +# Lines consisting of a single underline character.
> +#
> +RE_underline = re.compile(r'^([-=~])\1+$')
> +#
> +# Starting a literal block.
> +#
> +RE_literal = re.compile(r'^(\s*)(.*::\s*|\.\.\s+code-block::.*)$')
> +#
> +# Just get the white space beginning a line.
> +#
> +RE_whitesp = re.compile(r'^(\s*)')
> +
> +def MangleFile(app, docname, text):
> +    ret = [ ]
> +    previous = ''
> +    literal = False
> +    for line in text[0].split('\n'):
> +        #
> +        # See if we might be ending a literal block, as denoted by
> +        # an indent no greater than when we started.
> +        #
> +        if literal and len(line) > 0:
> +            m = RE_whitesp.match(line)  # Should always match
> +            if len(m.group(1).expandtabs()) <= lit_indent:
> +                literal = False
> +        #
> +        # Blank lines, directives, and lines within literal blocks
> +        # should not be messed with.
> +        #
> +        if literal or len(line) == 0 or line[0] == '.':
> +            ret.append(line)
> +        #
> +        # Is this an underline line?  If so, and it is the same length
> +        # as the previous line, we may have mangled a heading line in
> +        # error, so undo it.
> +        #
> +        elif RE_underline.match(line):
> +            if len(line) == len(previous):
> +                ret[-1] = previous
> +            ret.append(line)
> +        #
> +        # Normal line - perform substitutions.
> +        #
> +        else:
> +            ret.append(RE_function.sub(r'\1:c:func:`\2`\3', line))
> +        #
> +        # Might we be starting a literal block?  If so make note of
> +        # the fact.
> +        #
> +        m = RE_literal.match(line)
> +        if m:
> +            literal = True
> +            lit_indent = len(m.group(1).expandtabs())
> +        previous = line
> +    text[0] = '\n'.join(ret)
> +
> +def setup(app):
> +    app.connect('source-read', MangleFile)
> +
> +    return dict(
> +        parallel_read_safe = True,
> +        parallel_write_safe = True
> +    )

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26  9:06   ` Jani Nikula
@ 2019-04-26 11:31     ` Markus Heiser
  2019-04-26 16:52     ` Jonathan Corbet
  1 sibling, 0 replies; 13+ messages in thread
From: Markus Heiser @ 2019-04-26 11:31 UTC (permalink / raw)
  To: Jani Nikula, Jonathan Corbet, linux-doc
  Cc: linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab

Am 26.04.19 um 11:06 schrieb Jani Nikula:
> On Thu, 25 Apr 2019, Jonathan Corbet<corbet@lwn.net>  wrote:
>> Rather than fill our text files with :c:func:`function()` syntax, just do
>> the markup via a hook into the sphinx build process.  As is always the
>> case, the real problem is detecting the situations where this markup should
>> *not*  be done.
> This is basically a regex based pre-processing step in front of Sphinx,
> but it's not independent as it embeds a limited understanding/parsing of
> reStructuredText syntax. This is similar to what we do in kernel-doc the
> Perl monster, except slightly different.
> 
> I understand the motivation, and I sympathize with the idea of a quick
> regex hack to silence the mob. But I fear this will lead to hard to
> solve corner cases and the same style of "impedance mismatches" we had
> with the kernel-doc/docproc/docbook Rube Goldberg machine of the past.
> 
> It's more involved, but I think the better place to do this (as well as
> the kernel-doc transformations) would be in the doctree-read event,
> after the rst parsing is done. You can traverse the doctree and find the
> places which weren't special for Sphinx, and replace the plain text
> nodes in-place. I've toyed with this in the past, but alas I didn't have
> (and still don't) have the time to finish the job. There were some
> unresolved issues with e.g. replacing nodes that had syntax highlighting
> (because I wanted to make the references work also within preformatted
> blocks).
> 
> If you decide to go with regex anyway, I'd at least consider pulling the
> transformations/highlights from kernel-doc the script to the Sphinx
> extension, and use the exact same transformations for stuff in source
> code comments and rst files.

FWIW mentioning: I fully agree with Jan.

   -- Markus --

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26  9:06   ` Jani Nikula
  2019-04-26 11:31     ` Markus Heiser
@ 2019-04-26 16:52     ` Jonathan Corbet
  2019-04-26 17:54       ` Mauro Carvalho Chehab
  2019-04-26 18:58       ` Jani Nikula
  1 sibling, 2 replies; 13+ messages in thread
From: Jonathan Corbet @ 2019-04-26 16:52 UTC (permalink / raw)
  To: Jani Nikula
  Cc: linux-doc, linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab

On Fri, 26 Apr 2019 12:06:42 +0300
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> It's more involved, but I think the better place to do this (as well as
> the kernel-doc transformations) would be in the doctree-read event,
> after the rst parsing is done. You can traverse the doctree and find the
> places which weren't special for Sphinx, and replace the plain text
> nodes in-place. I've toyed with this in the past, but alas I didn't have
> (and still don't) have the time to finish the job.

I had thought about this; indeed, there's a comment in the posted patch to
that effect.  The tradeoff comes down to this, I think:

 - The fragility and inelegance of embedding a bit of redundant RST
   parsing into this extension v. that of adding a late parsing stage as a
   transform, trying to enumerate every node type that you might want to
   change, and digging into the C domain code to emulate its reference
   generation.

 - The time required to implement a solution; I'm having a bit of a hard
   time keeping up with docs at the moment as it is.

 - Regexes.  This solution has more of them, but we're not going to get
   away from them regardless.

I am not at all opposed to a more proper solution that might, in the
long term, produce more deterministic results.  I can even try to work in
that direction.  But this is something that can be done now that, IMO,
doesn't in any way close off a better implementation in the future.  If we
agree that we should automatically generate references for occurrences of
"function()", we can change how that is actually done later.

I'll look into this further, but my inclination is to go forward with what
I have now.  It's simple and easy to understand, and doesn't seem to screw
up anywhere in the current body of kernel docs as far as I can tell.

> If you decide to go with regex anyway, I'd at least consider pulling the
> transformations/highlights from kernel-doc the script to the Sphinx
> extension, and use the exact same transformations for stuff in source
> code comments and rst files.

Pulling all RST manipulation out of kerneldoc has a great deal of appeal;
assuming this goes forward that should indeed be high on the todo list.

Thanks,

jon

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26 16:52     ` Jonathan Corbet
@ 2019-04-26 17:54       ` Mauro Carvalho Chehab
  2019-04-26 18:58       ` Jani Nikula
  1 sibling, 0 replies; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2019-04-26 17:54 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: Jani Nikula, linux-doc, linux-kernel, Matthew Wilcox

Em Fri, 26 Apr 2019 10:52:09 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> On Fri, 26 Apr 2019 12:06:42 +0300
> Jani Nikula <jani.nikula@linux.intel.com> wrote:
> 
> > It's more involved, but I think the better place to do this (as well as
> > the kernel-doc transformations) would be in the doctree-read event,
> > after the rst parsing is done. You can traverse the doctree and find the
> > places which weren't special for Sphinx, and replace the plain text
> > nodes in-place. I've toyed with this in the past, but alas I didn't have
> > (and still don't) have the time to finish the job.  
> 
> I had thought about this; indeed, there's a comment in the posted patch to
> that effect.  The tradeoff comes down to this, I think:
> 
>  - The fragility and inelegance of embedding a bit of redundant RST
>    parsing into this extension v. that of adding a late parsing stage as a
>    transform, trying to enumerate every node type that you might want to
>    change, and digging into the C domain code to emulate its reference
>    generation.
> 
>  - The time required to implement a solution; I'm having a bit of a hard
>    time keeping up with docs at the moment as it is.
> 
>  - Regexes.  This solution has more of them, but we're not going to get
>    away from them regardless.
> 
> I am not at all opposed to a more proper solution that might, in the
> long term, produce more deterministic results.  I can even try to work in
> that direction.  But this is something that can be done now that, IMO,
> doesn't in any way close off a better implementation in the future.  If we
> agree that we should automatically generate references for occurrences of
> "function()", we can change how that is actually done later.
> 
> I'll look into this further, but my inclination is to go forward with what
> I have now.  It's simple and easy to understand, and doesn't seem to screw
> up anywhere in the current body of kernel docs as far as I can tell.
> 
> > If you decide to go with regex anyway, I'd at least consider pulling the
> > transformations/highlights from kernel-doc the script to the Sphinx
> > extension, and use the exact same transformations for stuff in source
> > code comments and rst files.  
> 
> Pulling all RST manipulation out of kerneldoc has a great deal of appeal;
> assuming this goes forward that should indeed be high on the todo list.

While I didn't review the python code yet, and while I agree with
Jani and Markus that it would be better only parse functions on texts
after the Sphinx parser, I agree with Jon on that: if the current code works
well enough with the current documentation, I would proceed and apply it. 

Later, the script can be improved.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-25 20:01 ` [PATCH 1/2] Docs: An initial automarkup extension for sphinx Jonathan Corbet
  2019-04-26  9:06   ` Jani Nikula
@ 2019-04-26 18:32   ` Mauro Carvalho Chehab
  2019-04-26 19:16     ` Mauro Carvalho Chehab
  2019-04-26 19:37     ` Jonathan Corbet
  1 sibling, 2 replies; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2019-04-26 18:32 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Matthew Wilcox

Em Thu, 25 Apr 2019 14:01:24 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> Rather than fill our text files with :c:func:`function()` syntax, just do
> the markup via a hook into the sphinx build process.  As is always the
> case, the real problem is detecting the situations where this markup should
> *not* be done.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/conf.py              |  3 +-
>  Documentation/sphinx/automarkup.py | 90 ++++++++++++++++++++++++++++++
>  2 files changed, 92 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/sphinx/automarkup.py
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 72647a38b5c2..ba7b2846b1c5 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -34,7 +34,8 @@ needs_sphinx = '1.3'
>  # Add any Sphinx extension module names here, as strings. They can be
>  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
>  # ones.
> -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
> +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
> +              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
>  
>  # The name of the math extension changed on Sphinx 1.4
>  if major == 1 and minor > 3:
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> new file mode 100644
> index 000000000000..c47469372bae
> --- /dev/null
> +++ b/Documentation/sphinx/automarkup.py
> @@ -0,0 +1,90 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# This is a little Sphinx extension that tries to apply certain kinds
> +# of markup automatically so we can keep it out of the text files
> +# themselves.
> +#
> +# It's possible that this could be done better by hooking into the build
> +# much later and traversing through the doctree.  That would eliminate the
> +# need to duplicate some RST parsing and perhaps be less fragile, at the
> +# cost of some more complexity and the need to generate the cross-reference
> +# links ourselves.
> +#
> +# Copyright 2019 Jonathan Corbet <corbet@lwn.net>
> +#
> +from __future__ import print_function
> +import re
> +import sphinx
> +
> +#
> +# Regex nastiness.  Of course.
> +# Try to identify "function()" that's not already marked up some
> +# other way.  Sphinx doesn't like a lot of stuff right after a
> +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> +# bit tries to restrict matches to things that won't create trouble.
> +#
> +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')

IMHO, this looks good enough to avoid trouble, maybe except if one
wants to write a document explaining this functionality at the
doc-guide/kernel-doc.rst.

Anyway, the way it is written, we could still explain it by adding
a "\ " after the func, e. g.:

	When you write a function like: func()\ , the automarkup
	extension will automatically convert it into:
	``:c:func:`func()```.

So, this looks OK on my eyes.

> +#
> +# Lines consisting of a single underline character.
> +#
> +RE_underline = re.compile(r'^([-=~])\1+$')

Hmm... why are you calling this "underline"? Sounds a bad name to me,
as it took me a while to understand what you meant.

From the code I'm inferring that this is meant to track 3 of the
possible symbols used as a (sub).*title markup. On several places 
we use other symbols:'^', '~', '.', '*' (and others) as sub-sub(sub..)
title markups.

I would instead define this Regex as:

	RE_title_markup = re.compile(r'^([^\w\d])\1+$')

You should probably need another regex for the title itself:

	RE_possible_title = re.compile(r'^(\S.*\S)\s*$')

in order to get the size of the matched line. Doing a doing len(previous)
will get you false positives.

As on Sphinx, **all** titles should start at the first column,
or it will produce a severe error[1], we can use such regex to
minimize parsing errors.

[1] and either crash or keep running some endless loop internally.
Not being bad enough, it will also invalidate all the previously
cached data, losing a lot of time next time you try to build the
docs.

---

on a separate matter (but related to automarkup matter - and to what
I would name underline), as a future feature, perhaps we could also add
a parser for:

	_something that requires underlines_

Underlined text is probably the only feature that we use on several docs
with Sphinx doesn't support (there are some extensions for that - I guess,
but it sounds simple enough to have a parser here).

This can be tricky to get it right, as just underlines_ is a
cross reference markup - so, I would only add this after we improve the
script to come after Sphinx own markup processing.

---

> +#
> +# Starting a literal block.
> +#
> +RE_literal = re.compile(r'^(\s*)(.*::\s*|\.\.\s+code-block::.*)$')
> +#
> +# Just get the white space beginning a line.
> +#
> +RE_whitesp = re.compile(r'^(\s*)')
> +
> +def MangleFile(app, docname, text):
> +    ret = [ ]
> +    previous = ''
> +    literal = False
> +    for line in text[0].split('\n'):
> +        #
> +        # See if we might be ending a literal block, as denoted by
> +        # an indent no greater than when we started.
> +        #
> +        if literal and len(line) > 0:
> +            m = RE_whitesp.match(line)  # Should always match
> +            if len(m.group(1).expandtabs()) <= lit_indent:
> +                literal = False
> +        #
> +        # Blank lines, directives, and lines within literal blocks
> +        # should not be messed with.
> +        #
> +        if literal or len(line) == 0 or line[0] == '.':
> +            ret.append(line)


> +        #
> +        # Is this an underline line?  If so, and it is the same length
> +        # as the previous line, we may have mangled a heading line in
> +        # error, so undo it.
> +        #
> +        elif RE_underline.match(line):
> +            if len(line) == len(previous):

No, that doesn't seem enough. I would, instead, use the regex I
proposed before, in order to check if the previous line starts with
a non-space, and getting the length only up to the last non-space
(yeah, unfortunately, we have some text files that have extra blank
spaces at line's tail).

> +                ret[-1] = previous
> +            ret.append(line)
> +        #
> +        # Normal line - perform substitutions.
> +        #
> +        else:
> +            ret.append(RE_function.sub(r'\1:c:func:`\2`\3', line))
> +        #
> +        # Might we be starting a literal block?  If so make note of
> +        # the fact.
> +        #
> +        m = RE_literal.match(line)
> +        if m:
> +            literal = True
> +            lit_indent = len(m.group(1).expandtabs())
> +        previous = line
> +    text[0] = '\n'.join(ret)
> +
> +def setup(app):
> +    app.connect('source-read', MangleFile)
> +
> +    return dict(
> +        parallel_read_safe = True,
> +        parallel_write_safe = True
> +    )

The remaining looks fine to me - although I'm not a Sphinx-extension
expert, and my knowledge of python is far from being perfect.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26 16:52     ` Jonathan Corbet
  2019-04-26 17:54       ` Mauro Carvalho Chehab
@ 2019-04-26 18:58       ` Jani Nikula
  1 sibling, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2019-04-26 18:58 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: linux-doc, linux-kernel, Matthew Wilcox, Mauro Carvalho Chehab

On Fri, 26 Apr 2019, Jonathan Corbet <corbet@lwn.net> wrote:
> I am not at all opposed to a more proper solution that might, in the
> long term, produce more deterministic results.  I can even try to work in
> that direction.  But this is something that can be done now that, IMO,
> doesn't in any way close off a better implementation in the future.  If we
> agree that we should automatically generate references for occurrences of
> "function()", we can change how that is actually done later.
>
> I'll look into this further, but my inclination is to go forward with what
> I have now.  It's simple and easy to understand, and doesn't seem to screw
> up anywhere in the current body of kernel docs as far as I can tell.

Fair enough. It's most important that this doesn't block us from
switching to a different implementation later once someone figures it
out.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26 18:32   ` Mauro Carvalho Chehab
@ 2019-04-26 19:16     ` Mauro Carvalho Chehab
  2019-04-26 19:37     ` Jonathan Corbet
  1 sibling, 0 replies; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2019-04-26 19:16 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Matthew Wilcox

Em Fri, 26 Apr 2019 15:32:55 -0300
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> escreveu:

> Em Thu, 25 Apr 2019 14:01:24 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
> 
> > Rather than fill our text files with :c:func:`function()` syntax, just do
> > the markup via a hook into the sphinx build process.  As is always the
> > case, the real problem is detecting the situations where this markup should
> > *not* be done.
> > 
> > Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> > ---
> >  Documentation/conf.py              |  3 +-
> >  Documentation/sphinx/automarkup.py | 90 ++++++++++++++++++++++++++++++
> >  2 files changed, 92 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/sphinx/automarkup.py
> > 
> > diff --git a/Documentation/conf.py b/Documentation/conf.py
> > index 72647a38b5c2..ba7b2846b1c5 100644
> > --- a/Documentation/conf.py
> > +++ b/Documentation/conf.py
> > @@ -34,7 +34,8 @@ needs_sphinx = '1.3'
> >  # Add any Sphinx extension module names here, as strings. They can be
> >  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
> >  # ones.
> > -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
> > +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
> > +              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
> >  
> >  # The name of the math extension changed on Sphinx 1.4
> >  if major == 1 and minor > 3:
> > diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> > new file mode 100644
> > index 000000000000..c47469372bae
> > --- /dev/null
> > +++ b/Documentation/sphinx/automarkup.py
> > @@ -0,0 +1,90 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# This is a little Sphinx extension that tries to apply certain kinds
> > +# of markup automatically so we can keep it out of the text files
> > +# themselves.
> > +#
> > +# It's possible that this could be done better by hooking into the build
> > +# much later and traversing through the doctree.  That would eliminate the
> > +# need to duplicate some RST parsing and perhaps be less fragile, at the
> > +# cost of some more complexity and the need to generate the cross-reference
> > +# links ourselves.
> > +#
> > +# Copyright 2019 Jonathan Corbet <corbet@lwn.net>
> > +#
> > +from __future__ import print_function
> > +import re
> > +import sphinx
> > +
> > +#
> > +# Regex nastiness.  Of course.
> > +# Try to identify "function()" that's not already marked up some
> > +# other way.  Sphinx doesn't like a lot of stuff right after a
> > +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > +# bit tries to restrict matches to things that won't create trouble.
> > +#
> > +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')  
> 
> IMHO, this looks good enough to avoid trouble, maybe except if one
> wants to write a document explaining this functionality at the
> doc-guide/kernel-doc.rst.
> 
> Anyway, the way it is written, we could still explain it by adding
> a "\ " after the func, e. g.:
> 
> 	When you write a function like: func()\ , the automarkup
> 	extension will automatically convert it into:
> 	``:c:func:`func()```.
> 
> So, this looks OK on my eyes.
> 
> > +#
> > +# Lines consisting of a single underline character.
> > +#
> > +RE_underline = re.compile(r'^([-=~])\1+$')  
> 
> Hmm... why are you calling this "underline"? Sounds a bad name to me,
> as it took me a while to understand what you meant.
> 
> From the code I'm inferring that this is meant to track 3 of the
> possible symbols used as a (sub).*title markup. On several places 
> we use other symbols:'^', '~', '.', '*' (and others) as sub-sub(sub..)
> title markups.
> 
> I would instead define this Regex as:
> 
> 	RE_title_markup = re.compile(r'^([^\w\d])\1+$')

In time:

	RE_title_markup = re.compile(r'^([^\w\d\s])\1+$')

As otherwise it would get whitespaces/tabs :-)

Also, please notice that this is pure review - didn't actually try to
apply your patch or test with or without the proposed changes :-)

> 
> You should probably need another regex for the title itself:
> 
> 	RE_possible_title = re.compile(r'^(\S.*\S)\s*$')
> 
> in order to get the size of the matched line. Doing a doing len(previous)
> will get you false positives.
> 
> As on Sphinx, **all** titles should start at the first column,
> or it will produce a severe error[1], we can use such regex to
> minimize parsing errors.
> 
> [1] and either crash or keep running some endless loop internally.
> Not being bad enough, it will also invalidate all the previously
> cached data, losing a lot of time next time you try to build the
> docs.
> 
> ---
> 
> on a separate matter (but related to automarkup matter - and to what
> I would name underline), as a future feature, perhaps we could also add
> a parser for:
> 
> 	_something that requires underlines_
> 
> Underlined text is probably the only feature that we use on several docs
> with Sphinx doesn't support (there are some extensions for that - I guess,
> but it sounds simple enough to have a parser here).
> 
> This can be tricky to get it right, as just underlines_ is a
> cross reference markup - so, I would only add this after we improve the
> script to come after Sphinx own markup processing.
> 
> ---
> 
> > +#
> > +# Starting a literal block.
> > +#
> > +RE_literal = re.compile(r'^(\s*)(.*::\s*|\.\.\s+code-block::.*)$')
> > +#
> > +# Just get the white space beginning a line.
> > +#
> > +RE_whitesp = re.compile(r'^(\s*)')
> > +
> > +def MangleFile(app, docname, text):
> > +    ret = [ ]
> > +    previous = ''
> > +    literal = False
> > +    for line in text[0].split('\n'):
> > +        #
> > +        # See if we might be ending a literal block, as denoted by
> > +        # an indent no greater than when we started.
> > +        #
> > +        if literal and len(line) > 0:
> > +            m = RE_whitesp.match(line)  # Should always match
> > +            if len(m.group(1).expandtabs()) <= lit_indent:
> > +                literal = False
> > +        #
> > +        # Blank lines, directives, and lines within literal blocks
> > +        # should not be messed with.
> > +        #
> > +        if literal or len(line) == 0 or line[0] == '.':
> > +            ret.append(line)  
> 
> 
> > +        #
> > +        # Is this an underline line?  If so, and it is the same length
> > +        # as the previous line, we may have mangled a heading line in
> > +        # error, so undo it.
> > +        #
> > +        elif RE_underline.match(line):
> > +            if len(line) == len(previous):  
> 
> No, that doesn't seem enough. I would, instead, use the regex I
> proposed before, in order to check if the previous line starts with
> a non-space, and getting the length only up to the last non-space
> (yeah, unfortunately, we have some text files that have extra blank
> spaces at line's tail).
> 
> > +                ret[-1] = previous
> > +            ret.append(line)
> > +        #
> > +        # Normal line - perform substitutions.
> > +        #
> > +        else:
> > +            ret.append(RE_function.sub(r'\1:c:func:`\2`\3', line))
> > +        #
> > +        # Might we be starting a literal block?  If so make note of
> > +        # the fact.
> > +        #
> > +        m = RE_literal.match(line)
> > +        if m:
> > +            literal = True
> > +            lit_indent = len(m.group(1).expandtabs())
> > +        previous = line
> > +    text[0] = '\n'.join(ret)
> > +
> > +def setup(app):
> > +    app.connect('source-read', MangleFile)
> > +
> > +    return dict(
> > +        parallel_read_safe = True,
> > +        parallel_write_safe = True
> > +    )  
> 
> The remaining looks fine to me - although I'm not a Sphinx-extension
> expert, and my knowledge of python is far from being perfect.
> 
> Thanks,
> Mauro



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26 18:32   ` Mauro Carvalho Chehab
  2019-04-26 19:16     ` Mauro Carvalho Chehab
@ 2019-04-26 19:37     ` Jonathan Corbet
  2019-04-26 20:51       ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Corbet @ 2019-04-26 19:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Matthew Wilcox

On Fri, 26 Apr 2019 15:32:55 -0300
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:

> > +# Try to identify "function()" that's not already marked up some
> > +# other way.  Sphinx doesn't like a lot of stuff right after a
> > +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > +# bit tries to restrict matches to things that won't create trouble.
> > +#
> > +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')  
> 
> IMHO, this looks good enough to avoid trouble, maybe except if one
> wants to write a document explaining this functionality at the
> doc-guide/kernel-doc.rst.

Adding something to the docs is definitely on my list.

> Anyway, the way it is written, we could still explain it by adding
> a "\ " after the func, e. g.:
> 
> 	When you write a function like: func()\ , the automarkup
> 	extension will automatically convert it into:
> 	``:c:func:`func()```.
> 
> So, this looks OK on my eyes.

Not sure I like that; the whole point is to avoid extra markup here.  Plus
I like that it catches all function references whether the author thought
to mark them or not.

> > +#
> > +# Lines consisting of a single underline character.
> > +#
> > +RE_underline = re.compile(r'^([-=~])\1+$')  
> 
> Hmm... why are you calling this "underline"? Sounds a bad name to me,
> as it took me a while to understand what you meant.

Seemed OK to me, but I can change it :)

> From the code I'm inferring that this is meant to track 3 of the
> possible symbols used as a (sub).*title markup. On several places 
> we use other symbols:'^', '~', '.', '*' (and others) as sub-sub(sub..)
> title markups.

I picked the ones that were suggested in our docs; it was enough to catch
all of the problems in the current kernel docs.

Anyway, The real documentation gives the actual set, so I'll maybe make it:

	=-'`":~^_*+#<>

I'd prefer that to something more wildcardish.

> You should probably need another regex for the title itself:
> 
> 	RE_possible_title = re.compile(r'^(\S.*\S)\s*$')
> 
> in order to get the size of the matched line. Doing a doing len(previous)
> will get you false positives.

This I don't quite get.  It's easy enough to trim off the spaces with
strip() if that turns out to be a problem (which it hasn't so far).  I can
add that.

> on a separate matter (but related to automarkup matter - and to what
> I would name underline), as a future feature, perhaps we could also add
> a parser for:
> 
> 	_something that requires underlines_
> 
> Underlined text is probably the only feature that we use on several docs
> with Sphinx doesn't support (there are some extensions for that - I guess,
> but it sounds simple enough to have a parser here).
> 
> This can be tricky to get it right, as just underlines_ is a
> cross reference markup - so, I would only add this after we improve the
> script to come after Sphinx own markup processing.

That does indeed sound tricky.  It would also probably have to come
*before* Sphinx does its thing or it's unlikely to survive.

> > +        #
> > +        # Is this an underline line?  If so, and it is the same length
> > +        # as the previous line, we may have mangled a heading line in
> > +        # error, so undo it.
> > +        #
> > +        elif RE_underline.match(line):
> > +            if len(line) == len(previous):  
> 
> No, that doesn't seem enough. I would, instead, use the regex I
> proposed before, in order to check if the previous line starts with
> a non-space, and getting the length only up to the last non-space
> (yeah, unfortunately, we have some text files that have extra blank
> spaces at line's tail).

So I'll make it "if len(line) == len(previous.strip())

Thanks,

jon

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
  2019-04-26 19:37     ` Jonathan Corbet
@ 2019-04-26 20:51       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2019-04-26 20:51 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Matthew Wilcox

Em Fri, 26 Apr 2019 13:37:19 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> On Fri, 26 Apr 2019 15:32:55 -0300
> Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> 
> > > +# Try to identify "function()" that's not already marked up some
> > > +# other way.  Sphinx doesn't like a lot of stuff right after a
> > > +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > > +# bit tries to restrict matches to things that won't create trouble.
> > > +#
> > > +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')    
> > 
> > IMHO, this looks good enough to avoid trouble, maybe except if one
> > wants to write a document explaining this functionality at the
> > doc-guide/kernel-doc.rst.  
> 
> Adding something to the docs is definitely on my list.
> 
> > Anyway, the way it is written, we could still explain it by adding
> > a "\ " after the func, e. g.:
> > 
> > 	When you write a function like: func()\ , the automarkup
> > 	extension will automatically convert it into:
> > 	``:c:func:`func()```.
> > 
> > So, this looks OK on my eyes.  
> 
> Not sure I like that; the whole point is to avoid extra markup here.  Plus
> I like that it catches all function references whether the author thought
> to mark them or not.

Yes, but I'm pretty sure that there will be cases where one may want
to explicitly force the parser to not recognize it. One of such examples
is the document explaining this feature.

> 
> > > +#
> > > +# Lines consisting of a single underline character.
> > > +#
> > > +RE_underline = re.compile(r'^([-=~])\1+$')    
> > 
> > Hmm... why are you calling this "underline"? Sounds a bad name to me,
> > as it took me a while to understand what you meant.  
> 
> Seemed OK to me, but I can change it :)

I'm pretty sure that, on my last /79 patch series, I used some
patterns that would be placing function names at the title,
and that were not using '-', '=' or '~'. If the parser
would pick it or not is a separate matter[1] :-)

[1] It would probably reject on titles, as very often what
we write on titles are things like:

foo(int i, int j)
.................

As it has the variables inside, the parser won't likely get it.

Yet, I vaguely remember I saw or wrote some title that had
a pattern like:

Usage of foo()
^^^^^^^^^^^^^^

(but I can't really remember what was the used markup)

I would prefer if you change. I usually use myself:

	'=' '-' '^' and '.'

(usually on the above order - as it makes some sense to my
brain  to use the above indentation levels)

In general, function descriptions are sub-sub-title or
sub-sub-sub-title, so, on the places I wrote, it would likely
be either '^' or '.'.

But I've seen other symbols being used too to mark titles
(like '*' and '#').

> > From the code I'm inferring that this is meant to track 3 of the
> > possible symbols used as a (sub).*title markup. On several places 
> > we use other symbols:'^', '~', '.', '*' (and others) as sub-sub(sub..)
> > title markups.  
> 
> I picked the ones that were suggested in our docs; it was enough to catch
> all of the problems in the current kernel docs.
> 
> Anyway, The real documentation gives the actual set, so I'll maybe make it:
> 
> 	=-'`":~^_*+#<>

I'm pretty sure a single dot works as well, as I used this already.

> I'd prefer that to something more wildcardish.

Yeah, makes sense, provided that it will reflect what Sphinx actually
uses internally.

> > You should probably need another regex for the title itself:
> > 
> > 	RE_possible_title = re.compile(r'^(\S.*\S)\s*$')
> > 
> > in order to get the size of the matched line. Doing a doing len(previous)
> > will get you false positives.  
> 
> This I don't quite get.  It's easy enough to trim off the spaces with
> strip() if that turns out to be a problem (which it hasn't so far).  I can
> add that.


What I'm saying is that the title markup should always start at the first
position. So, this is a valid title:

Foo valid title
===============

But this causes Sphinx to crash badly:

 Foo invalid title
 =================

Knowing that, we can use a regex for the previous line assuming that it
will always start with a non-spaced character[2], and checking only the 
length of non-blank characters.

[2] Strictly speaking, I guess Sphinx would accept something like:

   Foo weirdly marked title - probably non-compliant with ReST spec
===================================================================

But I don't think we have any occurrence of something like that - and
I don't think we should concern about that, as it would be a very bad 
documentation style anyway.

So, what I'm saying is that we could use such knowledge in our benefit,
considering a valid title to be something like: ^(\S.*\S)\s*$ - e. g.
the title itself starts on a non-space char and ends on another 
non-space char.

> 
> > on a separate matter (but related to automarkup matter - and to what
> > I would name underline), as a future feature, perhaps we could also add
> > a parser for:
> > 
> > 	_something that requires underlines_
> > 
> > Underlined text is probably the only feature that we use on several docs
> > with Sphinx doesn't support (there are some extensions for that - I guess,
> > but it sounds simple enough to have a parser here).
> > 
> > This can be tricky to get it right, as just underlines_ is a
> > cross reference markup - so, I would only add this after we improve the
> > script to come after Sphinx own markup processing.  
> 
> That does indeed sound tricky.  It would also probably have to come
> *before* Sphinx does its thing or it's unlikely to survive.
> 
> > > +        #
> > > +        # Is this an underline line?  If so, and it is the same length
> > > +        # as the previous line, we may have mangled a heading line in
> > > +        # error, so undo it.
> > > +        #
> > > +        elif RE_underline.match(line):
> > > +            if len(line) == len(previous):    
> > 
> > No, that doesn't seem enough. I would, instead, use the regex I
> > proposed before, in order to check if the previous line starts with
> > a non-space, and getting the length only up to the last non-space
> > (yeah, unfortunately, we have some text files that have extra blank
> > spaces at line's tail).  
> 
> So I'll make it "if len(line) == len(previous.strip())
> 
> Thanks,
> 
> jon



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-04-26 20:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-25 20:01 [PATCH 0/2] RFC: Automatic :c:func: annotation in the sphinx build Jonathan Corbet
2019-04-25 20:01 ` [PATCH 1/2] Docs: An initial automarkup extension for sphinx Jonathan Corbet
2019-04-26  9:06   ` Jani Nikula
2019-04-26 11:31     ` Markus Heiser
2019-04-26 16:52     ` Jonathan Corbet
2019-04-26 17:54       ` Mauro Carvalho Chehab
2019-04-26 18:58       ` Jani Nikula
2019-04-26 18:32   ` Mauro Carvalho Chehab
2019-04-26 19:16     ` Mauro Carvalho Chehab
2019-04-26 19:37     ` Jonathan Corbet
2019-04-26 20:51       ` Mauro Carvalho Chehab
2019-04-25 20:01 ` [PATCH 2/2] docs: remove :c:func: annotations from xarray.rst Jonathan Corbet
2019-04-25 20:43   ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).