linux-doc.vger.kernel.org archive mirror
* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
@ 2020-10-14 20:09 Nícolas F. R. A. Prado
  2020-10-14 20:16 ` Jonathan Corbet
  0 siblings, 1 reply; 5+ messages in thread
From: Nícolas F. R. A. Prado @ 2020-10-14 20:09 UTC
  To: Jonathan Corbet
  Cc: Mauro Carvalho Chehab, linux-doc, linux-kernel, lkcamp, andrealmeid

On Wed Oct 14, 2020 at 4:11 PM -03, Jonathan Corbet wrote:
>
> On Tue, 13 Oct 2020 23:13:17 +0000
> Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:
>
> > The warnings were caused by the expressions matching words in the
> > translated versions of the documentation, since any Unicode character
> > was matched.
> >
> > Fix the regular expressions by making the C regexes use ASCII
>
> I don't quite understand this part; can you give an example of the kinds
> of warnings you were seeing?

Hi Jon,
Sure.

One I had noted down was:

WARNING: Unparseable C cross-reference: '调用debugfs_rename'

which I believe came from the Chinese translation.

I think the problem is that Chinese text normally has no spaces between
words, so even if I had made the regexes match only at the beginning of a
word (which I hadn't, but this patch fixes that with \b), they would still
try to cross-reference a symbol containing Chinese characters, which Sphinx
cannot parse.

Since the kernel's C identifiers are ASCII-only anyway, I used the re.ASCII
flag so that \w and \d match only ASCII characters; otherwise they match any
Unicode character.
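
To illustrate, here is a minimal Python sketch comparing the old and new
RE_function patterns from this patch. The input string is a hypothetical
reconstruction of the kind of translated text behind the warning; the exact
source line was not recorded.

```python
import re

# Old pattern: in Python 3, \w in a str pattern is Unicode-aware by default,
# so CJK characters count as word characters.
OLD_RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')

# Patched pattern: ASCII-only \w, an explicit ASCII first character,
# and a leading word boundary.
NEW_RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)

# Hypothetical translated text: 调用 ("call") followed directly by the
# function name, with no space in between.
text = '调用debugfs_rename()'

old = OLD_RE_function.search(text).group(2)  # '调用debugfs_rename'
new = NEW_RE_function.search(text).group(2)  # 'debugfs_rename'
```

With re.ASCII, 用 is not a word character, so \b finds a boundary right
before "debugfs_rename" and only the C identifier is captured.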

If you want to look at other warnings or more complete output, let me know
and I will rebuild those versions. That warning was the only one I noted
down, but I think it gives a good idea of the problem.

Thanks,
Nícolas

>
> Thanks,
>
> jon




* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
  2020-10-14 20:09 [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings Nícolas F. R. A. Prado
@ 2020-10-14 20:16 ` Jonathan Corbet
  2020-10-15  6:31   ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Corbet @ 2020-10-14 20:16 UTC
  To: Nícolas F. R. A. Prado
  Cc: Mauro Carvalho Chehab, linux-doc, linux-kernel, lkcamp, andrealmeid

On Wed, 14 Oct 2020 20:09:10 +0000
Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:

> One I had noted down was:
> 
> WARNING: Unparseable C cross-reference: '调用debugfs_rename'
> 
> which I believe came from the Chinese translation.
> 
> I think the problem is that Chinese text normally has no spaces between
> words, so even if I had made the regexes match only at the beginning of a
> word (which I hadn't, but this patch fixes that with \b), they would still
> try to cross-reference a symbol containing Chinese characters, which Sphinx
> cannot parse.
> 
> Since the kernel's C identifiers are ASCII-only anyway, I used the re.ASCII
> flag so that \w and \d match only ASCII characters; otherwise they match any
> Unicode character.

OK, this all makes sense, as does your fix. The one thing I would ask is
that you put that warning into the changelog for future reference.

Thanks,

jon


* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
  2020-10-14 20:16 ` Jonathan Corbet
@ 2020-10-15  6:31   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 5+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-15  6:31 UTC
  To: Jonathan Corbet
  Cc: Nícolas F. R. A. Prado, linux-doc, linux-kernel, lkcamp,
	andrealmeid

On Wed, 14 Oct 2020 14:16:16 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> On Wed, 14 Oct 2020 20:09:10 +0000
> Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:
> 
> > One I had noted down was:
> > 
> > WARNING: Unparseable C cross-reference: '调用debugfs_rename'
> > 
> > which I believe came from the Chinese translation.
> > 
> > I think the problem is that Chinese text normally has no spaces between
> > words, so even if I had made the regexes match only at the beginning of a
> > word (which I hadn't, but this patch fixes that with \b), they would still
> > try to cross-reference a symbol containing Chinese characters, which Sphinx
> > cannot parse.
> > 
> > Since the kernel's C identifiers are ASCII-only anyway, I used the re.ASCII
> > flag so that \w and \d match only ASCII characters; otherwise they match any
> > Unicode character.
> 
> OK, this all makes sense, as does your fix. The one thing I would ask is
> that you put that warning into the changelog for future reference.

Yesterday I added patches 1 to 4 of Nícolas' series to my -next tree:

	https://git.linuxtv.org/mchehab/media-next.git/log/

Today I amended the changelog to better describe the ASCII issue:

	https://git.linuxtv.org/mchehab/media-next.git/commit/?id=f66e47f98c1e827a85654a8cfa1ba539bb381a1b

If this is enough, I'll likely send the PR to Linus later today or tomorrow,
depending on the linux-next merge results.

Patch 5 can be added later, after we find a way to keep it safe for
parallel reading.

Thanks,
Mauro


* Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
  2020-10-13 23:13 ` [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings Nícolas F. R. A. Prado
@ 2020-10-14 19:11   ` Jonathan Corbet
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Corbet @ 2020-10-14 19:11 UTC
  To: Nícolas F. R. A. Prado
  Cc: Mauro Carvalho Chehab, linux-doc, linux-kernel, lkcamp, andrealmeid

On Tue, 13 Oct 2020 23:13:17 +0000
Nícolas F. R. A. Prado <nfraprado@protonmail.com> wrote:

> The warnings were caused by the expressions matching words in the
> translated versions of the documentation, since any Unicode character
> was matched.
> 
> Fix the regular expressions by making the C regexes use ASCII

I don't quite understand this part; can you give an example of the kinds
of warnings you were seeing?

Thanks,

jon


* [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
  2020-10-13 23:13 [PATCH v2 0/5] docs: automarkup.py: Make automarkup ready for Sphinx 3.1+ Nícolas F. R. A. Prado
@ 2020-10-13 23:13 ` Nícolas F. R. A. Prado
  2020-10-14 19:11   ` Jonathan Corbet
  0 siblings, 1 reply; 5+ messages in thread
From: Nícolas F. R. A. Prado @ 2020-10-13 23:13 UTC
  To: Jonathan Corbet, Mauro Carvalho Chehab
  Cc: linux-doc, linux-kernel, lkcamp, andrealmeid

With the transition to Sphinx 3, new warnings were generated by
automarkup, exposing bugs in the regexes.

The warnings were caused by the expressions matching words in the
translated versions of the documentation, since \w matched any Unicode
character.

Fix the regular expressions by making the C regexes use ASCII and
ensuring the expressions only match at the beginning of words.
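
For example, a quick Python check shows what the leading \b changes for
RE_generic_type: without it, "struct" is also matched at the end of a longer
word. The sample sentence is made up for illustration.

```python
import re

# RE_generic_type before and after this patch.
OLD_RE_generic_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
NEW_RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
                                 flags=re.ASCII)

# Made-up sentence: "substruct" merely ends in "struct" and is not a C keyword.
text = 'the substruct member of struct device'

print(OLD_RE_generic_type.findall(text))
# [('struct', 'member'), ('struct', 'device')]
print(NEW_RE_generic_type.findall(text))
# [('struct', 'device')]
```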

Signed-off-by: Nícolas F. R. A. Prado <nfraprado@protonmail.com>
---
 Documentation/sphinx/automarkup.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index db13fb15cedc..43dd9025fc77 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -22,12 +22,13 @@ from itertools import chain
 # :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
 # bit tries to restrict matches to things that won't create trouble.
 #
-RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
+RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)
 
 #
 # Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
 #
-RE_generic_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
+RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
+                             flags=re.ASCII)
 
 #
 # Sphinx 3 uses a different C role for each one of struct, union, enum and
@@ -42,7 +43,7 @@ RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
 # Detects a reference to a documentation page of the form Documentation/... with
 # an optional extension
 #
-RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
+RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
 
 #
 # Many places in the docs refer to common system calls.  It is
-- 
2.28.0
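
A sketch comparing the old and new RE_doc patterns from the diff on two
made-up inputs. Unlike the C patterns, RE_doc is compiled without re.ASCII,
so its \b uses Python's default Unicode word characters: a path preceded by a
space still matches, while one written directly after CJK text, with no
space in between, is no longer matched at all.

```python
import re

# RE_doc before and after the patch (copied from the diff above).
OLD_RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
NEW_RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')

def first_match(pattern, text):
    """Return the full matched path, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

# Made-up inputs: an English sentence and a Chinese one with no space
# before the path.
samples = [
    'See Documentation/process/howto.rst for details',
    '参见Documentation/process/howto.rst',
]
for text in samples:
    print(first_match(OLD_RE_doc, text), first_match(NEW_RE_doc, text))
```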




