* [PATCH 00/22] qapi: static typing conversion, pt5a
@ 2021-04-22  3:06 John Snow
  2021-04-22  3:06 ` [PATCH 01/22] qapi/parser: Don't try to handle file errors John Snow
                   ` (21 more replies)
  0 siblings, 22 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

This is part five, and focuses on QAPISchemaParser in parser.py.
It does not touch QAPIDoc yet, which will be covered next.

gitlab: https://gitlab.com/jsnow/qemu/-/commits/python-qapi-cleanup-pt5a
merge-request (and CI): https://gitlab.com/jsnow/qemu/-/merge_requests/3

I encourage you to leave comments on the gitlab MR link!  Let's
try an experiment. (Of course, I will still respond to critique on the
list, as usual.)

The patches that belong to just part five start here:
https://gitlab.com/jsnow/qemu/-/merge_requests/3/diffs?commit_id=7cc329b57bbc5504cba7552be6c0502081aca5f0

At the top near "Viewing commit 7cc329b5", you can click "Next" to move
on to the next patch after you're done leaving comments on a single
commit.

Give it a whirl!

Requirements:
- Python 3.6+
- mypy >= 0.770
- pylint >= 2.6.0 (2.7.0+ when using Python 3.9+)

Every commit should pass with:
 - `isort -c qapi/`
 - `flake8 qapi/`
 - `pylint --rcfile=qapi/pylintrc qapi/`
 - `mypy --config-file=qapi/mypy.ini qapi/`

John Snow (22):
  qapi/parser: Don't try to handle file errors
  qapi/source: [RFC] add "with_column" contextmanager
  qapi/source: Remove line number from QAPISourceInfo initializer
  qapi/parser: factor parsing routine into method
  qapi/parser: Assert lexer value is a string
  qapi/parser: assert get_expr returns object in outer loop
  qapi/parser: assert object keys are strings
  qapi/parser: Use @staticmethod where appropriate
  qapi: add match_nofail helper
  qapi/parser: Fix typing of token membership tests
  qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard
  qapi/parser: add type hint annotations
  qapi/parser: [RFC] overload the return type of get_expr
  qapi/parser: Remove superfluous list constructor
  qapi/parser: allow 'ch' variable name
  qapi/parser: add docstrings
  CHECKPOINT
  qapi: [WIP] Rip QAPIDoc out of parser.py
  qapi: [WIP] Add type ignores for qapidoc.py
  qapi: [WIP] Import QAPIDoc from qapidoc Signed-off-by: John Snow
    <jsnow@redhat.com>
  qapi: [WIP] Add QAPIDocError
  qapi: [WIP] Enable linters on parser.py

 scripts/qapi/common.py  |   8 +-
 scripts/qapi/error.py   |   8 +-
 scripts/qapi/expr.py    |   2 +-
 scripts/qapi/main.py    |  14 +-
 scripts/qapi/mypy.ini   |   2 +-
 scripts/qapi/parser.py  | 566 +++++++++++++---------------------------
 scripts/qapi/pylintrc   |   3 +-
 scripts/qapi/qapidoc.py | 360 +++++++++++++++++++++++++
 scripts/qapi/source.py  |  32 ++-
 9 files changed, 581 insertions(+), 414 deletions(-)
 create mode 100644 scripts/qapi/qapidoc.py

-- 
2.30.2




* [PATCH 01/22] qapi/parser: Don't try to handle file errors
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
@ 2021-04-22  3:06 ` John Snow
  2021-04-23 15:46   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 02/22] qapi/source: [RFC] add "with_column" contextmanager John Snow
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

The short-ish version of what motivates this patch is:

- The parser initializer does not possess adequate context to write a
  good error message -- It tries to determine the caller's semantic
  context.
- We don't want to allow QAPISourceInfo(None, None, None) to exist.
- Errors made using such an object are currently incorrect.
- It's not technically a semantic error if we cannot open the schema.
- There are various typing constraints that make mixing these two cases
  undesirable for a single special case.
- The current open block in parser's initializer will leak file
  pointers, because it isn't using a with statement.


Here are the details on why this was written the way it was, and why a
few disparate issues are rolled into one commit. (They're hard to fix
separately without writing really weird stuff that'd be harder to
review.)

The error message string here is incorrect:

> python3 qapi-gen.py 'fake.json'
qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory

In pursuing it, we find that QAPISourceInfo has a special accommodation
for when there's no filename. Meanwhile, we intend to type info.fname as
str; something we always have.

To remove this, we need to not have a "fake" QAPISourceInfo object. We
also don't want to explicitly begin accommodating QAPISourceInfo being
None, because we actually want to eventually prove that this can never
happen -- We don't want to confuse "The file isn't open yet" with "This
error stems from a definition that wasn't defined in any file".

(An earlier series tried to create an official dummy object, but it was
tough to prove in review that it worked correctly without creating new
regressions. This patch avoids trying to re-litigate that discussion.

We would like to first prove that we never raise QAPISemError for any
built-in object before we relent and add "special" info objects. We
aren't ready to do that yet, so crashing is preferred.)

So, how to solve this mess?

Here's one way: Don't try to handle errors at a level with "mixed"
semantic levels; i.e. don't try to handle inclusion errors (should
report a source line where the include was triggered) with command line
errors (where we specified a file we couldn't read).

Simply remove the error handling from the initializer of the
parser. Pythonic! Now it's the caller's job to figure out what to do
about it. Handle the error in QAPISchemaParser._include() instead, where
we do have the correct semantic context to not need to play games with
the error message generation.

Next, to re-gain a nice error at the top level, add a new try/except
into qapi/main.generate(). Now the error looks sensible:

> python3 qapi-gen.py 'fake.json'
qapi-gen.py: can't read schema file 'fake.json': No such file or directory

Lastly, with this usage gone, we can remove the special type violation
from QAPISourceInfo, and all is well with the world.
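As a rough illustration of the approach described above (not the actual QEMU code): the low-level reader lets OSError propagate, and each caller translates it using the context only it has. `SchemaError`, `read_source`, `load_top_level` and `load_include` are hypothetical names invented for this sketch.

```python
class SchemaError(Exception):
    """Hypothetical stand-in for QAPIError / QAPISemError."""


def read_source(fname: str) -> str:
    # No try/except here: only the caller knows why the file is opened.
    with open(fname, 'r', encoding='utf-8') as fp:
        return fp.read()


def load_top_level(fname: str) -> str:
    # Command-line context: there is no source location to report.
    try:
        return read_source(fname)
    except OSError as err:
        raise SchemaError(
            f"can't read schema file '{fname}': {err.strerror}"
        ) from err


def load_include(fname: str, location: str) -> str:
    # Include context: report where the include directive was found.
    try:
        return read_source(fname)
    except OSError as err:
        raise SchemaError(
            f"{location}: can't read include file '{fname}': {err.strerror}"
        ) from err
```

Using `raise ... from err` keeps the original OSError reachable as `__cause__`, so no detail is lost when the message is rewritten.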

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/main.py   |  8 +++++++-
 scripts/qapi/parser.py | 18 +++++++++---------
 scripts/qapi/source.py |  3 ---
 3 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
index 703e7ed1ed5..70f8aa86f37 100644
--- a/scripts/qapi/main.py
+++ b/scripts/qapi/main.py
@@ -48,7 +48,13 @@ def generate(schema_file: str,
     """
     assert invalid_prefix_char(prefix) is None
 
-    schema = QAPISchema(schema_file)
+    try:
+        schema = QAPISchema(schema_file)
+    except OSError as err:
+        raise QAPIError(
+            f"can't read schema file '{schema_file}': {err.strerror}"
+        ) from err
+
     gen_types(schema, output_dir, prefix, builtins)
     gen_visit(schema, output_dir, prefix, builtins)
     gen_commands(schema, output_dir, prefix)
diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index ca5e8e18e00..b378fa33807 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
         previously_included = previously_included or set()
         previously_included.add(os.path.abspath(fname))
 
-        try:
-            fp = open(fname, 'r', encoding='utf-8')
+        # Allow the caller to catch this error.
+        with open(fname, 'r', encoding='utf-8') as fp:
             self.src = fp.read()
-        except IOError as e:
-            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
-                               "can't read %s file '%s': %s"
-                               % ("include" if incl_info else "schema",
-                                  fname,
-                                  e.strerror))
 
         if self.src == '' or self.src[-1] != '\n':
             self.src += '\n'
@@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
         if incl_abs_fname in previously_included:
             return None
 
-        return QAPISchemaParser(incl_fname, previously_included, info)
+        try:
+            return QAPISchemaParser(incl_fname, previously_included, info)
+        except OSError as err:
+            raise QAPISemError(
+                info,
+                f"can't read include file '{incl_fname}': {err.strerror}"
+            ) from err
 
     def _check_pragma_list_of_str(self, name, value, info):
         if (not isinstance(value, list)
diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
index 03b6ede0828..1ade864d7b9 100644
--- a/scripts/qapi/source.py
+++ b/scripts/qapi/source.py
@@ -10,7 +10,6 @@
 # See the COPYING file in the top-level directory.
 
 import copy
-import sys
 from typing import List, Optional, TypeVar
 
 
@@ -53,8 +52,6 @@ def next_line(self: T) -> T:
         return info
 
     def loc(self) -> str:
-        if self.fname is None:
-            return sys.argv[0]
         ret = self.fname
         if self.line is not None:
             ret += ':%d' % self.line
-- 
2.30.2




* [PATCH 02/22] qapi/source: [RFC] add "with_column" contextmanager
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
  2021-04-22  3:06 ` [PATCH 01/22] qapi/parser: Don't try to handle file errors John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-27  9:33   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer John Snow
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

This is a silly one, but... it's important to have fun.

This patch isn't *needed*, it's here as an RFC. In trying to experiment
with different ways to solve the problem addressed by the previous
commit, I kept getting confused by how the "source location" string with
line and column number was built across two different classes.

(i.e. QAPISourceError appends the column, but QAPISourceInfo does not
track column information natively.)

I was afraid to try and fully implement column number directly in
QAPISourceInfo on the chance that it might have undesirable effects, so
I came up with a quick "hack" to centralize the 'location' information
generation.

It's a little goofy, but it works :')
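A minimal, self-contained sketch of the idea, assuming a simplified stand-in class (`Location` is hypothetical and much smaller than QAPISourceInfo): the context manager sets the column only for the duration of the `with` block and restores the previous value afterwards, even if an exception is raised.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class Location:
    """Hypothetical, simplified stand-in for QAPISourceInfo."""

    def __init__(self, fname: str, line: int) -> None:
        self.fname = fname
        self.line = line
        self._column: Optional[int] = None

    @contextmanager
    def at_column(self, column: Optional[int]) -> Iterator[None]:
        saved = self._column
        try:
            self._column = column
            yield
        finally:
            # Runs on normal exit and on exceptions alike.
            self._column = saved

    def __str__(self) -> str:
        ret = f"{self.fname}:{self.line}"
        if self._column is not None:
            ret += f":{self._column}"
        return ret


loc = Location('schema.json', 12)
with loc.at_column(7):
    inside = str(loc)    # column is included here
outside = str(loc)       # column is gone again
```

The object is mutated temporarily rather than copied, so this sketch is not safe if the same object is formatted from several threads at once.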

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/error.py  |  8 +++-----
 scripts/qapi/source.py | 23 ++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/scripts/qapi/error.py b/scripts/qapi/error.py
index e35e4ddb26a..6b04f56f8a2 100644
--- a/scripts/qapi/error.py
+++ b/scripts/qapi/error.py
@@ -39,11 +39,9 @@ def __init__(self,
 
     def __str__(self) -> str:
         assert self.info is not None
-        loc = str(self.info)
-        if self.col is not None:
-            assert self.info.line is not None
-            loc += ':%s' % self.col
-        return loc + ': ' + self.msg
+        with self.info.at_column(self.col):
+            loc = str(self.info)
+        return f"{loc}: {self.msg}"
 
 
 class QAPISemError(QAPISourceError):
diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
index 1ade864d7b9..21090b9fe78 100644
--- a/scripts/qapi/source.py
+++ b/scripts/qapi/source.py
@@ -9,8 +9,14 @@
 # This work is licensed under the terms of the GNU GPL, version 2.
 # See the COPYING file in the top-level directory.
 
+from contextlib import contextmanager
 import copy
-from typing import List, Optional, TypeVar
+from typing import (
+    Iterator,
+    List,
+    Optional,
+    TypeVar,
+)
 
 
 class QAPISchemaPragma:
@@ -35,6 +41,7 @@ def __init__(self, fname: str, line: int,
                  parent: Optional['QAPISourceInfo']):
         self.fname = fname
         self.line = line
+        self._column: Optional[int] = None
         self.parent = parent
         self.pragma: QAPISchemaPragma = (
             parent.pragma if parent else QAPISchemaPragma()
@@ -52,9 +59,14 @@ def next_line(self: T) -> T:
         return info
 
     def loc(self) -> str:
+        # column cannot be provided meaningfully when line is absent.
+        assert self.line or self._column is None
+
         ret = self.fname
         if self.line is not None:
             ret += ':%d' % self.line
+        if self._column is not None:
+            ret += ':%d' % self._column
         return ret
 
     def in_defn(self) -> str:
@@ -71,5 +83,14 @@ def include_path(self) -> str:
             parent = parent.parent
         return ret
 
+    @contextmanager
+    def at_column(self, column: Optional[int]) -> Iterator[None]:
+        current_column = self._column
+        try:
+            self._column = column
+            yield
+        finally:
+            self._column = current_column
+
     def __str__(self) -> str:
         return self.include_path() + self.in_defn() + self.loc()
-- 
2.30.2




* [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
  2021-04-22  3:06 ` [PATCH 01/22] qapi/parser: Don't try to handle file errors John Snow
  2021-04-22  3:07 ` [PATCH 02/22] qapi/source: [RFC] add "with_column" contextmanager John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-24  6:38   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 04/22] qapi/parser: factor parsing routine into method John Snow
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

With the QAPISourceInfo(None, None, None) construct gone, there's not
really any reason to have to specify that a file starts on the first
line.

Remove it from the initializer and have it default to 1.

Remove the last vestiges where we check for 'line' being unset. That
won't happen again, now!

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py |  2 +-
 scripts/qapi/source.py | 12 +++---------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index b378fa33807..edd0af33ae0 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -47,7 +47,7 @@ def __init__(self, fname, previously_included=None, incl_info=None):
         if self.src == '' or self.src[-1] != '\n':
             self.src += '\n'
         self.cursor = 0
-        self.info = QAPISourceInfo(fname, 1, incl_info)
+        self.info = QAPISourceInfo(fname, incl_info)
         self.line_pos = 0
         self.exprs = []
         self.docs = []
diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
index 21090b9fe78..afa21518974 100644
--- a/scripts/qapi/source.py
+++ b/scripts/qapi/source.py
@@ -37,10 +37,9 @@ def __init__(self) -> None:
 class QAPISourceInfo:
     T = TypeVar('T', bound='QAPISourceInfo')
 
-    def __init__(self, fname: str, line: int,
-                 parent: Optional['QAPISourceInfo']):
+    def __init__(self, fname: str, parent: Optional['QAPISourceInfo'] = None):
         self.fname = fname
-        self.line = line
+        self.line = 1
         self._column: Optional[int] = None
         self.parent = parent
         self.pragma: QAPISchemaPragma = (
@@ -59,12 +58,7 @@ def next_line(self: T) -> T:
         return info
 
     def loc(self) -> str:
-        # column cannot be provided meaningfully when line is absent.
-        assert self.line or self._column is None
-
-        ret = self.fname
-        if self.line is not None:
-            ret += ':%d' % self.line
+        ret = f"{self.fname}:{self.line}"
         if self._column is not None:
             ret += ':%d' % self._column
         return ret
-- 
2.30.2




* [PATCH 04/22] qapi/parser: factor parsing routine into method
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (2 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 05/22] qapi/parser: Assert lexer value is a string John Snow
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

For the sake of keeping __init__ smaller (and treating it more like a
gallery of what state variables we can expect to see), put the actual
parsing action into a parse method. We can still invoke it from the init
method to reduce churn.

To accomplish this, 'previously_included' becomes the private data
member '_included', and the filename is stashed as _fname.

Add any missing declarations to the init method, and group them by
function so they can be understood quickly at a glance.
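The shape of this refactoring can be shown on a toy example. This is only a sketch under assumptions: `MiniParser` is a hypothetical whitespace tokenizer, not the QAPI parser, but it follows the same layout, with `__init__` declaring every attribute grouped by purpose and `_parse` doing the work.

```python
from typing import List, Optional


class MiniParser:
    """Hypothetical toy parser that collects whitespace-separated words."""

    def __init__(self, text: str) -> None:
        # Input:
        self._text = text

        # Lexer state:
        self.cursor = 0
        self.tok: Optional[str] = None

        # Parser output:
        self.words: List[str] = []

        # Every attribute exists before any work starts.
        self._parse()

    def _accept(self) -> None:
        # Advance to the next token; set tok to None at end of input.
        rest = self._text[self.cursor:]
        stripped = rest.lstrip()
        if not stripped:
            self.tok = None
            return
        start = self.cursor + (len(rest) - len(stripped))
        end = start
        while end < len(self._text) and not self._text[end].isspace():
            end += 1
        self.tok = self._text[start:end]
        self.cursor = end

    def _parse(self) -> None:
        # Prime the lexer, then consume tokens until done.
        self._accept()
        while self.tok is not None:
            self.words.append(self.tok)
            self._accept()
```

Calling `_parse()` from `__init__` keeps the public behaviour unchanged: constructing the object still yields a fully parsed result.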

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index edd0af33ae0..f519518075e 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -37,23 +37,38 @@ def __init__(self, parser, msg):
 class QAPISchemaParser:
 
     def __init__(self, fname, previously_included=None, incl_info=None):
-        previously_included = previously_included or set()
-        previously_included.add(os.path.abspath(fname))
+        self._fname = fname
+        self._included = previously_included or set()
+        self._included.add(os.path.abspath(self._fname))
+        self.src = ''
 
-        # Allow the caller to catch this error.
-        with open(fname, 'r', encoding='utf-8') as fp:
-            self.src = fp.read()
-
-        if self.src == '' or self.src[-1] != '\n':
-            self.src += '\n'
+        # Lexer state (see `accept` for details):
+        self.info = QAPISourceInfo(self._fname, incl_info)
+        self.tok = None
+        self.pos = 0
         self.cursor = 0
-        self.info = QAPISourceInfo(fname, incl_info)
+        self.val = None
         self.line_pos = 0
+
+        # Parser output:
         self.exprs = []
         self.docs = []
-        self.accept()
+
+        # Showtime!
+        self._parse()
+
+    def _parse(self):
         cur_doc = None
 
+        with open(self._fname, 'r', encoding='utf-8') as fp:
+            self.src = fp.read()
+        if self.src == '' or self.src[-1] != '\n':
+            self.src += '\n'
+
+        # Prime the lexer:
+        self.accept()
+
+        # Parse until done:
         while self.tok is not None:
             info = self.info
             if self.tok == '#':
@@ -71,12 +86,12 @@ def __init__(self, fname, previously_included=None, incl_info=None):
                 if not isinstance(include, str):
                     raise QAPISemError(info,
                                        "value of 'include' must be a string")
-                incl_fname = os.path.join(os.path.dirname(fname),
+                incl_fname = os.path.join(os.path.dirname(self._fname),
                                           include)
                 self.exprs.append({'expr': {'include': incl_fname},
                                    'info': info})
                 exprs_include = self._include(include, info, incl_fname,
-                                              previously_included)
+                                              self._included)
                 if exprs_include:
                     self.exprs.extend(exprs_include.exprs)
                     self.docs.extend(exprs_include.docs)
-- 
2.30.2




* [PATCH 05/22] qapi/parser: Assert lexer value is a string
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (3 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 04/22] qapi/parser: factor parsing routine into method John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-24  8:33   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop John Snow
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

The type checker can't narrow the type of the token value to string,
because it's only loosely correlated with the return token.

We know that a token of '#' should always have a "str" value.
Add an assertion.
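A small sketch of why the assertion helps, assuming a token value typed as a union (`TokenValue` and `is_doc_terminator` are hypothetical names for illustration): a type checker does not connect the value of `tok` to the type of `val`, so the `isinstance` assertion is what narrows `val` to `str`.

```python
from typing import Optional, Union

# Hypothetical type for lexer values: strings, booleans, or nothing.
TokenValue = Union[None, bool, str]


def is_doc_terminator(tok: Optional[str], val: TokenValue) -> bool:
    if tok != '#':
        return False
    # Without this assertion, a type checker sees val as
    # Union[None, bool, str] and rejects the .startswith() call below.
    assert isinstance(val, str), "Expected str value"
    return val.startswith('##')
```

At runtime the assertion also documents the invariant: if the lexer ever produced a '#' token with a non-string value, this would fail loudly instead of raising an AttributeError further down.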

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index f519518075e..c75434e75a5 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -303,6 +303,7 @@ def get_doc(self, info):
         cur_doc = QAPIDoc(self, info)
         self.accept(False)
         while self.tok == '#':
+            assert isinstance(self.val, str), "Expected str value"
             if self.val.startswith('##'):
                 # End of doc comment
                 if self.val != '##':
-- 
2.30.2




* [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (4 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 05/22] qapi/parser: Assert lexer value is a string John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25  7:23   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 07/22] qapi/parser: assert object keys are strings John Snow
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

get_expr can return many things, depending on where it is used. In the
outer parsing loop, we expect and require it to return a dict.

(It's (maybe) a bit involved to teach mypy that when nested is False,
this is already always True. I'll look into it later, maybe.)
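One possible way to express this to a type checker is `typing.overload` combined with `Literal`. The following is a sketch only, with assumptions stated up front: it needs Python 3.8+ for `typing.Literal` (older interpreters would need `typing_extensions`), the `Expr` alias is hypothetical, and the function body is a stub rather than the real parser logic.

```python
from typing import Dict, List, Literal, Union, overload

# Hypothetical alias for "anything get_expr may return".
Expr = Union[Dict[str, object], List[object], str, bool, None]


@overload
def get_expr(nested: Literal[False]) -> Dict[str, object]: ...
@overload
def get_expr(nested: Literal[True]) -> Expr: ...
@overload
def get_expr(nested: bool) -> Expr: ...


def get_expr(nested: bool) -> Expr:
    # Stub body for illustration: top-level expressions are always dicts.
    if not nested:
        return {'include': 'sub.json'}
    return 'nested-value'


top = get_expr(False)  # intended to be seen as Dict[str, object]
```

With overloads like these, a caller passing the literal `False` should no longer need an `isinstance` assertion to satisfy the type checker; the assertion in this patch remains the simpler option until such typing is in place.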

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index c75434e75a5..6b443b1247e 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -78,6 +78,8 @@ def _parse(self):
                 continue
 
             expr = self.get_expr(False)
+            assert isinstance(expr, dict)  # Guaranteed when nested=False
+
             if 'include' in expr:
                 self.reject_expr_doc(cur_doc)
                 if len(expr) != 1:
@@ -278,6 +280,7 @@ def get_values(self):
             self.accept()
 
     def get_expr(self, nested):
+        # TODO: Teach mypy that nested=False means the retval is a Dict.
         if self.tok != '{' and not nested:
             raise QAPIParseError(self, "expected '{'")
         if self.tok == '{':
-- 
2.30.2




* [PATCH 07/22] qapi/parser: assert object keys are strings
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (5 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25  7:27   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 08/22] qapi/parser: Use @staticmethod where appropriate John Snow
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

The single quote token implies the value is a string. Assert this to be
the case.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 6b443b1247e..8d1fe0ddda5 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -246,6 +246,8 @@ def get_members(self):
             raise QAPIParseError(self, "expected string or '}'")
         while True:
             key = self.val
+            assert isinstance(key, str)  # Guaranteed by tok == "'"
+
             self.accept()
             if self.tok != ':':
                 raise QAPIParseError(self, "expected ':'")
-- 
2.30.2




* [PATCH 08/22] qapi/parser: Use @staticmethod where appropriate
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (6 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 07/22] qapi/parser: assert object keys are strings John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 09/22] qapi: add match_nofail helper John Snow
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

No self, no thank you!

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 8d1fe0ddda5..f2425c0228a 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -127,7 +127,8 @@ def reject_expr_doc(doc):
                 "documentation for '%s' is not followed by the definition"
                 % doc.symbol)
 
-    def _include(self, include, info, incl_fname, previously_included):
+    @staticmethod
+    def _include(include, info, incl_fname, previously_included):
         incl_abs_fname = os.path.abspath(incl_fname)
         # catch inclusion cycle
         inf = info
@@ -148,7 +149,8 @@ def _include(self, include, info, incl_fname, previously_included):
                 f"can't read include file '{incl_fname}': {err.strerror}"
             ) from err
 
-    def _check_pragma_list_of_str(self, name, value, info):
+    @staticmethod
+    def _check_pragma_list_of_str(name, value, info):
         if (not isinstance(value, list)
                 or any([not isinstance(elt, str) for elt in value])):
             raise QAPISemError(
-- 
2.30.2




* [PATCH 09/22] qapi: add match_nofail helper
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (7 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 08/22] qapi/parser: Use @staticmethod where appropriate John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25  7:54   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 10/22] qapi/parser: Fix typing of token membership tests John Snow
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Mypy cannot generally understand that these regex functions cannot
possibly fail. Add a _nofail helper that clarifies this for mypy.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/common.py |  8 +++++++-
 scripts/qapi/main.py   |  6 ++----
 scripts/qapi/parser.py | 13 +++++++------
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index cbd3fd81d36..d38c1746767 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -12,7 +12,7 @@
 # See the COPYING file in the top-level directory.
 
 import re
-from typing import Optional, Sequence
+from typing import Match, Optional, Sequence
 
 
 #: Magic string that gets removed along with all space to its right.
@@ -210,3 +210,9 @@ def gen_endif(ifcond: Sequence[str]) -> str:
 #endif /* %(cond)s */
 ''', cond=ifc)
     return ret
+
+
+def match_nofail(pattern: str, string: str) -> Match[str]:
+    match = re.match(pattern, string)
+    assert match is not None
+    return match
diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
index 70f8aa86f37..e8d4ba4b389 100644
--- a/scripts/qapi/main.py
+++ b/scripts/qapi/main.py
@@ -8,11 +8,11 @@
 """
 
 import argparse
-import re
 import sys
 from typing import Optional
 
 from .commands import gen_commands
+from .common import match_nofail
 from .error import QAPIError
 from .events import gen_events
 from .introspect import gen_introspect
@@ -22,9 +22,7 @@
 
 
 def invalid_prefix_char(prefix: str) -> Optional[str]:
-    match = re.match(r'([A-Za-z_.-][A-Za-z0-9_.-]*)?', prefix)
-    # match cannot be None, but mypy cannot infer that.
-    assert match is not None
+    match = match_nofail(r'([A-Za-z_.-][A-Za-z0-9_.-]*)?', prefix)
     if match.end() != len(prefix):
         return prefix[match.end()]
     return None
diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index f2425c0228a..7f3c009f64b 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -18,6 +18,7 @@
 import os
 import re
 
+from .common import match_nofail
 from .error import QAPISemError, QAPISourceError
 from .source import QAPISourceInfo
 
@@ -235,8 +236,8 @@ def accept(self, skip_comment=True):
             elif not self.tok.isspace():
                 # Show up to next structural, whitespace or quote
                 # character
-                match = re.match('[^[\\]{}:,\\s\'"]+',
-                                 self.src[self.cursor-1:])
+                match = match_nofail('[^[\\]{}:,\\s\'"]+',
+                                     self.src[self.cursor-1:])
                 raise QAPIParseError(self, "stray '%s'" % match.group(0))
 
     def get_members(self):
@@ -369,7 +370,7 @@ def append(self, line):
             # Strip leading spaces corresponding to the expected indent level
             # Blank lines are always OK.
             if line:
-                indent = re.match(r'\s*', line).end()
+                indent = match_nofail(r'\s*', line).end()
                 if indent < self._indent:
                     raise QAPIParseError(
                         self._parser,
@@ -505,7 +506,7 @@ def _append_args_line(self, line):
             # from line and replace it with spaces so that 'f' has the
             # same index as it did in the original line and can be
             # handled the same way we will handle following lines.
-            indent = re.match(r'@\S*:\s*', line).end()
+            indent = match_nofail(r'@\S*:\s*', line).end()
             line = line[indent:]
             if not line:
                 # Line was just the "@arg:" header; following lines
@@ -540,7 +541,7 @@ def _append_features_line(self, line):
             # from line and replace it with spaces so that 'f' has the
             # same index as it did in the original line and can be
             # handled the same way we will handle following lines.
-            indent = re.match(r'@\S*:\s*', line).end()
+            indent = match_nofail(r'@\S*:\s*', line).end()
             line = line[indent:]
             if not line:
                 # Line was just the "@arg:" header; following lines
@@ -586,7 +587,7 @@ def _append_various_line(self, line):
             # from line and replace it with spaces so that 'f' has the
             # same index as it did in the original line and can be
             # handled the same way we will handle following lines.
-            indent = re.match(r'\S*:\s*', line).end()
+            indent = match_nofail(r'\S*:\s*', line).end()
             line = line[indent:]
             if not line:
                 # Line was just the "Section:" header; following lines
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 10/22] qapi/parser: Fix typing of token membership tests
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (8 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 09/22] qapi: add match_nofail helper John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25  7:59   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard John Snow
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

When the token can be None, we can't use 'x in "abc"' style membership
tests to group types of tokens together, because 'None in "abc"' is a
TypeError.

Easy enough to fix, if a little ugly.
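
As a standalone illustration of the problem and the guard (a minimal
sketch; is_value_start() is a hypothetical helper for this example, not
a function in parser.py):

```python
from typing import Optional


def is_value_start(tok: Optional[str]) -> bool:
    """Return True if tok can begin a string or boolean value."""
    # Test for None first; the membership test is only valid for str.
    return tok is not None and tok in "'tf"


# The unguarded membership test raises TypeError when the token is None.
tok = None
try:
    tok in "'tf"
    raised = False
except TypeError:
    raised = True

assert raised
assert is_value_start("t")
assert not is_value_start("{")
assert not is_value_start(None)
```

The explicit None test also lets mypy narrow Optional[str] to str before
the membership test is evaluated.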

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 7f3c009f64b..16fd36f8391 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -272,7 +272,7 @@ def get_values(self):
         if self.tok == ']':
             self.accept()
             return expr
-        if self.tok not in "{['tf":
+        if self.tok is None or self.tok not in "{['tf":
             raise QAPIParseError(
                 self, "expected '{', '[', ']', string, or boolean")
         while True:
@@ -294,7 +294,8 @@ def get_expr(self, nested):
         elif self.tok == '[':
             self.accept()
             expr = self.get_values()
-        elif self.tok in "'tf":
+        elif self.tok and self.tok in "'tf":
+            assert isinstance(self.val, (str, bool))
             expr = self.val
             self.accept()
         else:
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (9 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 10/22] qapi/parser: Fix typing of token membership tests John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25 12:32   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 12/22] qapi/parser: add type hint annotations John Snow
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

TypeGuards won't exist in Python proper until 3.10. Ah well. We can hack
up our own by declaring this function to return the type we claim it
checks for and using this to safely downcast object -> List[str].

In so doing, I bring this function in-line under _pragma so it can use
the 'info' object in its closure. Having done this, _pragma also now
no longer needs to take a 'self' parameter, so drop it.

Rename it to just _check(), to help us out with the line-length -- and
now that it's contained within _pragma, it is contextually easier to see
how it's used anyway -- especially with types.
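
Outside of the parser, the "validate, then return as the narrower type"
idiom looks roughly like this. It is a sketch only: check_list_of_str()
is a hypothetical name, and it raises a plain TypeError instead of
QAPISemError so that the example stays self-contained.

```python
from typing import List


def check_list_of_str(name: str, value: object) -> List[str]:
    """Validate that value is a list of strings; return it typed as such."""
    if (not isinstance(value, list)
            or any(not isinstance(elt, str) for elt in value)):
        raise TypeError("pragma %s must be a list of strings" % name)
    # isinstance() narrowed value to a list; the element check above is
    # what justifies declaring the element type as str.
    return value


# The caller assigns the return value, so it gets a List[str], not object.
exceptions: List[str] = check_list_of_str("member-name-exceptions",
                                          ["a", "b"])
assert exceptions == ["a", "b"]
```

The checker takes the declared return type on trust, so the runtime
check inside the function has to match the annotation.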

Signed-off-by: John Snow <jsnow@redhat.com>

---

I left (name, value) as args to avoid creating a fully magic "macro".
I thought this was too weird:

    info.pragma.foobar = _check()

and it looked more reasonable as:

    info.pragma.foobar = _check(name, value)

---
 scripts/qapi/parser.py | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 16fd36f8391..d02a134aae9 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -17,6 +17,7 @@
 from collections import OrderedDict
 import os
 import re
+from typing import List
 
 from .common import match_nofail
 from .error import QAPISemError, QAPISourceError
@@ -151,28 +152,27 @@ def _include(include, info, incl_fname, previously_included):
             ) from err
 
     @staticmethod
-    def _check_pragma_list_of_str(name, value, info):
-        if (not isinstance(value, list)
-                or any([not isinstance(elt, str) for elt in value])):
-            raise QAPISemError(
-                info,
-                "pragma %s must be a list of strings" % name)
+    def _pragma(name, value, info):
+
+        def _check(name, value) -> List[str]:
+            if (not isinstance(value, list) or
+                    any([not isinstance(elt, str) for elt in value])):
+                raise QAPISemError(
+                    info,
+                    "pragma %s must be a list of strings" % name)
+            return value
 
-    def _pragma(self, name, value, info):
         if name == 'doc-required':
             if not isinstance(value, bool):
                 raise QAPISemError(info,
                                    "pragma 'doc-required' must be boolean")
             info.pragma.doc_required = value
         elif name == 'command-name-exceptions':
-            self._check_pragma_list_of_str(name, value, info)
-            info.pragma.command_name_exceptions = value
+            info.pragma.command_name_exceptions = _check(name, value)
         elif name == 'command-returns-exceptions':
-            self._check_pragma_list_of_str(name, value, info)
-            info.pragma.command_returns_exceptions = value
+            info.pragma.command_returns_exceptions = _check(name, value)
         elif name == 'member-name-exceptions':
-            self._check_pragma_list_of_str(name, value, info)
-            info.pragma.member_name_exceptions = value
+            info.pragma.member_name_exceptions = _check(name, value)
         else:
             raise QAPISemError(info, "unknown pragma '%s'" % name)
 
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (10 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25 12:34   ` Markus Armbruster
  2021-05-06  1:27   ` John Snow
  2021-04-22  3:07 ` [PATCH 13/22] qapi/parser: [RFC] overload the return type of get_expr John Snow
                   ` (9 subsequent siblings)
  21 siblings, 2 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Annotations do not change runtime behavior.
This commit *only* adds annotations.

(Annotations for QAPIDoc are in a later commit.)
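
To make the "no runtime change" claim concrete, here is a small
standalone sketch. The accept() below is a stand-in written for this
example, not the parser method:

```python
def accept(skip_comment: bool = True) -> None:
    """Stand-in function; only its annotations matter here."""


# Annotations are recorded as metadata on the function object ...
assert accept.__annotations__ == {'skip_comment': bool, 'return': None}

# ... but the interpreter does not enforce them. This call contradicts
# the annotation and still runs; only a checker such as mypy objects.
result = accept("not a bool")  # type: ignore[arg-type]
assert result is None
```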

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 61 ++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index d02a134aae9..f2b57d5642a 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -17,16 +17,29 @@
 from collections import OrderedDict
 import os
 import re
-from typing import List
+from typing import (
+    Dict,
+    List,
+    Optional,
+    Set,
+    Union,
+)
 
 from .common import match_nofail
 from .error import QAPISemError, QAPISourceError
 from .source import QAPISourceInfo
 
 
+#: Represents a parsed JSON object; semantically: one QAPI schema expression.
+Expression = Dict[str, object]
+
+# Return value alias for get_expr().
+_ExprValue = Union[List[object], Dict[str, object], str, bool]
+
+
 class QAPIParseError(QAPISourceError):
     """Error class for all QAPI schema parsing errors."""
-    def __init__(self, parser, msg):
+    def __init__(self, parser: 'QAPISchemaParser', msg: str):
         col = 1
         for ch in parser.src[parser.line_pos:parser.pos]:
             if ch == '\t':
@@ -38,7 +51,10 @@ def __init__(self, parser, msg):
 
 class QAPISchemaParser:
 
-    def __init__(self, fname, previously_included=None, incl_info=None):
+    def __init__(self,
+                 fname: str,
+                 previously_included: Optional[Set[str]] = None,
+                 incl_info: Optional[QAPISourceInfo] = None):
         self._fname = fname
         self._included = previously_included or set()
         self._included.add(os.path.abspath(self._fname))
@@ -46,20 +62,20 @@ def __init__(self, fname, previously_included=None, incl_info=None):
 
         # Lexer state (see `accept` for details):
         self.info = QAPISourceInfo(self._fname, incl_info)
-        self.tok = None
+        self.tok: Optional[str] = None
         self.pos = 0
         self.cursor = 0
-        self.val = None
+        self.val: Optional[Union[bool, str]] = None
         self.line_pos = 0
 
         # Parser output:
-        self.exprs = []
-        self.docs = []
+        self.exprs: List[Expression] = []
+        self.docs: List[QAPIDoc] = []
 
         # Showtime!
         self._parse()
 
-    def _parse(self):
+    def _parse(self) -> None:
         cur_doc = None
 
         with open(self._fname, 'r', encoding='utf-8') as fp:
@@ -122,7 +138,7 @@ def _parse(self):
         self.reject_expr_doc(cur_doc)
 
     @staticmethod
-    def reject_expr_doc(doc):
+    def reject_expr_doc(doc: Optional['QAPIDoc']) -> None:
         if doc and doc.symbol:
             raise QAPISemError(
                 doc.info,
@@ -130,10 +146,14 @@ def reject_expr_doc(doc):
                 % doc.symbol)
 
     @staticmethod
-    def _include(include, info, incl_fname, previously_included):
+    def _include(include: str,
+                 info: QAPISourceInfo,
+                 incl_fname: str,
+                 previously_included: Set[str]
+                 ) -> Optional['QAPISchemaParser']:
         incl_abs_fname = os.path.abspath(incl_fname)
         # catch inclusion cycle
-        inf = info
+        inf: Optional[QAPISourceInfo] = info
         while inf:
             if incl_abs_fname == os.path.abspath(inf.fname):
                 raise QAPISemError(info, "inclusion loop for %s" % include)
@@ -152,9 +172,9 @@ def _include(include, info, incl_fname, previously_included):
             ) from err
 
     @staticmethod
-    def _pragma(name, value, info):
+    def _pragma(name: str, value: object, info: QAPISourceInfo) -> None:
 
-        def _check(name, value) -> List[str]:
+        def _check(name: str, value: object) -> List[str]:
             if (not isinstance(value, list) or
                     any([not isinstance(elt, str) for elt in value])):
                 raise QAPISemError(
@@ -176,7 +196,7 @@ def _check(name, value) -> List[str]:
         else:
             raise QAPISemError(info, "unknown pragma '%s'" % name)
 
-    def accept(self, skip_comment=True):
+    def accept(self, skip_comment: bool = True) -> None:
         while True:
             self.tok = self.src[self.cursor]
             self.pos = self.cursor
@@ -240,8 +260,8 @@ def accept(self, skip_comment=True):
                                      self.src[self.cursor-1:])
                 raise QAPIParseError(self, "stray '%s'" % match.group(0))
 
-    def get_members(self):
-        expr = OrderedDict()
+    def get_members(self) -> 'OrderedDict[str, object]':
+        expr: 'OrderedDict[str, object]' = OrderedDict()
         if self.tok == '}':
             self.accept()
             return expr
@@ -267,8 +287,8 @@ def get_members(self):
             if self.tok != "'":
                 raise QAPIParseError(self, "expected string")
 
-    def get_values(self):
-        expr = []
+    def get_values(self) -> List[object]:
+        expr: List[object] = []
         if self.tok == ']':
             self.accept()
             return expr
@@ -284,8 +304,9 @@ def get_values(self):
                 raise QAPIParseError(self, "expected ',' or ']'")
             self.accept()
 
-    def get_expr(self, nested):
+    def get_expr(self, nested: bool = False) -> _ExprValue:
         # TODO: Teach mypy that nested=False means the retval is a Dict.
+        expr: _ExprValue
         if self.tok != '{' and not nested:
             raise QAPIParseError(self, "expected '{'")
         if self.tok == '{':
@@ -303,7 +324,7 @@ def get_expr(self, nested):
                 self, "expected '{', '[', string, or boolean")
         return expr
 
-    def get_doc(self, info):
+    def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
         if self.val != '##':
             raise QAPIParseError(
                 self, "junk after '##' at start of documentation comment")
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 13/22] qapi/parser: [RFC] overload the return type of get_expr
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (11 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 12/22] qapi/parser: add type hint annotations John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 14/22] qapi/parser: Remove superfluous list constructor John Snow
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Teach mypy that there are two possible return types here: either an
Expression, or ... something else.

Not a SLOC reduction, but it does remove an assertion. It also isn't
much safer than a cast: mypy has no insight into whether the overloads
are true or not. It's on the honor system.

I thought I'd demonstrate its use, though.
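
Reduced to a free function, the pattern looks like this. It is a toy
sketch: the body below is made up for the example, and only the shape
of the overloads mirrors the patch.

```python
from typing import Dict, List, Union, overload

Expression = Dict[str, object]
ExprValue = Union[List[object], Dict[str, object], str, bool]


@overload
def get_expr() -> Expression: ...


@overload
def get_expr(nested: bool) -> ExprValue: ...


def get_expr(nested: bool = False) -> ExprValue:
    # Toy body. Nothing forces it to return a dict when called with no
    # argument; the overloads above merely promise that it will.
    if nested:
        return ['a', True]
    return {'command': 'foo'}


expr = get_expr()
# No isinstance() assertion is needed for the checker to accept this.
assert 'command' in expr
```

Note that an explicit get_expr(False) selects the second overload and
is therefore still typed as the wider union.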

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index f2b57d5642a..cbdddc344e7 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -23,6 +23,7 @@
     Optional,
     Set,
     Union,
+    overload,
 )
 
 from .common import match_nofail
@@ -95,8 +96,7 @@ def _parse(self) -> None:
                     self.docs.append(cur_doc)
                 continue
 
-            expr = self.get_expr(False)
-            assert isinstance(expr, dict)  # Guaranteed when nested=False
+            expr = self.get_expr()
 
             if 'include' in expr:
                 self.reject_expr_doc(cur_doc)
@@ -304,8 +304,15 @@ def get_values(self) -> List[object]:
                 raise QAPIParseError(self, "expected ',' or ']'")
             self.accept()
 
+    @overload
+    # No nesting, must be an Expression.
+    def get_expr(self) -> Expression: ...
+
+    @overload
+    # Possibly nested, might be anything.
+    def get_expr(self, nested: bool) -> _ExprValue: ...
+
     def get_expr(self, nested: bool = False) -> _ExprValue:
-        # TODO: Teach mypy that nested=False means the retval is a Dict.
         expr: _ExprValue
         if self.tok != '{' and not nested:
             raise QAPIParseError(self, "expected '{'")
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 14/22] qapi/parser: Remove superfluous list constructor
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (12 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 13/22] qapi/parser: [RFC] overload the return type of get_expr John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 15/22] qapi/parser: allow 'ch' variable name John Snow
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

A generator suffices (and quiets a pylint warning).
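
A small sketch of the difference, using a hypothetical is_not_str()
helper that records what it was asked to inspect:

```python
values = ['a', 1, 'b']
inspected = []


def is_not_str(elt: object) -> bool:
    """Record each element we are asked about, then test it."""
    inspected.append(elt)
    return not isinstance(elt, str)


# List comprehension: the whole list is built before any() sees it.
assert any([is_not_str(elt) for elt in values])
assert inspected == ['a', 1, 'b']

# Generator expression: any() stops at the first true result.
inspected.clear()
assert any(is_not_str(elt) for elt in values)
assert inspected == ['a', 1]
```

The result is the same either way; the generator just avoids building
the intermediate list and can stop early.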

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index cbdddc344e7..dbbd0fcbc2f 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -176,7 +176,7 @@ def _pragma(name: str, value: object, info: QAPISourceInfo) -> None:
 
         def _check(name: str, value: object) -> List[str]:
             if (not isinstance(value, list) or
-                    any([not isinstance(elt, str) for elt in value])):
+                    any(not isinstance(elt, str) for elt in value)):
                 raise QAPISemError(
                     info,
                     "pragma %s must be a list of strings" % name)
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 15/22] qapi/parser: allow 'ch' variable name
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (13 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 14/22] qapi/parser: Remove superfluous list constructor John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 16/22] qapi/parser: add docstrings John Snow
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

We can have a two-letter variable name, as a treat.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/pylintrc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qapi/pylintrc b/scripts/qapi/pylintrc
index 88efbf71cb2..c5275d5f59b 100644
--- a/scripts/qapi/pylintrc
+++ b/scripts/qapi/pylintrc
@@ -43,6 +43,7 @@ good-names=i,
            _,
            fp,  # fp = open(...)
            fd,  # fd = os.open(...)
+           ch,
 
 [VARIABLES]
 
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 16/22] qapi/parser: add docstrings
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (14 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 15/22] qapi/parser: allow 'ch' variable name John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-25 13:27   ` Markus Armbruster
  2021-04-22  3:07 ` [PATCH 17/22] CHECKPOINT John Snow
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Signed-off-by: John Snow <jsnow@redhat.com>

---

My hubris is infinite.

OK, I only added a few -- to help me remember how the parser works at a glance.
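
For a feel of the (tok, val) convention that the accept() docstring
describes, here is a toy lexer. It is a sketch under simplifying
assumptions and not the code in parser.py: lex() is a hypothetical
generator, it ignores string escapes and line tracking, and it always
skips comments.

```python
from typing import Iterator, Optional, Tuple, Union

Token = Tuple[Optional[str], Optional[Union[bool, str]]]


def lex(src: str) -> Iterator[Token]:
    """Yield (tok, val) pairs, ending with (None, None) at end of input."""
    cursor = 0
    while cursor < len(src):
        ch = src[cursor]
        cursor += 1
        if ch in '{}:,[]':
            yield ch, None                  # single-character token
        elif ch == "'":
            end = src.index("'", cursor)    # no escape handling here
            yield "'", src[cursor:end]      # STRING: val excludes quotes
            cursor = end + 1
        elif src.startswith('true', cursor - 1):
            yield 't', True
            cursor += 3
        elif src.startswith('false', cursor - 1):
            yield 'f', False
            cursor += 4
        elif ch == '#':
            # COMMENT: dropped, as if skip_comment were always True.
            newline = src.find('\n', cursor)
            cursor = len(src) if newline == -1 else newline
        elif not ch.isspace():
            raise ValueError("stray %r" % ch)
    yield None, None                        # end of input


tokens = list(lex("{ 'x': [true, false] } # done"))
assert tokens[1] == ("'", 'x')
assert tokens[4] == ('t', True)
assert tokens[-1] == (None, None)
```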

---
 scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index dbbd0fcbc2f..8fc77808ace 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
 
 
 class QAPISchemaParser:
+    """
+    Performs parsing of a QAPI schema source file.
 
+    :param fname: Path to the source file.
+    :param previously_included:
+        The absolute paths of previously included source files.
+        Only used by recursive calls to avoid re-parsing files.
+    :param incl_info:
+       `QAPISourceInfo` for the parent document.
+       This may be None if this is the root schema document.
+
+    :ivar exprs: Resulting parsed expressions.
+    :ivar docs: Resulting parsed documentation blocks.
+
+    :raise OSError: For problems opening the root schema document.
+    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
+    :raise QAPISemError: For various semantic issues with the schema.
+    """
     def __init__(self,
                  fname: str,
                  previously_included: Optional[Set[str]] = None,
@@ -77,6 +94,11 @@ def __init__(self,
         self._parse()
 
     def _parse(self) -> None:
+        """
+        Parse the QAPI schema document.
+
+        :return: None; results are stored in ``exprs`` and ``docs``.
+        """
         cur_doc = None
 
         with open(self._fname, 'r', encoding='utf-8') as fp:
@@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
             raise QAPISemError(info, "unknown pragma '%s'" % name)
 
     def accept(self, skip_comment: bool = True) -> None:
+        """
+        Read the next lexeme and process it into a token.
+
+        :Object state:
+          :tok: represents the token type. See below for values.
+          :pos: is the position of the first character in the lexeme.
+          :cursor: is the position of the next character.
+          :val: is the variable value of the token, if any.
+
+        Single-character tokens:
+
+        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
+        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
+        lexeme.  ``val`` is ``None``.
+
+        Multi-character tokens:
+
+        - ``COMMENT``:
+
+          - This token is not normally yielded by the lexer, but it
+            can be when ``skip_comment`` is False.
+          - ``tok`` is the value ``"#"``.
+          - ``val`` is a string including all chars until end-of-line.
+
+        - ``STRING``:
+
+          - ``tok`` is ``"'"``, the single quote.
+          - ``val`` is the string, *excluding* the quotes.
+
+        - ``TRUE`` and ``FALSE``:
+
+          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
+          - ``val`` is either ``True`` or ``False`` accordingly.
+
+        - ``NEWLINE`` and ``SPACE``:
+
+          - These are consumed by the lexer directly. ``line_pos`` and
+            ``info`` are advanced when ``NEWLINE`` is encountered.
+            ``tok`` is set to ``None`` upon reaching EOF.
+
+        :param skip_comment:
+            When false, return ``COMMENT`` tokens.
+            This is used when reading documentation blocks.
+        """
         while True:
             self.tok = self.src[self.cursor]
             self.pos = self.cursor
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 17/22] CHECKPOINT
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (15 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 16/22] qapi/parser: add docstrings John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 18/22] qapi: [WIP] Rip QAPIDoc out of parser.py John Snow
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

As of here, parser is actually fully typed, and QAPIDoc is not. Below,
there are a few extra patches that "prove" this, but they are not
necessarily meant for inclusion.

They could theoretically be included anyway, but a few of them would
need to be squashed together to uphold our "no intermediate breakages"
rule, and a few things would need to be re-ordered.

Consider them [RFC], but optional and completely safe to drop.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/mypy.ini | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qapi/mypy.ini b/scripts/qapi/mypy.ini
index 54ca4483d6d..d7bbb2dc9c7 100644
--- a/scripts/qapi/mypy.ini
+++ b/scripts/qapi/mypy.ini
@@ -4,6 +4,7 @@ disallow_untyped_calls = False
 python_version = 3.6
 
 [mypy-qapi.parser]
+# QAPISchemaParser is done, I promise!
 disallow_untyped_defs = False
 disallow_incomplete_defs = False
 check_untyped_defs = False
-- 
2.30.2



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 18/22] qapi: [WIP] Rip QAPIDoc out of parser.py
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (16 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 17/22] CHECKPOINT John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 19/22] qapi: [WIP] Add type ignores for qapidoc.py John Snow
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

This (rather unglamorously) rips QAPIDoc out of parser.py. It does not
leave a working solution in its place, opting instead just for code
movement.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py  | 342 -------------------------------------
 scripts/qapi/qapidoc.py | 362 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 362 insertions(+), 342 deletions(-)
 create mode 100644 scripts/qapi/qapidoc.py

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 8fc77808ace..6fed742124d 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -16,7 +16,6 @@
 
 from collections import OrderedDict
 import os
-import re
 from typing import (
     Dict,
     List,
@@ -430,344 +429,3 @@ def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
             self.accept(False)
 
         raise QAPIParseError(self, "documentation comment must end with '##'")
-
-
-class QAPIDoc:
-    """
-    A documentation comment block, either definition or free-form
-
-    Definition documentation blocks consist of
-
-    * a body section: one line naming the definition, followed by an
-      overview (any number of lines)
-
-    * argument sections: a description of each argument (for commands
-      and events) or member (for structs, unions and alternates)
-
-    * features sections: a description of each feature flag
-
-    * additional (non-argument) sections, possibly tagged
-
-    Free-form documentation blocks consist only of a body section.
-    """
-
-    class Section:
-        def __init__(self, parser, name=None, indent=0):
-            # parser, for error messages about indentation
-            self._parser = parser
-            # optional section name (argument/member or section name)
-            self.name = name
-            self.text = ''
-            # the expected indent level of the text of this section
-            self._indent = indent
-
-        def append(self, line):
-            # Strip leading spaces corresponding to the expected indent level
-            # Blank lines are always OK.
-            if line:
-                indent = match_nofail(r'\s*', line).end()
-                if indent < self._indent:
-                    raise QAPIParseError(
-                        self._parser,
-                        "unexpected de-indent (expected at least %d spaces)" %
-                        self._indent)
-                line = line[self._indent:]
-
-            self.text += line.rstrip() + '\n'
-
-    class ArgSection(Section):
-        def __init__(self, parser, name, indent=0):
-            super().__init__(parser, name, indent)
-            self.member = None
-
-        def connect(self, member):
-            self.member = member
-
-    def __init__(self, parser, info):
-        # self._parser is used to report errors with QAPIParseError.  The
-        # resulting error position depends on the state of the parser.
-        # It happens to be the beginning of the comment.  More or less
-        # servicable, but action at a distance.
-        self._parser = parser
-        self.info = info
-        self.symbol = None
-        self.body = QAPIDoc.Section(parser)
-        # dict mapping parameter name to ArgSection
-        self.args = OrderedDict()
-        self.features = OrderedDict()
-        # a list of Section
-        self.sections = []
-        # the current section
-        self._section = self.body
-        self._append_line = self._append_body_line
-
-    def has_section(self, name):
-        """Return True if we have a section with this name."""
-        for i in self.sections:
-            if i.name == name:
-                return True
-        return False
-
-    def append(self, line):
-        """
-        Parse a comment line and add it to the documentation.
-
-        The way that the line is dealt with depends on which part of
-        the documentation we're parsing right now:
-        * The body section: ._append_line is ._append_body_line
-        * An argument section: ._append_line is ._append_args_line
-        * A features section: ._append_line is ._append_features_line
-        * An additional section: ._append_line is ._append_various_line
-        """
-        line = line[1:]
-        if not line:
-            self._append_freeform(line)
-            return
-
-        if line[0] != ' ':
-            raise QAPIParseError(self._parser, "missing space after #")
-        line = line[1:]
-        self._append_line(line)
-
-    def end_comment(self):
-        self._end_section()
-
-    @staticmethod
-    def _is_section_tag(name):
-        return name in ('Returns:', 'Since:',
-                        # those are often singular or plural
-                        'Note:', 'Notes:',
-                        'Example:', 'Examples:',
-                        'TODO:')
-
-    def _append_body_line(self, line):
-        """
-        Process a line of documentation text in the body section.
-
-        If this a symbol line and it is the section's first line, this
-        is a definition documentation block for that symbol.
-
-        If it's a definition documentation block, another symbol line
-        begins the argument section for the argument named by it, and
-        a section tag begins an additional section.  Start that
-        section and append the line to it.
-
-        Else, append the line to the current section.
-        """
-        name = line.split(' ', 1)[0]
-        # FIXME not nice: things like '#  @foo:' and '# @foo: ' aren't
-        # recognized, and get silently treated as ordinary text
-        if not self.symbol and not self.body.text and line.startswith('@'):
-            if not line.endswith(':'):
-                raise QAPIParseError(self._parser, "line should end with ':'")
-            self.symbol = line[1:-1]
-            # FIXME invalid names other than the empty string aren't flagged
-            if not self.symbol:
-                raise QAPIParseError(self._parser, "invalid name")
-        elif self.symbol:
-            # This is a definition documentation block
-            if name.startswith('@') and name.endswith(':'):
-                self._append_line = self._append_args_line
-                self._append_args_line(line)
-            elif line == 'Features:':
-                self._append_line = self._append_features_line
-            elif self._is_section_tag(name):
-                self._append_line = self._append_various_line
-                self._append_various_line(line)
-            else:
-                self._append_freeform(line)
-        else:
-            # This is a free-form documentation block
-            self._append_freeform(line)
-
-    def _append_args_line(self, line):
-        """
-        Process a line of documentation text in an argument section.
-
-        A symbol line begins the next argument section, a section tag
-        section or a non-indented line after a blank line begins an
-        additional section.  Start that section and append the line to
-        it.
-
-        Else, append the line to the current section.
-
-        """
-        name = line.split(' ', 1)[0]
-
-        if name.startswith('@') and name.endswith(':'):
-            # If line is "@arg:   first line of description", find
-            # the index of 'f', which is the indent we expect for any
-            # following lines.  We then remove the leading "@arg:"
-            # from line and replace it with spaces so that 'f' has the
-            # same index as it did in the original line and can be
-            # handled the same way we will handle following lines.
-            indent = match_nofail(r'@\S*:\s*', line).end()
-            line = line[indent:]
-            if not line:
-                # Line was just the "@arg:" header; following lines
-                # are not indented
-                indent = 0
-            else:
-                line = ' ' * indent + line
-            self._start_args_section(name[1:-1], indent)
-        elif self._is_section_tag(name):
-            self._append_line = self._append_various_line
-            self._append_various_line(line)
-            return
-        elif (self._section.text.endswith('\n\n')
-              and line and not line[0].isspace()):
-            if line == 'Features:':
-                self._append_line = self._append_features_line
-            else:
-                self._start_section()
-                self._append_line = self._append_various_line
-                self._append_various_line(line)
-            return
-
-        self._append_freeform(line)
-
-    def _append_features_line(self, line):
-        name = line.split(' ', 1)[0]
-
-        if name.startswith('@') and name.endswith(':'):
-            # If line is "@arg:   first line of description", find
-            # the index of 'f', which is the indent we expect for any
-            # following lines.  We then remove the leading "@arg:"
-            # from line and replace it with spaces so that 'f' has the
-            # same index as it did in the original line and can be
-            # handled the same way we will handle following lines.
-            indent = match_nofail(r'@\S*:\s*', line).end()
-            line = line[indent:]
-            if not line:
-                # Line was just the "@arg:" header; following lines
-                # are not indented
-                indent = 0
-            else:
-                line = ' ' * indent + line
-            self._start_features_section(name[1:-1], indent)
-        elif self._is_section_tag(name):
-            self._append_line = self._append_various_line
-            self._append_various_line(line)
-            return
-        elif (self._section.text.endswith('\n\n')
-              and line and not line[0].isspace()):
-            self._start_section()
-            self._append_line = self._append_various_line
-            self._append_various_line(line)
-            return
-
-        self._append_freeform(line)
-
-    def _append_various_line(self, line):
-        """
-        Process a line of documentation text in an additional section.
-
-        A symbol line is an error.
-
-        A section tag begins an additional section.  Start that
-        section and append the line to it.
-
-        Else, append the line to the current section.
-        """
-        name = line.split(' ', 1)[0]
-
-        if name.startswith('@') and name.endswith(':'):
-            raise QAPIParseError(self._parser,
-                                 "'%s' can't follow '%s' section"
-                                 % (name, self.sections[0].name))
-        if self._is_section_tag(name):
-            # If line is "Section:   first line of description", find
-            # the index of 'f', which is the indent we expect for any
-            # following lines.  We then remove the leading "Section:"
-            # from line and replace it with spaces so that 'f' has the
-            # same index as it did in the original line and can be
-            # handled the same way we will handle following lines.
-            indent = match_nofail(r'\S*:\s*', line).end()
-            line = line[indent:]
-            if not line:
-                # Line was just the "Section:" header; following lines
-                # are not indented
-                indent = 0
-            else:
-                line = ' ' * indent + line
-            self._start_section(name[:-1], indent)
-
-        self._append_freeform(line)
-
-    def _start_symbol_section(self, symbols_dict, name, indent):
-        # FIXME invalid names other than the empty string aren't flagged
-        if not name:
-            raise QAPIParseError(self._parser, "invalid parameter name")
-        if name in symbols_dict:
-            raise QAPIParseError(self._parser,
-                                 "'%s' parameter name duplicated" % name)
-        assert not self.sections
-        self._end_section()
-        self._section = QAPIDoc.ArgSection(self._parser, name, indent)
-        symbols_dict[name] = self._section
-
-    def _start_args_section(self, name, indent):
-        self._start_symbol_section(self.args, name, indent)
-
-    def _start_features_section(self, name, indent):
-        self._start_symbol_section(self.features, name, indent)
-
-    def _start_section(self, name=None, indent=0):
-        if name in ('Returns', 'Since') and self.has_section(name):
-            raise QAPIParseError(self._parser,
-                                 "duplicated '%s' section" % name)
-        self._end_section()
-        self._section = QAPIDoc.Section(self._parser, name, indent)
-        self.sections.append(self._section)
-
-    def _end_section(self):
-        if self._section:
-            text = self._section.text = self._section.text.strip()
-            if self._section.name and (not text or text.isspace()):
-                raise QAPIParseError(
-                    self._parser,
-                    "empty doc section '%s'" % self._section.name)
-            self._section = None
-
-    def _append_freeform(self, line):
-        match = re.match(r'(@\S+:)', line)
-        if match:
-            raise QAPIParseError(self._parser,
-                                 "'%s' not allowed in free-form documentation"
-                                 % match.group(1))
-        self._section.append(line)
-
-    def connect_member(self, member):
-        if member.name not in self.args:
-            # Undocumented TODO outlaw
-            self.args[member.name] = QAPIDoc.ArgSection(self._parser,
-                                                        member.name)
-        self.args[member.name].connect(member)
-
-    def connect_feature(self, feature):
-        if feature.name not in self.features:
-            raise QAPISemError(feature.info,
-                               "feature '%s' lacks documentation"
-                               % feature.name)
-        self.features[feature.name].connect(feature)
-
-    def check_expr(self, expr):
-        if self.has_section('Returns') and 'command' not in expr:
-            raise QAPISemError(self.info,
-                               "'Returns:' is only valid for commands")
-
-    def check(self):
-
-        def check_args_section(args, info, what):
-            bogus = [name for name, section in args.items()
-                     if not section.member]
-            if bogus:
-                raise QAPISemError(
-                    self.info,
-                    "documented member%s '%s' %s not exist"
-                    % ("s" if len(bogus) > 1 else "",
-                       "', '".join(bogus),
-                       "do" if len(bogus) > 1 else "does"))
-
-        check_args_section(self.args, self.info, 'members')
-        check_args_section(self.features, self.info, 'features')
diff --git a/scripts/qapi/qapidoc.py b/scripts/qapi/qapidoc.py
new file mode 100644
index 00000000000..eb24ea12a06
--- /dev/null
+++ b/scripts/qapi/qapidoc.py
@@ -0,0 +1,362 @@
+# -*- coding: utf-8 -*-
+#
+# QAPI schema (doc) parser
+#
+# Copyright IBM, Corp. 2011
+# Copyright (c) 2013-2019 Red Hat Inc.
+#
+# Authors:
+#  Anthony Liguori <aliguori@us.ibm.com>
+#  Markus Armbruster <armbru@redhat.com>
+#  Marc-André Lureau <marcandre.lureau@redhat.com>
+#  Kevin Wolf <kwolf@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+from collections import OrderedDict
+import re
+
+from .common import match_nofail
+from .error import QAPISemError
+
+
+class QAPIDoc:
+    """
+    A documentation comment block, either definition or free-form
+
+    Definition documentation blocks consist of
+
+    * a body section: one line naming the definition, followed by an
+      overview (any number of lines)
+
+    * argument sections: a description of each argument (for commands
+      and events) or member (for structs, unions and alternates)
+
+    * features sections: a description of each feature flag
+
+    * additional (non-argument) sections, possibly tagged
+
+    Free-form documentation blocks consist only of a body section.
+    """
+
+    class Section:
+        def __init__(self, parser, name=None, indent=0):
+            # parser, for error messages about indentation
+            self._parser = parser
+            # optional section name (argument/member or section name)
+            self.name = name
+            self.text = ''
+            # the expected indent level of the text of this section
+            self._indent = indent
+
+        def append(self, line):
+            # Strip leading spaces corresponding to the expected indent level
+            # Blank lines are always OK.
+            if line:
+                indent = match_nofail(r'\s*', line).end()
+                if indent < self._indent:
+                    raise QAPIParseError(
+                        self._parser,
+                        "unexpected de-indent (expected at least %d spaces)" %
+                        self._indent)
+                line = line[self._indent:]
+
+            self.text += line.rstrip() + '\n'
+
+    class ArgSection(Section):
+        def __init__(self, parser, name, indent=0):
+            super().__init__(parser, name, indent)
+            self.member = None
+
+        def connect(self, member):
+            self.member = member
+
+    def __init__(self, parser, info):
+        # self._parser is used to report errors with QAPIParseError.  The
+        # resulting error position depends on the state of the parser.
+        # It happens to be the beginning of the comment.  More or less
+        # serviceable, but action at a distance.
+        self._parser = parser
+        self.info = info
+        self.symbol = None
+        self.body = QAPIDoc.Section(parser)
+        # dict mapping parameter name to ArgSection
+        self.args = OrderedDict()
+        self.features = OrderedDict()
+        # a list of Section
+        self.sections = []
+        # the current section
+        self._section = self.body
+        self._append_line = self._append_body_line
+
+    def has_section(self, name):
+        """Return True if we have a section with this name."""
+        for i in self.sections:
+            if i.name == name:
+                return True
+        return False
+
+    def append(self, line):
+        """
+        Parse a comment line and add it to the documentation.
+
+        The way that the line is dealt with depends on which part of
+        the documentation we're parsing right now:
+        * The body section: ._append_line is ._append_body_line
+        * An argument section: ._append_line is ._append_args_line
+        * A features section: ._append_line is ._append_features_line
+        * An additional section: ._append_line is ._append_various_line
+        """
+        line = line[1:]
+        if not line:
+            self._append_freeform(line)
+            return
+
+        if line[0] != ' ':
+            raise QAPIParseError(self._parser, "missing space after #")
+        line = line[1:]
+        self._append_line(line)
+
+    def end_comment(self):
+        self._end_section()
+
+    @staticmethod
+    def _is_section_tag(name):
+        return name in ('Returns:', 'Since:',
+                        # those are often singular or plural
+                        'Note:', 'Notes:',
+                        'Example:', 'Examples:',
+                        'TODO:')
+
+    def _append_body_line(self, line):
+        """
+        Process a line of documentation text in the body section.
+
+        If this is a symbol line and it is the section's first line, this
+        is a definition documentation block for that symbol.
+
+        If it's a definition documentation block, another symbol line
+        begins the argument section for the argument named by it, and
+        a section tag begins an additional section.  Start that
+        section and append the line to it.
+
+        Else, append the line to the current section.
+        """
+        name = line.split(' ', 1)[0]
+        # FIXME not nice: things like '#  @foo:' and '# @foo: ' aren't
+        # recognized, and get silently treated as ordinary text
+        if not self.symbol and not self.body.text and line.startswith('@'):
+            if not line.endswith(':'):
+                raise QAPIParseError(self._parser, "line should end with ':'")
+            self.symbol = line[1:-1]
+            # FIXME invalid names other than the empty string aren't flagged
+            if not self.symbol:
+                raise QAPIParseError(self._parser, "invalid name")
+        elif self.symbol:
+            # This is a definition documentation block
+            if name.startswith('@') and name.endswith(':'):
+                self._append_line = self._append_args_line
+                self._append_args_line(line)
+            elif line == 'Features:':
+                self._append_line = self._append_features_line
+            elif self._is_section_tag(name):
+                self._append_line = self._append_various_line
+                self._append_various_line(line)
+            else:
+                self._append_freeform(line)
+        else:
+            # This is a free-form documentation block
+            self._append_freeform(line)
+
+    def _append_args_line(self, line):
+        """
+        Process a line of documentation text in an argument section.
+
+        A symbol line begins the next argument section, a section tag
+        section or a non-indented line after a blank line begins an
+        additional section.  Start that section and append the line to
+        it.
+
+        Else, append the line to the current section.
+
+        """
+        name = line.split(' ', 1)[0]
+
+        if name.startswith('@') and name.endswith(':'):
+            # If line is "@arg:   first line of description", find
+            # the index of 'f', which is the indent we expect for any
+            # following lines.  We then remove the leading "@arg:"
+            # from line and replace it with spaces so that 'f' has the
+            # same index as it did in the original line and can be
+            # handled the same way we will handle following lines.
+            indent = match_nofail(r'@\S*:\s*', line).end()
+            line = line[indent:]
+            if not line:
+                # Line was just the "@arg:" header; following lines
+                # are not indented
+                indent = 0
+            else:
+                line = ' ' * indent + line
+            self._start_args_section(name[1:-1], indent)
+        elif self._is_section_tag(name):
+            self._append_line = self._append_various_line
+            self._append_various_line(line)
+            return
+        elif (self._section.text.endswith('\n\n')
+              and line and not line[0].isspace()):
+            if line == 'Features:':
+                self._append_line = self._append_features_line
+            else:
+                self._start_section()
+                self._append_line = self._append_various_line
+                self._append_various_line(line)
+            return
+
+        self._append_freeform(line)
+
+    def _append_features_line(self, line):
+        name = line.split(' ', 1)[0]
+
+        if name.startswith('@') and name.endswith(':'):
+            # If line is "@arg:   first line of description", find
+            # the index of 'f', which is the indent we expect for any
+            # following lines.  We then remove the leading "@arg:"
+            # from line and replace it with spaces so that 'f' has the
+            # same index as it did in the original line and can be
+            # handled the same way we will handle following lines.
+            indent = match_nofail(r'@\S*:\s*', line).end()
+            line = line[indent:]
+            if not line:
+                # Line was just the "@arg:" header; following lines
+                # are not indented
+                indent = 0
+            else:
+                line = ' ' * indent + line
+            self._start_features_section(name[1:-1], indent)
+        elif self._is_section_tag(name):
+            self._append_line = self._append_various_line
+            self._append_various_line(line)
+            return
+        elif (self._section.text.endswith('\n\n')
+              and line and not line[0].isspace()):
+            self._start_section()
+            self._append_line = self._append_various_line
+            self._append_various_line(line)
+            return
+
+        self._append_freeform(line)
+
+    def _append_various_line(self, line):
+        """
+        Process a line of documentation text in an additional section.
+
+        A symbol line is an error.
+
+        A section tag begins an additional section.  Start that
+        section and append the line to it.
+
+        Else, append the line to the current section.
+        """
+        name = line.split(' ', 1)[0]
+
+        if name.startswith('@') and name.endswith(':'):
+            raise QAPIParseError(self._parser,
+                                 "'%s' can't follow '%s' section"
+                                 % (name, self.sections[0].name))
+        if self._is_section_tag(name):
+            # If line is "Section:   first line of description", find
+            # the index of 'f', which is the indent we expect for any
+            # following lines.  We then remove the leading "Section:"
+            # from line and replace it with spaces so that 'f' has the
+            # same index as it did in the original line and can be
+            # handled the same way we will handle following lines.
+            indent = match_nofail(r'\S*:\s*', line).end()
+            line = line[indent:]
+            if not line:
+                # Line was just the "Section:" header; following lines
+                # are not indented
+                indent = 0
+            else:
+                line = ' ' * indent + line
+            self._start_section(name[:-1], indent)
+
+        self._append_freeform(line)
+
+    def _start_symbol_section(self, symbols_dict, name, indent):
+        # FIXME invalid names other than the empty string aren't flagged
+        if not name:
+            raise QAPIParseError(self._parser, "invalid parameter name")
+        if name in symbols_dict:
+            raise QAPIParseError(self._parser,
+                                 "'%s' parameter name duplicated" % name)
+        assert not self.sections
+        self._end_section()
+        self._section = QAPIDoc.ArgSection(self._parser, name, indent)
+        symbols_dict[name] = self._section
+
+    def _start_args_section(self, name, indent):
+        self._start_symbol_section(self.args, name, indent)
+
+    def _start_features_section(self, name, indent):
+        self._start_symbol_section(self.features, name, indent)
+
+    def _start_section(self, name=None, indent=0):
+        if name in ('Returns', 'Since') and self.has_section(name):
+            raise QAPIParseError(self._parser,
+                                 "duplicated '%s' section" % name)
+        self._end_section()
+        self._section = QAPIDoc.Section(self._parser, name, indent)
+        self.sections.append(self._section)
+
+    def _end_section(self):
+        if self._section:
+            text = self._section.text = self._section.text.strip()
+            if self._section.name and (not text or text.isspace()):
+                raise QAPIParseError(
+                    self._parser,
+                    "empty doc section '%s'" % self._section.name)
+            self._section = None
+
+    def _append_freeform(self, line):
+        match = re.match(r'(@\S+:)', line)
+        if match:
+            raise QAPIParseError(self._parser,
+                                 "'%s' not allowed in free-form documentation"
+                                 % match.group(1))
+        self._section.append(line)
+
+    def connect_member(self, member):
+        if member.name not in self.args:
+            # Undocumented TODO outlaw
+            self.args[member.name] = QAPIDoc.ArgSection(self._parser,
+                                                        member.name)
+        self.args[member.name].connect(member)
+
+    def connect_feature(self, feature):
+        if feature.name not in self.features:
+            raise QAPISemError(feature.info,
+                               "feature '%s' lacks documentation"
+                               % feature.name)
+        self.features[feature.name].connect(feature)
+
+    def check_expr(self, expr):
+        if self.has_section('Returns') and 'command' not in expr:
+            raise QAPISemError(self.info,
+                               "'Returns:' is only valid for commands")
+
+    def check(self):
+
+        def check_args_section(args, info, what):
+            bogus = [name for name, section in args.items()
+                     if not section.member]
+            if bogus:
+                raise QAPISemError(
+                    self.info,
+                    "documented member%s '%s' %s not exist"
+                    % ("s" if len(bogus) > 1 else "",
+                       "', '".join(bogus),
+                       "do" if len(bogus) > 1 else "does"))
+
+        check_args_section(self.args, self.info, 'members')
+        check_args_section(self.features, self.info, 'features')
-- 
2.30.2
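
The indent bookkeeping that _append_args_line and Section.append share is
easier to follow in isolation. Below is a minimal standalone sketch, not
part of the patch: split_arg_header and strip_indent are hypothetical
helper names, ValueError stands in for QAPIParseError, and a plain
re.match plus assert stands in for match_nofail.

```python
import re


def split_arg_header(line):
    """Split an '@name: text' line into (name, indent, normalized_line).

    Hypothetical helper mirroring the '@arg:' branch of _append_args_line.
    """
    head = line.split(' ', 1)[0]
    if not (head.startswith('@') and head.endswith(':')):
        raise ValueError("not an '@name:' line")
    match = re.match(r'@\S*:\s*', line)
    assert match is not None
    # Column of the first character of the description text
    indent = match.end()
    rest = line[indent:]
    if not rest:
        # Header only: following lines are not indented
        return head[1:-1], 0, ''
    # Replace the header with spaces so the text keeps its column
    return head[1:-1], indent, ' ' * indent + rest


def strip_indent(line, expected):
    """Strip `expected` leading columns; blank lines are always OK."""
    if line:
        actual = re.match(r'\s*', line).end()
        if actual < expected:
            raise ValueError(
                "unexpected de-indent (expected at least %d spaces)"
                % expected)
        line = line[expected:]
    return line.rstrip()


name, indent, first = split_arg_header('@arg:   first line')
text = [strip_indent(first, indent),
        strip_indent(' ' * indent + 'second line', indent)]
```

Because the header is replaced with spaces, the first line of a
description goes through the same de-indent check as every continuation
line, which is why the section code needs no special case for it.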




* [PATCH 19/22] qapi: [WIP] Add type ignores for qapidoc.py
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (17 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 18/22] qapi: [WIP] Rip QAPIDoc out of parser.py John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 20/22] qapi: [WIP] Import QAPIDoc from qapidoc John Snow
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/mypy.ini | 5 +++++
 scripts/qapi/pylintrc | 1 +
 2 files changed, 6 insertions(+)

diff --git a/scripts/qapi/mypy.ini b/scripts/qapi/mypy.ini
index d7bbb2dc9c7..1a72be2c788 100644
--- a/scripts/qapi/mypy.ini
+++ b/scripts/qapi/mypy.ini
@@ -9,6 +9,11 @@ disallow_untyped_defs = False
 disallow_incomplete_defs = False
 check_untyped_defs = False
 
+[mypy-qapi.qapidoc]
+disallow_untyped_defs = False
+disallow_incomplete_defs = False
+check_untyped_defs = False
+
 [mypy-qapi.schema]
 disallow_untyped_defs = False
 disallow_incomplete_defs = False
diff --git a/scripts/qapi/pylintrc b/scripts/qapi/pylintrc
index c5275d5f59b..ec7605edade 100644
--- a/scripts/qapi/pylintrc
+++ b/scripts/qapi/pylintrc
@@ -3,6 +3,7 @@
 # Add files or directories matching the regex patterns to the ignore list.
 # The regex matches against base names, not paths.
 ignore-patterns=parser.py,
+                qapidoc.py,
                 schema.py,
 
 
-- 
2.30.2




* [PATCH 20/22] qapi: [WIP] Import QAPIDoc from qapidoc
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (18 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 19/22] qapi: [WIP] Add type ignores for qapidoc.py John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 21/22] qapi: [WIP] Add QAPIDocError John Snow
  2021-04-22  3:07 ` [PATCH 22/22] qapi: [WIP] Enable linters on parser.py John Snow
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/expr.py   | 2 +-
 scripts/qapi/parser.py | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 496f7e0333e..7616646e43d 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -44,7 +44,7 @@
 
 from .common import c_name
 from .error import QAPISemError
-from .parser import QAPIDoc
+from .qapidoc import QAPIDoc
 from .source import QAPISourceInfo
 
 
diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 6fed742124d..3932f05d015 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -27,6 +27,7 @@
 
 from .common import match_nofail
 from .error import QAPISemError, QAPISourceError
+from .qapidoc import QAPIDoc
 from .source import QAPISourceInfo
 
 
-- 
2.30.2




* [PATCH 21/22] qapi: [WIP] Add QAPIDocError
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (19 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 20/22] qapi: [WIP] Import QAPIDoc from qapidoc John Snow
@ 2021-04-22  3:07 ` John Snow
  2021-04-22  3:07 ` [PATCH 22/22] qapi: [WIP] Enable linters on parser.py John Snow
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

Raise this error instead of QAPIParseError and delegate the context up
to the parent parser.

In a chat off-list, we discussed how this design forces us to keep
reporting less accurate error context: the position is still derived
from the parent parser's state when the error is re-raised.

Still, it's useful for an extremely simple split without a lot of fuss.

Signed-off-by: John Snow <jsnow@redhat.com>
---
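
The delegation described above is plain exception translation. A minimal
standalone sketch follows, with assumptions labeled: ToyParser and its
fname/line attributes are hypothetical, and the QAPIError and
QAPIParseError classes here are simplified stand-ins for the real ones
(the real QAPIParseError derives its position from the parser
differently).

```python
class QAPIError(Exception):
    """Stand-in for the base error class in error.py."""


class QAPIDocError(QAPIError):
    """Raised by the doc parser; carries no source position."""


class QAPIParseError(QAPIError):
    """Stand-in: formats a position taken from the parser's state."""
    def __init__(self, parser, msg):
        super().__init__("%s:%d: %s" % (parser.fname, parser.line, msg))


class ToyParser:
    """Hypothetical parent parser that owns the position information."""
    def __init__(self, fname):
        self.fname = fname
        self.line = 0

    def _get_doc(self, lines):
        for number, text in enumerate(lines, start=1):
            self.line = number
            if len(text) > 1 and text[1] != ' ':
                # The doc layer knows what went wrong, but not where
                raise QAPIDocError("missing space after #")
        return lines

    def get_doc(self, lines):
        # Tie the position-less error to the current parser state
        try:
            return self._get_doc(lines)
        except QAPIDocError as err:
            raise QAPIParseError(self, str(err)) from err


try:
    ToyParser('schema.json').get_doc(['# fine', '#broken'])
except QAPIParseError as err:
    message = str(err)
    cause = err.__cause__
```

Using "raise ... from err" keeps the original QAPIDocError reachable as
__cause__, so the translation does not hide where the error was first
raised in the Python traceback.
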
 scripts/qapi/parser.py  | 12 ++++++++++--
 scripts/qapi/qapidoc.py | 36 +++++++++++++++++-------------------
 2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 3932f05d015..5832bd54eb1 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -27,7 +27,7 @@
 
 from .common import match_nofail
 from .error import QAPISemError, QAPISourceError
-from .qapidoc import QAPIDoc
+from .qapidoc import QAPIDoc, QAPIDocError
 from .source import QAPISourceInfo
 
 
@@ -397,7 +397,7 @@ def get_expr(self, nested: bool = False) -> _ExprValue:
                 self, "expected '{', '[', string, or boolean")
         return expr
 
-    def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
+    def _get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
         if self.val != '##':
             raise QAPIParseError(
                 self, "junk after '##' at start of documentation comment")
@@ -430,3 +430,11 @@ def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
             self.accept(False)
 
         raise QAPIParseError(self, "documentation comment must end with '##'")
+
+    def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
+        # Tie QAPIDocError exceptions to the current parser state,
+        # re-raise as QAPIParseError.
+        try:
+            return self._get_doc(info)
+        except QAPIDocError as err:
+            raise QAPIParseError(self, str(err)) from err
diff --git a/scripts/qapi/qapidoc.py b/scripts/qapi/qapidoc.py
index eb24ea12a06..393af93323f 100644
--- a/scripts/qapi/qapidoc.py
+++ b/scripts/qapi/qapidoc.py
@@ -18,7 +18,11 @@
 import re
 
 from .common import match_nofail
-from .error import QAPISemError
+from .error import QAPIError, QAPISemError
+
+
+class QAPIDocError(QAPIError):
+    """QAPIDoc parsing errors."""
 
 
 class QAPIDoc:
@@ -56,8 +60,7 @@ def append(self, line):
             if line:
                 indent = match_nofail(r'\s*', line).end()
                 if indent < self._indent:
-                    raise QAPIParseError(
-                        self._parser,
+                    raise QAPIDocError(
                         "unexpected de-indent (expected at least %d spaces)" %
                         self._indent)
                 line = line[self._indent:]
@@ -114,7 +117,7 @@ def append(self, line):
             return
 
         if line[0] != ' ':
-            raise QAPIParseError(self._parser, "missing space after #")
+            raise QAPIDocError("missing space after #")
         line = line[1:]
         self._append_line(line)
 
@@ -148,11 +151,11 @@ def _append_body_line(self, line):
         # recognized, and get silently treated as ordinary text
         if not self.symbol and not self.body.text and line.startswith('@'):
             if not line.endswith(':'):
-                raise QAPIParseError(self._parser, "line should end with ':'")
+                raise QAPIDocError("line should end with ':'")
             self.symbol = line[1:-1]
             # FIXME invalid names other than the empty string aren't flagged
             if not self.symbol:
-                raise QAPIParseError(self._parser, "invalid name")
+                raise QAPIDocError("invalid name")
         elif self.symbol:
             # This is a definition documentation block
             if name.startswith('@') and name.endswith(':'):
@@ -261,9 +264,8 @@ def _append_various_line(self, line):
         name = line.split(' ', 1)[0]
 
         if name.startswith('@') and name.endswith(':'):
-            raise QAPIParseError(self._parser,
-                                 "'%s' can't follow '%s' section"
-                                 % (name, self.sections[0].name))
+            raise QAPIDocError("'%s' can't follow '%s' section"
+                               % (name, self.sections[0].name))
         if self._is_section_tag(name):
             # If line is "Section:   first line of description", find
             # the index of 'f', which is the indent we expect for any
@@ -286,10 +288,9 @@ def _append_various_line(self, line):
     def _start_symbol_section(self, symbols_dict, name, indent):
         # FIXME invalid names other than the empty string aren't flagged
         if not name:
-            raise QAPIParseError(self._parser, "invalid parameter name")
+            raise QAPIDocError("invalid parameter name")
         if name in symbols_dict:
-            raise QAPIParseError(self._parser,
-                                 "'%s' parameter name duplicated" % name)
+            raise QAPIDocError("'%s' parameter name duplicated" % name)
         assert not self.sections
         self._end_section()
         self._section = QAPIDoc.ArgSection(self._parser, name, indent)
@@ -303,8 +304,7 @@ def _start_features_section(self, name, indent):
 
     def _start_section(self, name=None, indent=0):
         if name in ('Returns', 'Since') and self.has_section(name):
-            raise QAPIParseError(self._parser,
-                                 "duplicated '%s' section" % name)
+            raise QAPIDocError("duplicated '%s' section" % name)
         self._end_section()
         self._section = QAPIDoc.Section(self._parser, name, indent)
         self.sections.append(self._section)
@@ -313,17 +313,15 @@ def _end_section(self):
         if self._section:
             text = self._section.text = self._section.text.strip()
             if self._section.name and (not text or text.isspace()):
-                raise QAPIParseError(
-                    self._parser,
+                raise QAPIDocError(
                     "empty doc section '%s'" % self._section.name)
             self._section = None
 
     def _append_freeform(self, line):
         match = re.match(r'(@\S+:)', line)
         if match:
-            raise QAPIParseError(self._parser,
-                                 "'%s' not allowed in free-form documentation"
-                                 % match.group(1))
+            raise QAPIDocError("'%s' not allowed in free-form documentation"
+                               % match.group(1))
         self._section.append(line)
 
     def connect_member(self, member):
-- 
2.30.2




* [PATCH 22/22] qapi: [WIP] Enable linters on parser.py
  2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
                   ` (20 preceding siblings ...)
  2021-04-22  3:07 ` [PATCH 21/22] qapi: [WIP] Add QAPIDocError John Snow
@ 2021-04-22  3:07 ` John Snow
  21 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-22  3:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roth, Cleber Rosa, John Snow, Markus Armbruster, Eduardo Habkost

(Only works after we excise QAPIDoc, of course.)
Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/mypy.ini | 6 ------
 scripts/qapi/pylintrc | 3 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/scripts/qapi/mypy.ini b/scripts/qapi/mypy.ini
index 1a72be2c788..56c0f306c5e 100644
--- a/scripts/qapi/mypy.ini
+++ b/scripts/qapi/mypy.ini
@@ -3,12 +3,6 @@ strict = True
 disallow_untyped_calls = False
 python_version = 3.6
 
-[mypy-qapi.parser]
-# QAPISchemaParser is done, I promise!
-disallow_untyped_defs = False
-disallow_incomplete_defs = False
-check_untyped_defs = False
-
 [mypy-qapi.qapidoc]
 disallow_untyped_defs = False
 disallow_incomplete_defs = False
diff --git a/scripts/qapi/pylintrc b/scripts/qapi/pylintrc
index ec7605edade..b41a26c1ceb 100644
--- a/scripts/qapi/pylintrc
+++ b/scripts/qapi/pylintrc
@@ -2,8 +2,7 @@
 
 # Add files or directories matching the regex patterns to the ignore list.
 # The regex matches against base names, not paths.
-ignore-patterns=parser.py,
-                qapidoc.py,
+ignore-patterns=qapidoc.py,
                 schema.py,
 
 
-- 
2.30.2




* Re: [PATCH 01/22] qapi/parser: Don't try to handle file errors
  2021-04-22  3:06 ` [PATCH 01/22] qapi/parser: Don't try to handle file errors John Snow
@ 2021-04-23 15:46   ` Markus Armbruster
  2021-04-23 19:20     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-23 15:46 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> The short-ish version of what motivates this patch is:
>
> - The parser initializer does not possess adequate context to write a
>   good error message -- It tries to determine the caller's semantic
>   context.

I'm not sure I get what you're trying to say here.

> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
> - Errors made using such an object are currently incorrect.
> - It's not technically a semantic error if we cannot open the schema
> - There are various typing constraints that make mixing these two cases
>   undesirable for a single special case.

These I understand.

> - The current open block in parser's initializer will leak file
>   pointers, because it isn't using a with statement.

Uh, isn't the value returned by open() reference-counted?  @fp is the
only reference...

> Here's the details in why this got written the way it did, and why a few
> disparate issues are rolled into one commit. (They're hard to fix
> separately without writing really weird stuff that'd be harder to
> review.)
>
> The error message string here is incorrect:
>
>> python3 qapi-gen.py 'fake.json'
> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory

Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
generation" (v5.2.0).

Before commit c615550df3 "qapi: Improve source file read error handling"
(v4.2.0), it was differently bad (uncaught exception).

Commit c615550df3 explains why the funny QAPISourceInfo exists:

    Reporting open or read failure for the main schema file needs a
    QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
    with fname=None.

The commit turned QAPISourceInfo into the equivalent of a disjoint union
of

1. A position in a source file (.fname is a str)

2. "Not in any source file" (.fname is None)

This is somewhat similar to struct Location in C, which has

1. LOC_FILE: a position in a source file

2. LOC_CMDLINE: a range of command line arguments

3. LOC_NONE: no location information

Abstracting locations this way lets error_report() do the right thing
whether it's complaining about the command line, a monitor command, or a
configuration file read with -readconfig.

Your patch demonstrates that qapi-gen has much less need for abstracting
sources: we use 2. "Not in any source file" only for reading the main
schema file.

> In pursuing it, we find that QAPISourceInfo has a special accommodation
> for when there's no filename.

Yes:

    def loc(self) -> str:
-->     if self.fname is None:
-->         return sys.argv[0]
        ret = self.fname
        if self.line is not None:
            ret += ':%d' % self.line
        return ret

>                               Meanwhile, we intend to type info.fname as
> str; something we always have.

Do you mean "as non-optional str"?

> To remove this, we need to not have a "fake" QAPISourceInfo object. We

We may well want to, but I doubt we *need* to.  There are almost
certainly other ways to fix the bug.  I don't see a need to explore
them, though.

> also don't want to explicitly begin accommodating QAPISourceInfo being
> None, because we actually want to eventually prove that this can never
> happen -- We don't want to confuse "The file isn't open yet" with "This
> error stems from a definition that wasn't defined in any file".

Yes, encoding both "poisoned source info not to be used with actual
errors" and "'fake' source info not pointing to a source file" as None
would be a mistake.

> (An earlier series tried to create an official dummy object, but it was
> tough to prove in review that it worked correctly without creating new
> regressions. This patch avoids trying to re-litigate that discussion.
>
> We would like to first prove that we never raise QAPISemError for any
> built-in object before we relent and add "special" info objects. We
> aren't ready to do that yet, so crashing is preferred.)
>
> So, how to solve this mess?
>
> Here's one way: Don't try to handle errors at a level with "mixed"
> semantic levels; i.e. don't try to handle inclusion errors (should
> report a source line where the include was triggered) with command line
> errors (where we specified a file we couldn't read).
>
> Simply remove the error handling from the initializer of the
> parser. Pythonic! Now it's the caller's job to figure out what to do
> about it. Handle the error in QAPISchemaParser._include() instead, where
> we do have the correct semantic context to not need to play games with
> the error message generation.
>
> Next, to re-gain a nice error at the top level, add a new try/except
> into qapi/main.generate(). Now the error looks sensible:

Missing "again" after "sensible" ;-P

>
>> python3 qapi-gen.py 'fake.json'
> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>
> Lastly, with this usage gone, we can remove the special type violation
> from QAPISourceInfo, and all is well with the world.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/main.py   |  8 +++++++-
>  scripts/qapi/parser.py | 18 +++++++++---------
>  scripts/qapi/source.py |  3 ---
>  3 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
> index 703e7ed1ed5..70f8aa86f37 100644
> --- a/scripts/qapi/main.py
> +++ b/scripts/qapi/main.py
> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>      """
>      assert invalid_prefix_char(prefix) is None
>  
> -    schema = QAPISchema(schema_file)
> +    try:
> +        schema = QAPISchema(schema_file)
> +    except OSError as err:
> +        raise QAPIError(
> +            f"can't read schema file '{schema_file}': {err.strerror}"
> +        ) from err
> +
>      gen_types(schema, output_dir, prefix, builtins)
>      gen_visit(schema, output_dir, prefix, builtins)
>      gen_commands(schema, output_dir, prefix)
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index ca5e8e18e00..b378fa33807 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>          previously_included = previously_included or set()
>          previously_included.add(os.path.abspath(fname))
>  
> -        try:
> -            fp = open(fname, 'r', encoding='utf-8')
> +        # Allow the caller to catch this error.

"this error"?  I understand what you mean now, but I'm not sure I will
in three months, when I won't have the context I have now.

> +        with open(fname, 'r', encoding='utf-8') as fp:
>              self.src = fp.read()
> -        except IOError as e:
> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
> -                               "can't read %s file '%s': %s"
> -                               % ("include" if incl_info else "schema",
> -                                  fname,
> -                                  e.strerror))
>  
>          if self.src == '' or self.src[-1] != '\n':
>              self.src += '\n'
> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>          if incl_abs_fname in previously_included:
>              return None
>  
> -        return QAPISchemaParser(incl_fname, previously_included, info)
> +        try:
> +            return QAPISchemaParser(incl_fname, previously_included, info)
> +        except OSError as err:
> +            raise QAPISemError(
> +                info,
> +                f"can't read include file '{incl_fname}': {err.strerror}"
> +            ) from err
>  
>      def _check_pragma_list_of_str(self, name, value, info):
>          if (not isinstance(value, list)

Before the patch, only IOError from open() and .read() get converted to
QAPISemError, and therefore caught by main().

The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.

> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
> index 03b6ede0828..1ade864d7b9 100644
> --- a/scripts/qapi/source.py
> +++ b/scripts/qapi/source.py
> @@ -10,7 +10,6 @@
>  # See the COPYING file in the top-level directory.
>  
>  import copy
> -import sys
>  from typing import List, Optional, TypeVar
>  
>  
> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>          return info
>  
>      def loc(self) -> str:
> -        if self.fname is None:
> -            return sys.argv[0]
>          ret = self.fname
>          if self.line is not None:
>              ret += ':%d' % self.line

tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:

    $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
    tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory

After:

    Traceback (most recent call last):
      File "tests/qapi-schema/test-qapi.py", line 207, in <module>
        main(sys.argv)
      File "tests/qapi-schema/test-qapi.py", line 201, in main
        status |= test_and_diff(test_name, dir_name, args.update)
      File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
        test_frontend(os.path.join(dir_name, test_name + '.json'))
      File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
        schema = QAPISchema(fname)
      File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
        parser = QAPISchemaParser(fname)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
        with open(fname, 'r', encoding='utf-8') as fp:
    FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'




* Re: [PATCH 01/22] qapi/parser: Don't try to handle file errors
  2021-04-23 15:46   ` Markus Armbruster
@ 2021-04-23 19:20     ` John Snow
  2021-04-27 13:47       ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-23 19:20 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/23/21 11:46 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> The short-ish version of what motivates this patch is:
>>
>> - The parser initializer does not possess adequate context to write a
>>    good error message -- It tries to determine the caller's semantic
>>    context.
> 
> I'm not sure I get what you're trying to say here.
> 

I mean: this __init__ method does not *know* who is calling it or why. 
Of course, *we* do, because the code base is finite and nobody else but 
us is calling into it.

I mean to point out that the initializer has to do extra work (Just a 
little) to determine what the calling context is and raise an error 
accordingly.

Example: If we have a parent info context, we raise an error in the 
context of the caller. If we don't, we have to create a new presumed 
context (using the weird None SourceInfo object).

So I just mean to say:

"Let the caller, who unambiguously always has the exactly correct 
context worry about what the error message ought to be."
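
To make that concrete, here is a minimal sketch of the pattern, not the
actual patch.  SchemaReadError, load_schema() and load_main_schema() are
made-up names standing in for QAPIError, the parser initializer and
qapi/main.generate(); the point is only that the low-level reader lets
OSError propagate and the caller, which knows why the file was being
opened, words the message.

```python
class SchemaReadError(Exception):
    """Hypothetical stand-in for QAPIError."""


def load_schema(fname: str) -> str:
    # May raise OSError; this level has no idea who asked for the
    # file or why, so it does not try to word an error message.
    with open(fname, 'r', encoding='utf-8') as fp:
        return fp.read()


def load_main_schema(fname: str) -> str:
    # The caller knows this is the main schema file named on the
    # command line, so it can produce an accurate message.
    try:
        return load_schema(fname)
    except OSError as err:
        raise SchemaReadError(
            f"can't read schema file '{fname}': {err.strerror}"
        ) from err
```

An include handler would wrap the same load_schema() call, but raise an
error that carries the source location of the include directive instead.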

>> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
>> - Errors made using such an object are currently incorrect.
>> - It's not technically a semantic error if we cannot open the schema
>> - There are various typing constraints that make mixing these two cases
>>    undesirable for a single special case.
> 
> These I understand.
> 
>> - The current open block in parser's initializer will leak file
>>    pointers, because it isn't using a with statement.
> 
> Uh, isn't the value returned by open() reference-counted?  @fp is the
> only reference...
> 

Yeah, eventually. O:-)

Whenever the GC runs. OK, it's not really an apocalypse error, but it 
felt strange to rewrite a try/except and then write it using bad hygiene 
on purpose in the name of a more isolated commit.
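
For illustration only, a small sketch of the two styles.  The helper
names and the temporary file are invented for this example.  In CPython
the file object in the first helper is closed once its last reference
goes away; the with statement makes the close explicit and independent
of how the interpreter manages object lifetimes.

```python
import os
import tempfile


def read_without_with(fname: str) -> str:
    # The file object is closed only when it is finalized.
    fp = open(fname, 'r', encoding='utf-8')
    return fp.read()


def read_with_with(fname: str) -> str:
    # The file is closed as soon as the block exits, even on error.
    with open(fname, 'r', encoding='utf-8') as fp:
        src = fp.read()
    assert fp.closed
    return src


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fake.json')
    with open(path, 'w', encoding='utf-8') as out:
        out.write("{ 'command': 'dummy' }\n")
    assert read_without_with(path) == read_with_with(path)
```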

>> Here's the details in why this got written the way it did, and why a few
>> disparate issues are rolled into one commit. (They're hard to fix
>> separately without writing really weird stuff that'd be harder to
>> review.)
>>
>> The error message string here is incorrect:
>>
>>> python3 qapi-gen.py 'fake.json'
>> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory
> 
> Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
> generation" (v5.2.0).
> 

Mea Culpa. Didn't realize it wasn't tested, and I didn't realize at the 
time that the two kinds of errors here were treated differently.

> Before commit c615550df3 "qapi: Improve source file read error handling"
> (v4.2.0), it was differently bad (uncaught exception).
> 
> Commit c615550df3 explains why the funny QAPISourceInfo exists:
> 
>      Reporting open or read failure for the main schema file needs a
>      QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
>      with fname=None.
> 

I am apparently not the first or the last person to dream of wanting a 
QAPISourceInfo that represents "Actually, there's no source location!"

> The commit turned QAPISourceInfo into the equivalent of a disjoint union
> of
> 
> 1. A position in a source file (.fname is a str)
> 
> 2. "Not in any source file" (.fname is None)
> 
> This is somewhat similar to struct Location in C, which has
> 
> 1. LOC_FILE: a position in a source file
> 
> 2. LOC_CMDLINE: a range of command line arguments
> 
> 3. LOC_NONE: no location information
> 
> Abstracting locations this way lets error_report() do the right thing
> whether it's complaining about the command line, a monitor command, or a
> configuration file read with -readconfig.
> 
> Your patch demonstrates that qapi-gen has much less need for abstracting
> sources: we use 2. "Not in any source file" only for reading the main
> schema file.
> 

Yes. I got the impression that you didn't want to pursue more abstract 
QSI constructs based on earlier work, so going the other way and 
*removing* them seemed like the faster way to achieve a clean type 
system here.

>> In pursuing it, we find that QAPISourceInfo has a special accommodation
>> for when there's no filename.
> 
> Yes:
> 
>      def loc(self) -> str:
> -->     if self.fname is None:
> -->         return sys.argv[0]
>          ret = self.fname
>          if self.line is not None:
>              ret += ':%d' % self.line
>          return ret
> 
>>                                Meanwhile, we intend to type info.fname as
>> str; something we always have.
> 
> Do you mean "as non-optional str"?
> 

Yeah. I typed it originally as `str`, but the analyzer missed that we 
check the field to see if it's None, which is misleading.

>> To remove this, we need to not have a "fake" QAPISourceInfo object. We
> 
> We may well want to, but I doubt we *need* to.  There are almost
> certainly other ways to fix the bug.  I don't see a need to explore
> them, though.
> 

Either we build out the fake QSI into a proper subtype, or we remove it 
-- those are the two obvious options. Building it out is almost 
certainly more work than this patch.

>> also don't want to explicitly begin accommodating QAPISourceInfo being
>> None, because we actually want to eventually prove that this can never
>> happen -- We don't want to confuse "The file isn't open yet" with "This
>> error stems from a definition that wasn't defined in any file".
> 
> Yes, encoding both "poisoned source info not to be used with actual
> errors" and "'fake' source info not pointing to a source file" as None
> would be a mistake.
> 

:)

>> (An earlier series tried to create an official dummy object, but it was
>> tough to prove in review that it worked correctly without creating new
>> regressions. This patch avoids trying to re-litigate that discussion.
>>
>> We would like to first prove that we never raise QAPISemError for any
>> built-in object before we relent and add "special" info objects. We
>> aren't ready to do that yet, so crashing is preferred.)
>>
>> So, how to solve this mess?
>>
>> Here's one way: Don't try to handle errors at a level with "mixed"
>> semantic levels; i.e. don't try to handle inclusion errors (should
>> report a source line where the include was triggered) with command line
>> errors (where we specified a file we couldn't read).
>>
>> Simply remove the error handling from the initializer of the
>> parser. Pythonic! Now it's the caller's job to figure out what to do
>> about it. Handle the error in QAPISchemaParser._include() instead, where
>> we do have the correct semantic context to not need to play games with
>> the error message generation.
>>
>> Next, to re-gain a nice error at the top level, add a new try/except
>> into qapi/main.generate(). Now the error looks sensible:
> 
> Missing "again" after "sensible" ;-P
> 

okayokayokayfine

>>
>>> python3 qapi-gen.py 'fake.json'
>> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>
>> Lastly, with this usage gone, we can remove the special type violation
>> from QAPISourceInfo, and all is well with the world.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/main.py   |  8 +++++++-
>>   scripts/qapi/parser.py | 18 +++++++++---------
>>   scripts/qapi/source.py |  3 ---
>>   3 files changed, 16 insertions(+), 13 deletions(-)
>>
>> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
>> index 703e7ed1ed5..70f8aa86f37 100644
>> --- a/scripts/qapi/main.py
>> +++ b/scripts/qapi/main.py
>> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>>       """
>>       assert invalid_prefix_char(prefix) is None
>>   
>> -    schema = QAPISchema(schema_file)
>> +    try:
>> +        schema = QAPISchema(schema_file)
>> +    except OSError as err:
>> +        raise QAPIError(
>> +            f"can't read schema file '{schema_file}': {err.strerror}"
>> +        ) from err
>> +
>>       gen_types(schema, output_dir, prefix, builtins)
>>       gen_visit(schema, output_dir, prefix, builtins)
>>       gen_commands(schema, output_dir, prefix)
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index ca5e8e18e00..b378fa33807 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>           previously_included = previously_included or set()
>>           previously_included.add(os.path.abspath(fname))
>>   
>> -        try:
>> -            fp = open(fname, 'r', encoding='utf-8')
>> +        # Allow the caller to catch this error.
> 
> "this error"?  I understand what you mean now, but I'm not sure I will
> in three months, when I won't have the context I have now.
> 

Yep, OK.

# May raise OSError, allow the caller to handle it.

>> +        with open(fname, 'r', encoding='utf-8') as fp:
>>               self.src = fp.read()
>> -        except IOError as e:
>> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>> -                               "can't read %s file '%s': %s"
>> -                               % ("include" if incl_info else "schema",
>> -                                  fname,
>> -                                  e.strerror))
>>   
>>           if self.src == '' or self.src[-1] != '\n':
>>               self.src += '\n'
>> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>>           if incl_abs_fname in previously_included:
>>               return None
>>   
>> -        return QAPISchemaParser(incl_fname, previously_included, info)
>> +        try:
>> +            return QAPISchemaParser(incl_fname, previously_included, info)
>> +        except OSError as err:
>> +            raise QAPISemError(
>> +                info,
>> +                f"can't read include file '{incl_fname}': {err.strerror}"
>> +            ) from err
>>   
>>       def _check_pragma_list_of_str(self, name, value, info):
>>           if (not isinstance(value, list)
> 
> Before the patch, only IOError from open() and .read() get converted to
> QAPISemError, and therefore caught by main().
> 
> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
> 

"Changed in version 3.3: EnvironmentError, IOError, WindowsError, 
socket.error, select.error and mmap.error have been merged into OSError, 
and the constructor may return a subclass."

 >>> OSError == IOError
True

(No, I didn't know this before I wrote it. I just intentionally wanted 
to catch everything that open() might return, which I had simply assumed 
was not fully captured by IOError. Better to leave it as OSError now to 
avoid misleading anyone into thinking it's more narrow than it really is.)
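
A quick way to check this, as a sketch: strerror_for() is a made-up
helper, not something in scripts/qapi.

```python
# Since Python 3.3, IOError is just another name for OSError.
assert IOError is OSError

# A missing file raises FileNotFoundError, which is a subclass of
# OSError, so "except OSError" catches it as well.
assert issubclass(FileNotFoundError, OSError)


def strerror_for(fname: str) -> str:
    """Hypothetical helper: return the strerror of a failed open()."""
    try:
        with open(fname, 'r', encoding='utf-8') as fp:
            fp.read()
    except OSError as err:
        return err.strerror or 'unknown error'
    return ''
```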

>> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
>> index 03b6ede0828..1ade864d7b9 100644
>> --- a/scripts/qapi/source.py
>> +++ b/scripts/qapi/source.py
>> @@ -10,7 +10,6 @@
>>   # See the COPYING file in the top-level directory.
>>   
>>   import copy
>> -import sys
>>   from typing import List, Optional, TypeVar
>>   
>>   
>> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>>           return info
>>   
>>       def loc(self) -> str:
>> -        if self.fname is None:
>> -            return sys.argv[0]
>>           ret = self.fname
>>           if self.line is not None:
>>               ret += ':%d' % self.line
> 
> tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:
> 
>      $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
>      tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory
> 
> After:
> 
>      Traceback (most recent call last):
>        File "tests/qapi-schema/test-qapi.py", line 207, in <module>
>          main(sys.argv)
>        File "tests/qapi-schema/test-qapi.py", line 201, in main
>          status |= test_and_diff(test_name, dir_name, args.update)
>        File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
>          test_frontend(os.path.join(dir_name, test_name + '.json'))
>        File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
>          schema = QAPISchema(fname)
>        File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>          parser = QAPISchemaParser(fname)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
>          with open(fname, 'r', encoding='utf-8') as fp:
>      FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'
> 

Probably something that should be added to the actual battery of tests 
somehow, yeah? I can't prevent regressions in invocations that don't get 
run O:-)

--js




* Re: [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer
  2021-04-22  3:07 ` [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer John Snow
@ 2021-04-24  6:38   ` Markus Armbruster
  2021-04-26 17:39     ` John Snow
  2021-04-26 23:14     ` John Snow
  0 siblings, 2 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-24  6:38 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> With the QAPISourceInfo(None, None, None) construct gone, there's not
> really any reason to have to specify that a file starts on the first
> line.
>
> Remove it from the initializer and have it default to 1.
>
> Remove the last vestiges where we check for 'line' being unset. That
> won't happen again, now!
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py |  2 +-
>  scripts/qapi/source.py | 12 +++---------
>  2 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index b378fa33807..edd0af33ae0 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -47,7 +47,7 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>          if self.src == '' or self.src[-1] != '\n':
>              self.src += '\n'
>          self.cursor = 0
> -        self.info = QAPISourceInfo(fname, 1, incl_info)
> +        self.info = QAPISourceInfo(fname, incl_info)
>          self.line_pos = 0
>          self.exprs = []
>          self.docs = []
> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
> index 21090b9fe78..afa21518974 100644
> --- a/scripts/qapi/source.py
> +++ b/scripts/qapi/source.py
> @@ -37,10 +37,9 @@ def __init__(self) -> None:
>  class QAPISourceInfo:
>      T = TypeVar('T', bound='QAPISourceInfo')
>  
> -    def __init__(self, fname: str, line: int,
> -                 parent: Optional['QAPISourceInfo']):
> +    def __init__(self, fname: str, parent: Optional['QAPISourceInfo'] = None):

Not mentioned in the commit message: you add a default parameter value.
It's not used; there's just one caller, and it passes a value.
Intentional?

>          self.fname = fname
> -        self.line = line
> +        self.line = 1
>          self._column: Optional[int] = None
>          self.parent = parent
>          self.pragma: QAPISchemaPragma = (
> @@ -59,12 +58,7 @@ def next_line(self: T) -> T:
>          return info
>  
>      def loc(self) -> str:
> -        # column cannot be provided meaningfully when line is absent.
> -        assert self.line or self._column is None
> -
> -        ret = self.fname
> -        if self.line is not None:
> -            ret += ':%d' % self.line
> +        ret = f"{self.fname}:{self.line}"
>          if self._column is not None:
>              ret += ':%d' % self._column
>          return ret

Mixing f-string and % interpolation.  I doubt we'd write it this way
from scratch.  I recommend either sticking to % for now (leaving the
conversion to f-strings for later), or converting the column formatting,
too, even though it's not related to the patch's purpose.
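
A sketch of the second option, with both parts using f-strings.
SourceInfoSketch is a cut-down, made-up stand-in for QAPISourceInfo
that keeps only what loc() needs; taking the column as a constructor
argument is an assumption made to keep the example short.

```python
from typing import Optional


class SourceInfoSketch:
    """Stand-in for QAPISourceInfo, reduced to what loc() needs."""

    def __init__(self, fname: str, column: Optional[int] = None):
        self.fname = fname
        self.line = 1
        self._column = column

    def loc(self) -> str:
        # Both the line and the column use f-string formatting.
        ret = f"{self.fname}:{self.line}"
        if self._column is not None:
            ret += f":{self._column}"
        return ret
```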




* Re: [PATCH 05/22] qapi/parser: Assert lexer value is a string
  2021-04-22  3:07 ` [PATCH 05/22] qapi/parser: Assert lexer value is a string John Snow
@ 2021-04-24  8:33   ` Markus Armbruster
  2021-04-26 17:43     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-24  8:33 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> The type checker can't narrow the type of the token value to string,
> because it's only loosely correlated with the return token.
>
> We know that a token of '#' should always have a "str" value.
> Add an assertion.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index f519518075e..c75434e75a5 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -303,6 +303,7 @@ def get_doc(self, info):
>          cur_doc = QAPIDoc(self, info)
>          self.accept(False)
>          while self.tok == '#':
> +            assert isinstance(self.val, str), "Expected str value"
>              if self.val.startswith('##'):
>                  # End of doc comment
>                  if self.val != '##':

The second operand of assert provides no additional information.  Please
drop it.




* Re: [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop
  2021-04-22  3:07 ` [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop John Snow
@ 2021-04-25  7:23   ` Markus Armbruster
  2021-04-27 15:03     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25  7:23 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> get_expr can return many things, depending on where it is used. In the
> outer parsing loop, we expect and require it to return a dict.
>
> (It's (maybe) a bit involved to teach mypy that when nested is False,
> this is already always True. I'll look into it later, maybe.)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index c75434e75a5..6b443b1247e 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -78,6 +78,8 @@ def _parse(self):
>                  continue
>  
>              expr = self.get_expr(False)
> +            assert isinstance(expr, dict)  # Guaranteed when nested=False
> +
>              if 'include' in expr:
>                  self.reject_expr_doc(cur_doc)
>                  if len(expr) != 1:
> @@ -278,6 +280,7 @@ def get_values(self):
>              self.accept()
>  
>      def get_expr(self, nested):
> +        # TODO: Teach mypy that nested=False means the retval is a Dict.
>          if self.tok != '{' and not nested:
>              raise QAPIParseError(self, "expected '{'")
>          if self.tok == '{':

The better place to assert a post condition would be ...

                self.accept()
                expr = self.get_members()
            elif self.tok == '[':
                self.accept()
                expr = self.get_values()
            elif self.tok in "'tf":
                expr = self.val
                self.accept()
            else:
                raise QAPIParseError(
                    self, "expected '{', '[', string, or boolean")

... here.

            return expr

But then it may not help mypy over the hump, which is the whole point of
the patch.

Alternative ways to skin this cat:

* Split get_object() off get_expr().

  diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
  index ca5e8e18e0..c79b3c7d08 100644
  --- a/scripts/qapi/parser.py
  +++ b/scripts/qapi/parser.py
  @@ -262,9 +262,12 @@ def get_values(self):
                   raise QAPIParseError(self, "expected ',' or ']'")
               self.accept()

  -    def get_expr(self, nested):
  -        if self.tok != '{' and not nested:
  +    def get_object(self):
  +        if self.tok != '{':
               raise QAPIParseError(self, "expected '{'")
  +        return self.get_expr()
  +
  +    def get_expr(self):
           if self.tok == '{':
               self.accept()
               expr = self.get_members()

* Shift "top-level expression must be dict" up:

    diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
    index ca5e8e18e0..ee8cbf3531 100644
    --- a/scripts/qapi/parser.py
    +++ b/scripts/qapi/parser.py
    @@ -68,7 +68,10 @@ def __init__(self, fname, previously_included=None, incl_info=None):
                         self.docs.append(cur_doc)
                     continue

    -            expr = self.get_expr(False)
    +            expr = self.get_expr()
    +            if not isinstance(expr, OrderedDict):
    +                raise QAPISemError(
    +                    info, "top-level expression must be an object")
                 if 'include' in expr:
                     self.reject_expr_doc(cur_doc)
                     if len(expr) != 1:
    @@ -262,9 +265,7 @@ def get_values(self):
                     raise QAPIParseError(self, "expected ',' or ']'")
                 self.accept()

    -    def get_expr(self, nested):
    -        if self.tok != '{' and not nested:
    -            raise QAPIParseError(self, "expected '{'")
    +    def get_expr(self):
             if self.tok == '{':
                 self.accept()
                 expr = self.get_members()

* Shift it further, into expr.py:

   diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
   index 496f7e0333..0a83c493a0 100644
   --- a/scripts/qapi/expr.py
   +++ b/scripts/qapi/expr.py
   @@ -600,7 +600,10 @@ def check_exprs(exprs: List[_JSONObject]) -> List[_JSONObject]:
        """
        for expr_elem in exprs:
            # Expression
   -        assert isinstance(expr_elem['expr'], dict)
   +        if not isinstance(expr_elem['expr'], dict):
   +            raise QAPISemError(
   +                info, "top-level expression must be an object")
   +            
            for key in expr_elem['expr'].keys():
                assert isinstance(key, str)
            expr: _JSONObject = expr_elem['expr']

Shifting it up would be closer to qapi-code-gen.txt than what we have
now.

All observations, no demands.
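
For the TODO in the patch ("Teach mypy that nested=False means the retval
is a Dict"), one further option is typing.overload with Literal types.
A minimal sketch, not code from the series: SketchParser is a hypothetical
stand-in for QAPISchemaParser, _ExprValue mirrors the alias from PATCH 12,
and Literal needs Python 3.8+ (the series targets 3.6+, where it would
have to come from typing_extensions).

```python
from typing import Dict, List, Literal, Union, overload

# Mirrors the alias introduced in PATCH 12.
_ExprValue = Union[List[object], Dict[str, object], str, bool]


class SketchParser:
    """Hypothetical stand-in for QAPISchemaParser; holds one parsed value."""

    def __init__(self, value: _ExprValue) -> None:
        self._value = value

    @overload
    def get_expr(self, nested: Literal[False]) -> Dict[str, object]: ...

    @overload
    def get_expr(self, nested: Literal[True]) -> _ExprValue: ...

    def get_expr(self, nested: bool) -> _ExprValue:
        # At the top level only an object (dict) is acceptable.
        if not nested and not isinstance(self._value, dict):
            raise ValueError("expected '{'")
        return self._value
```

With these overloads, a call spelled get_expr(False) is typed as returning
Dict[str, object], so the caller needs no assertion.  A call passing a
plain bool variable matches neither overload and would need a third one.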




* Re: [PATCH 07/22] qapi/parser: assert object keys are strings
  2021-04-22  3:07 ` [PATCH 07/22] qapi/parser: assert object keys are strings John Snow
@ 2021-04-25  7:27   ` Markus Armbruster
  2021-04-26 17:46     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25  7:27 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> The single quote token implies the value is a string. Assert this to be
> the case.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index 6b443b1247e..8d1fe0ddda5 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -246,6 +246,8 @@ def get_members(self):
>              raise QAPIParseError(self, "expected string or '}'")
>          while True:
>              key = self.val
> +            assert isinstance(key, str)  # Guaranteed by tok == "'"
> +
>              self.accept()
>              if self.tok != ':':
>                  raise QAPIParseError(self, "expected ':'")

The assertion is correct, but I wonder why mypy needs it.  Can you help?




* Re: [PATCH 09/22] qapi: add match_nofail helper
  2021-04-22  3:07 ` [PATCH 09/22] qapi: add match_nofail helper John Snow
@ 2021-04-25  7:54   ` Markus Armbruster
  2021-04-26 17:48     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25  7:54 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> Mypy cannot generally understand that these regex functions cannot
> possibly fail. Add a _nofail helper that clarifies this for mypy.

Convention wants a blank line here.

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/common.py |  8 +++++++-
>  scripts/qapi/main.py   |  6 ++----
>  scripts/qapi/parser.py | 13 +++++++------
>  3 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index cbd3fd81d36..d38c1746767 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -12,7 +12,7 @@
>  # See the COPYING file in the top-level directory.
>  
>  import re
> -from typing import Optional, Sequence
> +from typing import Match, Optional, Sequence
>  
>  
>  #: Magic string that gets removed along with all space to its right.
> @@ -210,3 +210,9 @@ def gen_endif(ifcond: Sequence[str]) -> str:
>  #endif /* %(cond)s */
>  ''', cond=ifc)
>      return ret
> +
> +
> +def match_nofail(pattern: str, string: str) -> Match[str]:
> +    match = re.match(pattern, string)
> +    assert match is not None
> +    return match

Name it must_match()?  You choose.

I wish we could have more static typing with less notational overhead,
but no free lunch...
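
To illustrate what the helper buys: re.match() is typed as returning an
Optional match, so every direct use of the result needs its own None check
before mypy accepts .group().  A small sketch using the must_match
spelling suggested above; the patterns and inputs are made up for
illustration.

```python
import re
from typing import Match


def must_match(pattern: str, string: str) -> Match[str]:
    """Match that is known to succeed; narrows Optional[Match] for mypy."""
    match = re.match(pattern, string)
    assert match is not None
    return match


# No per-call-site None check is needed before using the match object.
word = must_match(r'[a-z]+', 'abc123').group(0)
value = must_match(r'(\w+)=(\w+)', 'key=value').group(2)
```

The assertion still fires at run time if the "cannot fail" assumption is
ever wrong, which beats an AttributeError on None further down.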

[...]




* Re: [PATCH 10/22] qapi/parser: Fix typing of token membership tests
  2021-04-22  3:07 ` [PATCH 10/22] qapi/parser: Fix typing of token membership tests John Snow
@ 2021-04-25  7:59   ` Markus Armbruster
  2021-04-26 17:51     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25  7:59 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> When the token can be None, we can't use 'x in "abc"' style membership
> tests to group types of tokens together, because 'None in "abc"' is a
> TypeError.
>
> Easy enough to fix, if not a little ugly.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index 7f3c009f64b..16fd36f8391 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -272,7 +272,7 @@ def get_values(self):
>          if self.tok == ']':
>              self.accept()
>              return expr
> -        if self.tok not in "{['tf":
> +        if self.tok is None or self.tok not in "{['tf":
>              raise QAPIParseError(
>                  self, "expected '{', '[', ']', string, or boolean")
>          while True:
> @@ -294,7 +294,8 @@ def get_expr(self, nested):
>          elif self.tok == '[':
>              self.accept()
>              expr = self.get_values()
> -        elif self.tok in "'tf":
> +        elif self.tok and self.tok in "'tf":
> +            assert isinstance(self.val, (str, bool))
>              expr = self.val
>              self.accept()
>          else:

How can self.tok be None?

I suspect this is an artifact of PATCH 04.  Before, self.tok is
initialized to the first token, then set to subsequent tokens (all str)
in turn.  After, it's initialized to None, then set to tokens in turn.
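
The runtime problem the commit message refers to can be sketched in
isolation.  The function name below is hypothetical; only the membership
string comes from the patch.

```python
from typing import Optional


def starts_scalar(tok: Optional[str]) -> bool:
    """True if tok is the string token or one of the boolean tokens."""
    # The None check must come first: evaluating `None in "'tf"` raises
    # TypeError, and mypy rejects the unguarded test for Optional[str].
    return tok is not None and tok in "'tf"
```

If self.tok could be typed as plain str, neither the guard nor the
reshuffled conditions in the patch would be needed.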




* Re: [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard
  2021-04-22  3:07 ` [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard John Snow
@ 2021-04-25 12:32   ` Markus Armbruster
  2021-04-26 23:48     ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25 12:32 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> TypeGuards wont exist in Python proper until 3.10. Ah well. We can hack
> up our own by declaring this function to return the type we claim it
> checks for and using this to safely downcast object -> List[str].
>
> In so doing, I bring this function in-line under _pragma so it can use
> the 'info' object in its closure. Having done this, _pragma also now
> no longer needs to take a 'self' parameter, so drop it.
>
> Rename it to just _check(), to help us out with the line-length -- and
> now that it's contained within _pragma, it is contextually easier to see
> how it's used anyway -- especially with types.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
>
> ---
>
> I left (name, value) as args to avoid creating a fully magic "macro",
> though, I thought this was too weird:
>
>     info.pragma.foobar = _check()
>
> and it looked more reasonable as:
>
>     info.pragma.foobar = _check(name, value)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index 16fd36f8391..d02a134aae9 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -17,6 +17,7 @@
>  from collections import OrderedDict
>  import os
>  import re
> +from typing import List
>  
>  from .common import match_nofail
>  from .error import QAPISemError, QAPISourceError
> @@ -151,28 +152,27 @@ def _include(include, info, incl_fname, previously_included):
>              ) from err
>  
>      @staticmethod
> -    def _check_pragma_list_of_str(name, value, info):
> -        if (not isinstance(value, list)
> -                or any([not isinstance(elt, str) for elt in value])):
> -            raise QAPISemError(
> -                info,
> -                "pragma %s must be a list of strings" % name)
> +    def _pragma(name, value, info):
> +
> +        def _check(name, value) -> List[str]:
> +            if (not isinstance(value, list) or
> +                    any([not isinstance(elt, str) for elt in value])):
> +                raise QAPISemError(
> +                    info,
> +                    "pragma %s must be a list of strings" % name)
> +            return value
>  
> -    def _pragma(self, name, value, info):
>          if name == 'doc-required':
>              if not isinstance(value, bool):
>                  raise QAPISemError(info,
>                                     "pragma 'doc-required' must be boolean")
>              info.pragma.doc_required = value
>          elif name == 'command-name-exceptions':
> -            self._check_pragma_list_of_str(name, value, info)
> -            info.pragma.command_name_exceptions = value
> +            info.pragma.command_name_exceptions = _check(name, value)
>          elif name == 'command-returns-exceptions':
> -            self._check_pragma_list_of_str(name, value, info)
> -            info.pragma.command_returns_exceptions = value
> +            info.pragma.command_returns_exceptions = _check(name, value)
>          elif name == 'member-name-exceptions':
> -            self._check_pragma_list_of_str(name, value, info)
> -            info.pragma.member_name_exceptions = value
> +            info.pragma.member_name_exceptions = _check(name, value)
>          else:
>              raise QAPISemError(info, "unknown pragma '%s'" % name)

While I appreciate the terseness, I'm not sure I like the generic name
_check() for checking one of two special cases, namely "list of string".
The other case being "boolean".  We could acquire more cases later.
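
A sketch of what a more specific name could look like, keeping the
"return the narrowed value" trick from the patch.  check_list_str and
PragmaError are hypothetical names; the real code raises QAPISemError with
an info argument.

```python
from typing import List


class PragmaError(Exception):
    """Hypothetical stand-in for QAPISemError."""


def check_list_str(name: str, value: object) -> List[str]:
    """Return value unchanged if it is a list of str, else raise."""
    if (not isinstance(value, list)
            or any(not isinstance(elt, str) for elt in value)):
        raise PragmaError("pragma %s must be a list of strings" % name)
    # isinstance() narrowed value to a list; the elements were checked
    # above, so returning it as List[str] is sound.
    return value
```

A boolean counterpart (say check_bool) could then sit next to it if more
pragma types are added later.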




* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-22  3:07 ` [PATCH 12/22] qapi/parser: add type hint annotations John Snow
@ 2021-04-25 12:34   ` Markus Armbruster
  2021-04-26 18:00     ` John Snow
  2021-04-26 23:55     ` John Snow
  2021-05-06  1:27   ` John Snow
  1 sibling, 2 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25 12:34 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> Annotations do not change runtime behavior.
> This commit *only* adds annotations.
>
> (Annotations for QAPIDoc are in a later commit.)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 61 ++++++++++++++++++++++++++++--------------
>  1 file changed, 41 insertions(+), 20 deletions(-)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index d02a134aae9..f2b57d5642a 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -17,16 +17,29 @@
>  from collections import OrderedDict
>  import os
>  import re
> -from typing import List
> +from typing import (
> +    Dict,
> +    List,
> +    Optional,
> +    Set,
> +    Union,
> +)
>  
>  from .common import match_nofail
>  from .error import QAPISemError, QAPISourceError
>  from .source import QAPISourceInfo
>  
>  
> +#: Represents a parsed JSON object; semantically: one QAPI schema expression.
> +Expression = Dict[str, object]

I believe you use this for what qapi-code-gen.txt calls a top-level
expression.  TopLevelExpression is rather long, but it's used just once,
and once more in RFC PATCH 13.  What do you think?

> +
> +# Return value alias for get_expr().
> +_ExprValue = Union[List[object], Dict[str, object], str, bool]

This is essentially a node in our pidgin-JSON parser's abstract syntax
tree.  Tree roots use the Dict branch of this Union.

See also my review of PATCH 06.

> +
> +
>  class QAPIParseError(QAPISourceError):
>      """Error class for all QAPI schema parsing errors."""
> -    def __init__(self, parser, msg):
> +    def __init__(self, parser: 'QAPISchemaParser', msg: str):

Forward reference needs quotes.  Can't be helped.
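
A minimal illustration of why the quotes are needed; the class names are
hypothetical.  The annotation is evaluated when the def statement runs, at
which point the later class does not exist yet, so it has to be spelled as
a string.

```python
class SketchParseError(Exception):
    # 'SketchParser' is defined further down, hence the string annotation.
    def __init__(self, parser: 'SketchParser', msg: str):
        super().__init__("%s: %s" % (parser.fname, msg))


class SketchParser:
    def __init__(self, fname: str):
        self.fname = fname
```

Type checkers and typing.get_type_hints() resolve the string to the class
once the module is fully loaded.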

>          col = 1
>          for ch in parser.src[parser.line_pos:parser.pos]:
>              if ch == '\t':
> @@ -38,7 +51,10 @@ def __init__(self, parser, msg):
>  
>  class QAPISchemaParser:
>  
> -    def __init__(self, fname, previously_included=None, incl_info=None):
> +    def __init__(self,
> +                 fname: str,
> +                 previously_included: Optional[Set[str]] = None,

This needs to be Optional[] because using the empty set as default
parameter value would be a dangerous trap.  Python's choice to evaluate
the default parameter value just once has always been iffy.  Stirring
static typing into the language makes it iffier.  Can't be helped.
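
The trap, sketched with hypothetical function names: the default set is
created once, when the def statement runs, and is then shared by every
call that omits the argument.

```python
from typing import Optional, Set


def include_shared(fname: str, seen: Set[str] = set()) -> Set[str]:
    # Trap: the same set object is reused across calls.
    seen.add(fname)
    return seen


def include_fresh(fname: str, seen: Optional[Set[str]] = None) -> Set[str]:
    # None as the default, and a new set per call when none is passed.
    if seen is None:
        seen = set()
    seen.add(fname)
    return seen
```

The patch spells the fallback as "previously_included or set()"; the
explicit None test above is a variation that also keeps an empty set
passed in by the caller.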

> +                 incl_info: Optional[QAPISourceInfo] = None):
>          self._fname = fname
>          self._included = previously_included or set()
>          self._included.add(os.path.abspath(self._fname))
> @@ -46,20 +62,20 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>  
>          # Lexer state (see `accept` for details):
>          self.info = QAPISourceInfo(self._fname, incl_info)
> -        self.tok = None
> +        self.tok: Optional[str] = None

Would

           self.tok: str

work?
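
What the suggestion amounts to, as a hedged sketch with a hypothetical
class: an annotation without a value declares the attribute's type for
mypy but assigns nothing, so the attribute only exists once accept() has
run.

```python
class SketchLexer:
    def __init__(self, src: str) -> None:
        self.src = src
        self.cursor = 0
        # Declaration only: tells the type checker tok is a str,
        # but does not create the attribute.
        self.tok: str

    def accept(self) -> None:
        self.tok = self.src[self.cursor]
        self.cursor += 1
```

Whether plain str suffices in the real parser depends on how end of input
is represented; the docstring in PATCH 16 says tok is set to None at EOF.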

>          self.pos = 0
>          self.cursor = 0
> -        self.val = None
> +        self.val: Optional[Union[bool, str]] = None
>          self.line_pos = 0
>  
>          # Parser output:
> -        self.exprs = []
> -        self.docs = []
> +        self.exprs: List[Expression] = []
> +        self.docs: List[QAPIDoc] = []
>  
>          # Showtime!
>          self._parse()
>  
> -    def _parse(self):
> +    def _parse(self) -> None:
>          cur_doc = None
>  
>          with open(self._fname, 'r', encoding='utf-8') as fp:
> @@ -122,7 +138,7 @@ def _parse(self):
>          self.reject_expr_doc(cur_doc)
>  
>      @staticmethod
> -    def reject_expr_doc(doc):
> +    def reject_expr_doc(doc: Optional['QAPIDoc']) -> None:
>          if doc and doc.symbol:
>              raise QAPISemError(
>                  doc.info,
> @@ -130,10 +146,14 @@ def reject_expr_doc(doc):
>                  % doc.symbol)
>  
>      @staticmethod
> -    def _include(include, info, incl_fname, previously_included):
> +    def _include(include: str,
> +                 info: QAPISourceInfo,
> +                 incl_fname: str,
> +                 previously_included: Set[str]
> +                 ) -> Optional['QAPISchemaParser']:
>          incl_abs_fname = os.path.abspath(incl_fname)
>          # catch inclusion cycle
> -        inf = info
> +        inf: Optional[QAPISourceInfo] = info
>          while inf:
>              if incl_abs_fname == os.path.abspath(inf.fname):
>                  raise QAPISemError(info, "inclusion loop for %s" % include)
> @@ -152,9 +172,9 @@ def _include(include, info, incl_fname, previously_included):
>              ) from err
>  
>      @staticmethod
> -    def _pragma(name, value, info):
> +    def _pragma(name: str, value: object, info: QAPISourceInfo) -> None:

value: object isn't wrong, but why not _ExprValue?

>  
> -        def _check(name, value) -> List[str]:
> +        def _check(name: str, value: object) -> List[str]:
>              if (not isinstance(value, list) or
>                      any([not isinstance(elt, str) for elt in value])):
>                  raise QAPISemError(
> @@ -176,7 +196,7 @@ def _check(name, value) -> List[str]:
>          else:
>              raise QAPISemError(info, "unknown pragma '%s'" % name)
>  
> -    def accept(self, skip_comment=True):
> +    def accept(self, skip_comment: bool = True) -> None:
>          while True:
>              self.tok = self.src[self.cursor]
>              self.pos = self.cursor
> @@ -240,8 +260,8 @@ def accept(self, skip_comment=True):
>                                       self.src[self.cursor-1:])
>                  raise QAPIParseError(self, "stray '%s'" % match.group(0))
>  
> -    def get_members(self):
> -        expr = OrderedDict()
> +    def get_members(self) -> 'OrderedDict[str, object]':
> +        expr: 'OrderedDict[str, object]' = OrderedDict()
>          if self.tok == '}':
>              self.accept()
>              return expr
> @@ -267,8 +287,8 @@ def get_members(self):
>              if self.tok != "'":
>                  raise QAPIParseError(self, "expected string")
>  
> -    def get_values(self):
> -        expr = []
> +    def get_values(self) -> List[object]:
> +        expr: List[object] = []
>          if self.tok == ']':
>              self.accept()
>              return expr
> @@ -284,8 +304,9 @@ def get_values(self):
>                  raise QAPIParseError(self, "expected ',' or ']'")
>              self.accept()
>  
> -    def get_expr(self, nested):
> +    def get_expr(self, nested: bool = False) -> _ExprValue:
>          # TODO: Teach mypy that nested=False means the retval is a Dict.
> +        expr: _ExprValue
>          if self.tok != '{' and not nested:
>              raise QAPIParseError(self, "expected '{'")
>          if self.tok == '{':
> @@ -303,7 +324,7 @@ def get_expr(self, nested):
>                  self, "expected '{', '[', string, or boolean")
>          return expr
>  
> -    def get_doc(self, info):
> +    def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
>          if self.val != '##':
>              raise QAPIParseError(
>                  self, "junk after '##' at start of documentation comment")




* Re: [PATCH 16/22] qapi/parser: add docstrings
  2021-04-22  3:07 ` [PATCH 16/22] qapi/parser: add docstrings John Snow
@ 2021-04-25 13:27   ` Markus Armbruster
  2021-04-26 18:26     ` John Snow
  2021-05-07  1:34     ` John Snow
  0 siblings, 2 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-25 13:27 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> Signed-off-by: John Snow <jsnow@redhat.com>
>
> ---
>
> My hubris is infinite.

Score one of the three principal virtues of a programmer ;)

> OK, I only added a few -- to help me remember how the parser works at a glance.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index dbbd0fcbc2f..8fc77808ace 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>  
>  
>  class QAPISchemaParser:
> +    """
> +    Performs parsing of a QAPI schema source file.

Actually, this parses one of two layers, see qapi-code-gen.txt section
"Schema syntax".  Pointing there might help.

>  
> +    :param fname: Path to the source file

Either "Source file name" or "Source pathname", please.  I prefer "file
name" for additional distance to "path" in the sense of a search path,
i.e. a list of directory names.

> +    :param previously_included:
> +        The absolute paths of previously included source files.

Either "absolute file name" or "absolute pathname".

> +        Only used by recursive calls to avoid re-parsing files.

Feels like detail, not sure it's needed here.

> +    :param incl_info:
> +       `QAPISourceInfo` for the parent document.
> +       This may be None if this is the root schema document.

Recommend s/This may be //.

qapi-code-gen.txt calls a QAPI schema that uses include directives
"modular", and the included files "sub-modules".  s/root schema
document/root module/?

> +
> +    :ivar exprs: Resulting parsed expressions.
> +    :ivar docs: Resulting parsed documentation blocks.

Uh, why are these here?  A doc string is interface documentation...

> +
> +    :raise OSError: For problems opening the root schema document.
> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
> +    :raise QAPISemError: For various semantic issues with the schema.

Should callers care for the difference between QAPIParseError and
QAPISemError?

> +    """
>      def __init__(self,
>                   fname: str,
>                   previously_included: Optional[Set[str]] = None,
> @@ -77,6 +94,11 @@ def __init__(self,
>          self._parse()
>  
>      def _parse(self) -> None:
> +        """
> +        Parse the QAPI schema document.
> +
> +        :return: None; results are stored in ``exprs`` and ``docs``.

Another ignorant doc string markup question...  how am I supposed to see
that exprs and docs are attributes, and not global variables?

> +        """
>          cur_doc = None
>  
>          with open(self._fname, 'r', encoding='utf-8') as fp:
> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>              raise QAPISemError(info, "unknown pragma '%s'" % name)
>  
>      def accept(self, skip_comment: bool = True) -> None:
> +        """
> +        Read the next lexeme and process it into a token.
> +
> +        :Object state:
> +          :tok: represents the token type. See below for values.
> +          :pos: is the position of the first character in the lexeme.
> +          :cursor: is the position of the next character.

Define "position" :)  It's an index in self.src.

self.cursor and self.pos are not used outside accept().  Not sure they
belong in interface documentation.

> +          :val: is the variable value of the token, if any.

Missing: self.info, which *is* used outside accept().

> +
> +        Single-character tokens:
> +
> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
> +        ``LSQB``, and ``RSQB``.

"These include ..." is misleading.  This is the complete list of
single-character tokens.

> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
> +        lexeme.  ``val`` is ``None``.
> +
> +        Multi-character tokens:
> +
> +        - ``COMMENT``:
> +
> +          - This token is not normally yielded by the lexer, but it
> +            can be when ``skip_comment`` is False.
> +          - ``tok`` is the value ``"#"``.
> +          - ``val`` is a string including all chars until end-of-line.
> +
> +        - ``STRING``:
> +
> +          - ``tok`` is the ``"'"``, the single quote.
> +          - ``value`` is the string, *excluding* the quotes.
> +
> +        - ``TRUE`` and ``FALSE``:
> +
> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
> +          - ``val`` is either ``True`` or ``False`` accordingly.
> +
> +        - ``NEWLINE`` and ``SPACE``:
> +
> +          - These are consumed by the lexer directly. ``line_pos`` and
> +            ``info`` are advanced when ``NEWLINE`` is encountered.
> +            ``tok`` is set to ``None`` upon reaching EOF.
> +
> +        :param skip_comment:
> +            When false, return ``COMMENT`` tokens.
> +            This is used when reading documentation blocks.

The doc string mostly describes possible state on return of accept().
*Within* accept(), self.tok may be any character.

"Mostly" because the ``NEWLINE`` and ``SPACE`` item is about something
that happens within accept().

Perhaps phrasing it as a postcondition would be clearer:

    Read and store the next token.

    On return, self.tok is the token type, self.info describes its
    source location, and self.val is the token's value.

    The possible token types and their values are

    ...

> +        """
>          while True:
>              self.tok = self.src[self.cursor]
>              self.pos = self.cursor




* Re: [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer
  2021-04-24  6:38   ` Markus Armbruster
@ 2021-04-26 17:39     ` John Snow
  2021-04-26 23:14     ` John Snow
  1 sibling, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-26 17:39 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/24/21 2:38 AM, Markus Armbruster wrote:
> Not mentioned in the commit message: you add a default parameter value.
> It's not used; there's just one caller, and it passes a value.
> Intentional?
> 

No. Leftover from an earlier version where it was used. It can be made 
to always be an explicit parameter now instead.




* Re: [PATCH 05/22] qapi/parser: Assert lexer value is a string
  2021-04-24  8:33   ` Markus Armbruster
@ 2021-04-26 17:43     ` John Snow
  2021-04-27 12:30       ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 17:43 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/24/21 4:33 AM, Markus Armbruster wrote:
> The second operand of assert provides no additional information.  Please
> drop it.

I don't agree with "no additional information", strictly.

I left you a comment on gitlab before you started reviewing on-list. 
What I wrote there:

"Markus: I know you're not a fan of these, but I wanted a suggestion on 
how to explain why this must be true in case it wasn't obvious to 
someone else in the future."

--js




* Re: [PATCH 07/22] qapi/parser: assert object keys are strings
  2021-04-25  7:27   ` Markus Armbruster
@ 2021-04-26 17:46     ` John Snow
  2021-04-27  6:13       ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 17:46 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 3:27 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> The single quote token implies the value is a string. Assert this to be
>> the case.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index 6b443b1247e..8d1fe0ddda5 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -246,6 +246,8 @@ def get_members(self):
>>               raise QAPIParseError(self, "expected string or '}'")
>>           while True:
>>               key = self.val
>> +            assert isinstance(key, str)  # Guaranteed by tok == "'"
>> +
>>               self.accept()
>>               if self.tok != ':':
>>                   raise QAPIParseError(self, "expected ':'")
> 
> The assertion is correct, but I wonder why mypy needs it.  Can you help?
> 

The lexer value can also be True/False (Maybe None? I forget) based on 
the Token returned. Here, since the token was the single quote, we know 
that value must be a string.

Mypy has no insight into the correlation between the Token itself and 
the token value, because that relationship is not expressed via the type 
system.
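
A minimal sketch of the situation, with a hypothetical miniature lexer:
tok and val are typed independently, so checking tok tells mypy nothing
about val, and the assertion is what narrows it.

```python
from typing import Optional, Union


class SketchLexer:
    """Hypothetical miniature lexer state, not the real parser."""

    def __init__(self) -> None:
        self.tok: Optional[str] = None
        self.val: Optional[Union[bool, str]] = None

    def accept_string(self, text: str) -> None:
        # A string token: tok is the single quote, val the contents.
        self.tok = "'"
        self.val = text


def get_key(lexer: SketchLexer) -> str:
    if lexer.tok != "'":
        raise ValueError("expected string")
    key = lexer.val
    # mypy still sees Optional[Union[bool, str]] here; the check on tok
    # above did not narrow val.  The assertion does.
    assert isinstance(key, str)  # Guaranteed by tok == "'"
    return key
```

Without the assertion, mypy would report an incompatible return type for
get_key(), even though the code is correct at run time.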

--js




* Re: [PATCH 09/22] qapi: add match_nofail helper
  2021-04-25  7:54   ` Markus Armbruster
@ 2021-04-26 17:48     ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-26 17:48 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 3:54 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> Mypy cannot generally understand that these regex functions cannot
>> possibly fail. Add a _nofail helper that clarifies this for mypy.
> 
> Convention wants a blank line here.
> 

Tooling failure.

stg pop -a
while stg push; and stg edit --sign; done

(Will fix, but not so sure about fixing the tool...)

>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/common.py |  8 +++++++-
>>   scripts/qapi/main.py   |  6 ++----
>>   scripts/qapi/parser.py | 13 +++++++------
>>   3 files changed, 16 insertions(+), 11 deletions(-)
>>
>> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
>> index cbd3fd81d36..d38c1746767 100644
>> --- a/scripts/qapi/common.py
>> +++ b/scripts/qapi/common.py
>> @@ -12,7 +12,7 @@
>>   # See the COPYING file in the top-level directory.
>>   
>>   import re
>> -from typing import Optional, Sequence
>> +from typing import Match, Optional, Sequence
>>   
>>   
>>   #: Magic string that gets removed along with all space to its right.
>> @@ -210,3 +210,9 @@ def gen_endif(ifcond: Sequence[str]) -> str:
>>   #endif /* %(cond)s */
>>   ''', cond=ifc)
>>       return ret
>> +
>> +
>> +def match_nofail(pattern: str, string: str) -> Match[str]:
>> +    match = re.match(pattern, string)
>> +    assert match is not None
>> +    return match
> 
> Name it must_match()?  You choose.
> 

If you think it reads genuinely better, sure.

> I wish we could have more static typing with less notational overhead,
> but no free lunch...
> 
> [...]
> 




* Re: [PATCH 10/22] qapi/parser: Fix typing of token membership tests
  2021-04-25  7:59   ` Markus Armbruster
@ 2021-04-26 17:51     ` John Snow
  2021-04-27  7:00       ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 17:51 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 3:59 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> When the token can be None, we can't use 'x in "abc"' style membership
>> tests to group types of tokens together, because 'None in "abc"' is a
>> TypeError.
>>
>> Easy enough to fix, if not a little ugly.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index 7f3c009f64b..16fd36f8391 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -272,7 +272,7 @@ def get_values(self):
>>           if self.tok == ']':
>>               self.accept()
>>               return expr
>> -        if self.tok not in "{['tf":
>> +        if self.tok is None or self.tok not in "{['tf":
>>               raise QAPIParseError(
>>                   self, "expected '{', '[', ']', string, or boolean")
>>           while True:
>> @@ -294,7 +294,8 @@ def get_expr(self, nested):
>>           elif self.tok == '[':
>>               self.accept()
>>               expr = self.get_values()
>> -        elif self.tok in "'tf":
>> +        elif self.tok and self.tok in "'tf":
>> +            assert isinstance(self.val, (str, bool))
>>               expr = self.val
>>               self.accept()
>>           else:
> 
> How can self.tok be None?
> 
> I suspect this is an artifact of PATCH 04.  Before, self.tok is
> initialized to the first token, then set to subsequent tokens (all str)
> in turn.  After, it's initialized to None, then set to tokens in turn.
> 

Actually, it's set to None to represent EOF. See here:

             elif self.tok == '\n':
                 if self.cursor == len(self.src):
                     self.tok = None
                     return
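
Because None stands for EOF, any membership test on the token has to rule
out None first. A small standalone sketch (is_value_start is a
hypothetical helper, not something in parser.py):

```python
from typing import Optional


def is_value_start(tok: Optional[str]) -> bool:
    # `None in "'tf"` raises TypeError, so test for None (EOF) first.
    return tok is not None and tok in "'tf"


try:
    None in "'tf"  # type: ignore
    unguarded_test_failed = False
except TypeError:
    # 'in <string>' requires a string as its left operand.
    unguarded_test_failed = True
```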

A more pythonic idiom would be to create a lexer class that behaves as 
an iterator, yielding Token class objects, and eventually, raising 
StopIteration.

(Not suggesting I do that now. I have thought about it though, yes.)
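
For the curious, a rough sketch of what such an iterator-style lexer could
look like. Everything here is hypothetical (the Token class, the tokens()
generator, and the drastically simplified grammar); it handles no
comments, escapes, or source locations, and is not a proposal for
parser.py:

```python
import re
from typing import Iterator, NamedTuple, Union


class Token(NamedTuple):
    """Hypothetical token record: a kind plus an optional value."""
    kind: str
    value: Union[str, bool, None] = None


# Simplified grammar: punctuation, 'strings' without escapes, booleans.
_TOKEN_RE = re.compile(r"\s*(?:([{}\[\]:,])|'([^']*)'|(true|false))")


def tokens(src: str) -> Iterator[Token]:
    """Yield tokens from src; running out of tokens signals EOF."""
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        match = _TOKEN_RE.match(src, pos)
        if match is None:
            raise ValueError("unexpected input at index %d" % pos)
        punct, string, boolean = match.groups()
        if punct is not None:
            yield Token(punct)
        elif string is not None:
            yield Token("'", string)
        else:
            # Exactly one alternative matched, so this must be set.
            assert boolean is not None
            yield Token(boolean[0], boolean == 'true')
        pos = match.end()
```

With this shape, EOF is simply the generator finishing, so no caller ever
sees a None token.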

--js




* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-25 12:34   ` Markus Armbruster
@ 2021-04-26 18:00     ` John Snow
  2021-04-27  8:21       ` Markus Armbruster
  2021-04-26 23:55     ` John Snow
  1 sibling, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 18:00 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 8:34 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> Annotations do not change runtime behavior.
>> This commit *only* adds annotations.
>>
>> (Annotations for QAPIDoc are in a later commit.)
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 61 ++++++++++++++++++++++++++++--------------
>>   1 file changed, 41 insertions(+), 20 deletions(-)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index d02a134aae9..f2b57d5642a 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -17,16 +17,29 @@
>>   from collections import OrderedDict
>>   import os
>>   import re
>> -from typing import List
>> +from typing import (
>> +    Dict,
>> +    List,
>> +    Optional,
>> +    Set,
>> +    Union,
>> +)
>>   
>>   from .common import match_nofail
>>   from .error import QAPISemError, QAPISourceError
>>   from .source import QAPISourceInfo
>>   
>>   
>> +#: Represents a parsed JSON object; semantically: one QAPI schema expression.
>> +Expression = Dict[str, object]
> 
> I believe you use this for what qapi-code-gen.txt calls a top-level
> expression.  TopLevelExpression is rather long, but it's used just once,
> and once more in RFC PATCH 13.  What do you think?
> 

Yeah, I left a comment on gitlab about this -- Sorry for splitting the 
stream, I didn't expect you to reply on-list without at least clicking 
the link ;)

You're right, this is TOP LEVEL EXPR. I do mean to start using it in
expr.py as well, in what will become (I think) pt5c:
non-immediately-necessary parser cleanups.

I can use TopLevelExpression as the type name if you'd like, but I am
open to suggestions for something shorter if "Expression" is way too
overloaded/confusing.

>> +
>> +# Return value alias for get_expr().
>> +_ExprValue = Union[List[object], Dict[str, object], str, bool]
> 
> This is essentially a node in our pidgin-JSON parser's abstract syntax
> tree.  Tree roots use the Dict branch of this Union.
> 
> See also my review of PATCH 06.
> 

OK, I skimmed that one for now but I'll get back to it.

>> +
>> +
>>   class QAPIParseError(QAPISourceError):
>>       """Error class for all QAPI schema parsing errors."""
>> -    def __init__(self, parser, msg):
>> +    def __init__(self, parser: 'QAPISchemaParser', msg: str):
> 
> Forward reference needs quotes.  Can't be helped.
> 
>>           col = 1
>>           for ch in parser.src[parser.line_pos:parser.pos]:
>>               if ch == '\t':
>> @@ -38,7 +51,10 @@ def __init__(self, parser, msg):
>>   
>>   class QAPISchemaParser:
>>   
>> -    def __init__(self, fname, previously_included=None, incl_info=None):
>> +    def __init__(self,
>> +                 fname: str,
>> +                 previously_included: Optional[Set[str]] = None,
> 
> This needs to be Optional[] because using the empty set as default
> parameter value would be a dangerous trap.  Python's choice to evaluate
> the default parameter value just once has always been iffy.  Stirring
> static typing into the language makes it iffier.  Can't be helped.
> 

We could force it to accept a tuple and convert it into a set 
internally. It's just that we seem to use it for sets now.

Or ... in pt5c I float the idea of just passing the parent parser in, 
and I reach up and grab the previously-included stuff directly.
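
For anyone following along who hasn't been bitten by this yet, a small
sketch of the trap and of the None-default idiom. The function names
include_bad and include_good are made up for illustration:

```python
from typing import Optional, Set


def include_bad(fname: str, seen: Set[str] = set()) -> Set[str]:
    # The default set is created once, when the function is defined,
    # so every call that omits `seen` shares and mutates the same set.
    seen.add(fname)
    return seen


def include_good(fname: str, seen: Optional[Set[str]] = None) -> Set[str]:
    # None as the default: a fresh set is created on each such call.
    if seen is None:
        seen = set()
    seen.add(fname)
    return seen


first = set(include_bad("a.json"))    # snapshot: {'a.json'}
second = set(include_bad("b.json"))   # snapshot: {'a.json', 'b.json'}
```

The second call to include_bad() still "remembers" a.json, which is
exactly the state leak that the Optional[] default avoids.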

>> +                 incl_info: Optional[QAPISourceInfo] = None):
>>           self._fname = fname
>>           self._included = previously_included or set()
>>           self._included.add(os.path.abspath(self._fname))
>> @@ -46,20 +62,20 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>   
>>           # Lexer state (see `accept` for details):
>>           self.info = QAPISourceInfo(self._fname, incl_info)
>> -        self.tok = None
>> +        self.tok: Optional[str] = None
> 
> Would
> 
>             self.tok: str
> 
> work?
> 

Not without modifications, because the Token being None is used to 
represent EOF.

>>           self.pos = 0
>>           self.cursor = 0
>> -        self.val = None
>> +        self.val: Optional[Union[bool, str]] = None
>>           self.line_pos = 0
>>   
>>           # Parser output:
>> -        self.exprs = []
>> -        self.docs = []
>> +        self.exprs: List[Expression] = []
>> +        self.docs: List[QAPIDoc] = []
>>   
>>           # Showtime!
>>           self._parse()
>>   
>> -    def _parse(self):
>> +    def _parse(self) -> None:
>>           cur_doc = None
>>   
>>           with open(self._fname, 'r', encoding='utf-8') as fp:
>> @@ -122,7 +138,7 @@ def _parse(self):
>>           self.reject_expr_doc(cur_doc)
>>   
>>       @staticmethod
>> -    def reject_expr_doc(doc):
>> +    def reject_expr_doc(doc: Optional['QAPIDoc']) -> None:
>>           if doc and doc.symbol:
>>               raise QAPISemError(
>>                   doc.info,
>> @@ -130,10 +146,14 @@ def reject_expr_doc(doc):
>>                   % doc.symbol)
>>   
>>       @staticmethod
>> -    def _include(include, info, incl_fname, previously_included):
>> +    def _include(include: str,
>> +                 info: QAPISourceInfo,
>> +                 incl_fname: str,
>> +                 previously_included: Set[str]
>> +                 ) -> Optional['QAPISchemaParser']:
>>           incl_abs_fname = os.path.abspath(incl_fname)
>>           # catch inclusion cycle
>> -        inf = info
>> +        inf: Optional[QAPISourceInfo] = info
>>           while inf:
>>               if incl_abs_fname == os.path.abspath(inf.fname):
>>                   raise QAPISemError(info, "inclusion loop for %s" % include)
>> @@ -152,9 +172,9 @@ def _include(include, info, incl_fname, previously_included):
>>               ) from err
>>   
>>       @staticmethod
>> -    def _pragma(name, value, info):
>> +    def _pragma(name: str, value: object, info: QAPISourceInfo) -> None:
> 
> value: object isn't wrong, but why not _ExprValue?
> 

I forget; I admit this one slipped through from an earlier revision.

Right now: because _ExprValue is too broad. It really wants Dict[str, 
object] but the typing on get_expr() is challenging.

I'll revisit this with better excuses after I digest your patch 6 review.

>>   
>> -        def _check(name, value) -> List[str]:
>> +        def _check(name: str, value: object) -> List[str]:
>>               if (not isinstance(value, list) or
>>                       any([not isinstance(elt, str) for elt in value])):
>>                   raise QAPISemError(
>> @@ -176,7 +196,7 @@ def _check(name, value) -> List[str]:
>>           else:
>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
>>   
>> -    def accept(self, skip_comment=True):
>> +    def accept(self, skip_comment: bool = True) -> None:
>>           while True:
>>               self.tok = self.src[self.cursor]
>>               self.pos = self.cursor
>> @@ -240,8 +260,8 @@ def accept(self, skip_comment=True):
>>                                        self.src[self.cursor-1:])
>>                   raise QAPIParseError(self, "stray '%s'" % match.group(0))
>>   
>> -    def get_members(self):
>> -        expr = OrderedDict()
>> +    def get_members(self) -> 'OrderedDict[str, object]':
>> +        expr: 'OrderedDict[str, object]' = OrderedDict()
>>           if self.tok == '}':
>>               self.accept()
>>               return expr
>> @@ -267,8 +287,8 @@ def get_members(self):
>>               if self.tok != "'":
>>                   raise QAPIParseError(self, "expected string")
>>   
>> -    def get_values(self):
>> -        expr = []
>> +    def get_values(self) -> List[object]:
>> +        expr: List[object] = []
>>           if self.tok == ']':
>>               self.accept()
>>               return expr
>> @@ -284,8 +304,9 @@ def get_values(self):
>>                   raise QAPIParseError(self, "expected ',' or ']'")
>>               self.accept()
>>   
>> -    def get_expr(self, nested):
>> +    def get_expr(self, nested: bool = False) -> _ExprValue:
>>           # TODO: Teach mypy that nested=False means the retval is a Dict.
>> +        expr: _ExprValue
>>           if self.tok != '{' and not nested:
>>               raise QAPIParseError(self, "expected '{'")
>>           if self.tok == '{':
>> @@ -303,7 +324,7 @@ def get_expr(self, nested):
>>                   self, "expected '{', '[', string, or boolean")
>>           return expr
>>   
>> -    def get_doc(self, info):
>> +    def get_doc(self, info: QAPISourceInfo) -> List['QAPIDoc']:
>>           if self.val != '##':
>>               raise QAPIParseError(
>>                   self, "junk after '##' at start of documentation comment")

Thanks!

--js




* Re: [PATCH 16/22] qapi/parser: add docstrings
  2021-04-25 13:27   ` Markus Armbruster
@ 2021-04-26 18:26     ` John Snow
  2021-04-27  9:03       ` Markus Armbruster
  2021-05-07  1:34     ` John Snow
  1 sibling, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 18:26 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 9:27 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> ---
>>
>> My hubris is infinite.
> 
> Score one of the three principal virtues of a programmer ;)
> 

It was written before the prior review, but I promise I am slowing down 
on adding these. I just genuinely left them to help remind myself how 
these modules are actually structured and work so that I will be able to 
"pop in" quickly in the future and make a tactical, informed edit.

>> OK, I only added a few -- to help me remember how the parser works at a glance.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index dbbd0fcbc2f..8fc77808ace 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>>   
>>   
>>   class QAPISchemaParser:
>> +    """
>> +    Performs parsing of a QAPI schema source file.
> 
> Actually, this parses one of two layers, see qapi-code-gen.txt section
> "Schema syntax".  Pointing there might help.
> 

It sort of parses one-and-a-half layers, but yes ... I know the 
distinction you're drawing here. This is *mostly* the JSON/AST level.

(With some upper-level or mid-level parsing for Pragmas and Includes.)

>>   
>> +    :param fname: Path to the source file
> 
> Either "Source file name" or "Source pathname", please.  I prefer "file
> name" for additional distance to "path" in the sense of a search path,
> i.e. a list of directory names.
> 

OK, I am not sure I have any ... prejudice about when to use which kind 
of description for these sorts of things. I'm happy to defer to you, but 
if there's some kind of existing standard vocabulary I'm trampling all 
over, feel free to point me to your preferred hacker dictionary.

Anyway, happy to adopt your phrasing here.

>> +    :param previously_included:
>> +        The absolute paths of previously included source files.
> 
> Either "absolute file name" or "absolute pathname".
> 

OK.

>> +        Only used by recursive calls to avoid re-parsing files.
> 
> Feels like detail, not sure it's needed here.
> 

You're probably right, but I suppose I wanted to hint/suggest that it 
was not necessary to feed it this argument for the root schema, but it 
was crucial for the recursive calls.

(Earlier I mentioned possibly just passing the parent parser in: that 
helps eliminate some of this ambiguity, too.)

>> +    :param incl_info:
>> +       `QAPISourceInfo` for the parent document.
>> +       This may be None if this is the root schema document.
> 
> Recommend s/This may be //.
> 
> qapi-code-gen.txt calls a QAPI schema that uses include directives
> "modular", and the included files "sub-modules".  s/root schema
> document/root module/?
> 

Sure. All in favor of phrasing consistency.

(By the way: I did write up a draft for converting qapi-code-gen.txt to 
ReST format, and if I had finished that, it might be nice to hotlink to 
it here. I stopped for now because I wanted to solidify some conventions 
on how to markup certain constructs first, and wanted ... not to 
overwhelm you with more doc-wrangling.)

>> +
>> +    :ivar exprs: Resulting parsed expressions.
>> +    :ivar docs: Resulting parsed documentation blocks.
> 
> Uh, why are these here?  A doc string is interface documentation...
> 

These *are* interface. It is how callers are expected to get the results 
of parsing.

We could change that, of course, but that is absolutely how this class 
works today.

>> +
>> +    :raise OSError: For problems opening the root schema document.
>> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
>> +    :raise QAPISemError: For various semantic issues with the schema.
> 
> Should callers care for the difference between QAPIParseError and
> QAPISemError?
> 

That's up to the caller, I suppose. I just dutifully reported the truth 
of the matter here.

(That's a real non-answer, I know.)

I could always document QAPISourceError instead, with a note about the 
subclasses used for completeness.

(The intent is that QAPIError is always assumed/implied to be sufficient 
for capturing absolutely everything raised directly by this package, if 
you want to ignore the meanings behind them.)
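
To make that concrete, a sketch using stand-in classes. The hierarchy
below is an assumption pieced together from this discussion (QAPISemError
deriving from QAPISourceError in particular), and the real constructors
take source info plus a message rather than a single string. classify()
is a hypothetical helper:

```python
class QAPIError(Exception):
    """Stand-in for the package-wide base error."""


class QAPISourceError(QAPIError):
    """Stand-in: an error tied to a source location."""


class QAPIParseError(QAPISourceError):
    """Stand-in: a syntax-level problem."""


class QAPISemError(QAPISourceError):
    """Stand-in: a semantic problem."""


def classify(exc: Exception) -> str:
    # Most specific handlers first; the base class catches the rest.
    try:
        raise exc
    except QAPIParseError:
        return "syntax"
    except QAPISemError:
        return "semantic"
    except QAPIError:
        return "other qapi error"
```

A caller that does not care about the distinction can keep only the last
handler.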

>> +    """
>>       def __init__(self,
>>                    fname: str,
>>                    previously_included: Optional[Set[str]] = None,
>> @@ -77,6 +94,11 @@ def __init__(self,
>>           self._parse()
>>   
>>       def _parse(self) -> None:
>> +        """
>> +        Parse the QAPI schema document.
>> +
>> +        :return: None; results are stored in ``exprs`` and ``docs``.
> 
> Another ignorant doc string markup question...  how am I supposed to see
> that exprs and docs are attributes, and not global variables?
> 

I don't know, it's an unsolved mystery for me too. I need more time in 
the Sphinx dungeon to figure out how this stuff is supposed to work. 
You're right to wonder.

>> +        """
>>           cur_doc = None
>>   
>>           with open(self._fname, 'r', encoding='utf-8') as fp:
>> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
>>   
>>       def accept(self, skip_comment: bool = True) -> None:
>> +        """
>> +        Read the next lexeme and process it into a token.
>> +
>> +        :Object state:
>> +          :tok: represents the token type. See below for values.
>> +          :pos: is the position of the first character in the lexeme.
>> +          :cursor: is the position of the next character.
> 
> Define "position" :)  It's an index in self.src.
> 

Good call.

> self.cursor and self.pos are not used outside accept().  Not sure they
> belong in interface documentation.
> 

Fair point, though I was on a mission to document exactly how the parser 
works even at the internal level, because accept(), despite being 
"public", is really more of an internal function here.

I am somewhat partial to documenting these state variables for my own 
sake so that I can remember the way this lexer behaves.

>> +          :val: is the variable value of the token, if any.
> 
> Missing: self.info, which *is* used outside accept().
> 

Oh, yes.

>> +
>> +        Single-character tokens:
>> +
>> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
>> +        ``LSQB``, and ``RSQB``.
> 
> "These include ..." is misleading.  This is the complete list of
> single-character tokens.
> 

I'm just testing your ability to recognize the difference between proper 
and improper subsets.

(Joking. I'll reword to avoid that ambiguity.)

>> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
>> +        lexeme.  ``val`` is ``None``.
>> +
>> +        Multi-character tokens:
>> +
>> +        - ``COMMENT``:
>> +
>> +          - This token is not normally yielded by the lexer, but it
>> +            can be when ``skip_comment`` is False.
>> +          - ``tok`` is the value ``"#"``.
>> +          - ``val`` is a string including all chars until end-of-line.
>> +
>> +        - ``STRING``:
>> +
>> +          - ``tok`` is the ``"'"``, the single quote.
>> +          - ``value`` is the string, *excluding* the quotes.
>> +
>> +        - ``TRUE`` and ``FALSE``:
>> +
>> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
>> +          - ``val`` is either ``True`` or ``False`` accordingly.
>> +
>> +        - ``NEWLINE`` and ``SPACE``:
>> +
>> +          - These are consumed by the lexer directly. ``line_pos`` and
>> +            ``info`` are advanced when ``NEWLINE`` is encountered.
>> +            ``tok`` is set to ``None`` upon reaching EOF.
>> +
>> +        :param skip_comment:
>> +            When false, return ``COMMENT`` tokens.
>> +            This is used when reading documentation blocks.
> 
> The doc string mostly describes possible state on return of accept().
> *Within* accept(), self.tok may be any character.
> 
> "Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
> happens within accept().
> 

Almost kinda-sorta. The value of "tok" is important there, too.

> Perhaps phrasing it as a postcondition would be clearer:
> 
>      Read and store the next token.
> 
>      On return, self.tok is the token type, self.info describes its
>      source location, and self.val is the token's value.
> 
>      The possible token types and their values are
> 
>      ...
> 

OK, I will play with this suggestion while I try to clean up the docs.

>> +        """
>>           while True:
>>               self.tok = self.src[self.cursor]
>>               self.pos = self.cursor

Thanks for taking a look at this one.

--js




* Re: [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer
  2021-04-24  6:38   ` Markus Armbruster
  2021-04-26 17:39     ` John Snow
@ 2021-04-26 23:14     ` John Snow
  2021-04-27  6:07       ` Markus Armbruster
  1 sibling, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 23:14 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/24/21 2:38 AM, Markus Armbruster wrote:
> Mixing f-string and % interpolation.  I doubt we'd write it this way
> from scratch.  I recommend to either stick to % for now (leave
> conversion to f-strings for later), or convert the column formatting,
> too, even though it's not related to the patch's purpose.

True. Two thoughts:

1. I don't like using % formatting because it behaves differently from 
.format() and f-strings. My overwhelming desire is to never use it for 
this reason.

Example: {foo} will call foo's __format__ method, whereas "%s" % foo 
will simply add str(foo). They are not always the same, not even for 
built-in Python objects.


2. Cleaning up the formatting here without cleaning it up everywhere is 
a great way to get the patch NACKed. You have in the past been fairly 
reluctant to "While we're here" cleanups, so I am trying to cut back on 
them.
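
To illustrate point 1, here is a standalone sketch. The Version class is
made up purely to make the __format__/__str__ split visible; the tuple
case shows a difference that needs no custom class at all:

```python
class Version:
    """Hypothetical class whose __format__ and __str__ differ."""

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def __str__(self) -> str:
        return "Version(%d, %d)" % (self.major, self.minor)

    def __format__(self, spec: str) -> str:
        return "v%d.%d" % (self.major, self.minor)


ver = Version(6, 0)
percent_style = "%s" % ver        # goes through str(ver)
fstring_style = f"{ver}"          # goes through format(ver, "")
format_style = "{}".format(ver)   # likewise goes through __format__

pair = (1, 2)
fstring_pair = f"{pair}"          # formats the tuple as one value
try:
    percent_pair = "%s" % pair    # the tuple is unpacked as two arguments
except TypeError:
    percent_pair = None
```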


This is why my habit for f-strings keeps trickling in: whenever I have 
to rewrite any interpolation, I reach for the one that behaves most 
idiomatically for Python 3. I am trying to balance that against churn 
that's not in the stated goals of the patch.

In this case: I'll clean the rest of the method to match; and add a note 
to the commit message that explains why. I will get around to removing 
all of the f-strings, but I want to hit the clean linter baseline first 
to help guide the testing for such a series. I regret the awkward 
transitional period.

--js




* Re: [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard
  2021-04-25 12:32   ` Markus Armbruster
@ 2021-04-26 23:48     ` John Snow
  2021-04-27  7:15       ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 23:48 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 8:32 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> TypeGuards won't exist in Python proper until 3.10. Ah well. We can hack
>> up our own by declaring this function to return the type we claim it
>> checks for and using this to safely downcast object -> List[str].
>>
>> In so doing, I bring this function in-line under _pragma so it can use
>> the 'info' object in its closure. Having done this, _pragma also now
>> no longer needs to take a 'self' parameter, so drop it.
>>
>> Rename it to just _check(), to help us out with the line-length -- and
>> now that it's contained within _pragma, it is contextually easier to see
>> how it's used anyway -- especially with types.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> ---
>>
>> I left (name, value) as args to avoid creating a fully magic "macro",
>> though, I thought this was too weird:
>>
>>      info.pragma.foobar = _check()
>>
>> and it looked more reasonable as:
>>
>>      info.pragma.foobar = _check(name, value)
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 26 +++++++++++++-------------
>>   1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index 16fd36f8391..d02a134aae9 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -17,6 +17,7 @@
>>   from collections import OrderedDict
>>   import os
>>   import re
>> +from typing import List
>>   
>>   from .common import match_nofail
>>   from .error import QAPISemError, QAPISourceError
>> @@ -151,28 +152,27 @@ def _include(include, info, incl_fname, previously_included):
>>               ) from err
>>   
>>       @staticmethod
>> -    def _check_pragma_list_of_str(name, value, info):
>> -        if (not isinstance(value, list)
>> -                or any([not isinstance(elt, str) for elt in value])):
>> -            raise QAPISemError(
>> -                info,
>> -                "pragma %s must be a list of strings" % name)
>> +    def _pragma(name, value, info):
>> +
>> +        def _check(name, value) -> List[str]:
>> +            if (not isinstance(value, list) or
>> +                    any([not isinstance(elt, str) for elt in value])):
>> +                raise QAPISemError(
>> +                    info,
>> +                    "pragma %s must be a list of strings" % name)
>> +            return value
>>   
>> -    def _pragma(self, name, value, info):
>>           if name == 'doc-required':
>>               if not isinstance(value, bool):
>>                   raise QAPISemError(info,
>>                                      "pragma 'doc-required' must be boolean")
>>               info.pragma.doc_required = value
>>           elif name == 'command-name-exceptions':
>> -            self._check_pragma_list_of_str(name, value, info)
>> -            info.pragma.command_name_exceptions = value
>> +            info.pragma.command_name_exceptions = _check(name, value)
>>           elif name == 'command-returns-exceptions':
>> -            self._check_pragma_list_of_str(name, value, info)
>> -            info.pragma.command_returns_exceptions = value
>> +            info.pragma.command_returns_exceptions = _check(name, value)
>>           elif name == 'member-name-exceptions':
>> -            self._check_pragma_list_of_str(name, value, info)
>> -            info.pragma.member_name_exceptions = value
>> +            info.pragma.member_name_exceptions = _check(name, value)
>>           else:
>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
> 
> While I appreciate the terseness, I'm not sure I like the generic name
> _check() for checking one of two special cases, namely "list of string".
> The other case being "boolean".  We could acquire more cases later.
> 

Yeah, sorry, just trying to make the line fit ...

The important thing is that we need to make sure this routine returns 
some known type. It's just that the block down here has very long lines.
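
Reduced to a standalone sketch, the pattern looks like the following.
The name check_list_of_str is only a placeholder, and TypeError stands in
for QAPISemError so that the snippet runs on its own:

```python
from typing import List


def check_list_of_str(name: str, value: object) -> List[str]:
    """Validate `value`, then hand it back typed as List[str]."""
    if (not isinstance(value, list)
            or any(not isinstance(elt, str) for elt in value)):
        raise TypeError("pragma %s must be a list of strings" % name)
    # isinstance() narrowed `value` to a list; the element check above is
    # what justifies the declared return type.
    return value


exceptions = check_list_of_str('command-name-exceptions', ['foo', 'bar'])
```

The caller assigns the return value instead of the original object, which
is what carries the narrowed type out of the helper.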

Recommendations?




* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-25 12:34   ` Markus Armbruster
  2021-04-26 18:00     ` John Snow
@ 2021-04-26 23:55     ` John Snow
  2021-04-27  8:43       ` Markus Armbruster
  1 sibling, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-26 23:55 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 8:34 AM, Markus Armbruster wrote:
> value: object isn't wrong, but why not _ExprValue?
> 

Updated excuse:

because all the way back outside in _parse, we know that:

1. expr is a dict (because of get_expr(False))
2. expr['pragma'] is also a dict, because we explicitly check it there.
3. We iterate over the keys; all we know so far is that the values are 
... something.
4. _pragma()'s job is to validate the type(s) anyway.

More or less, the _ExprValue type union isn't remembered here -- even 
though it was once upon a time something returned by get_expr, it 
happened in a nested call that is now opaque to mypy in this context.

So, it's some combination of "That's all we know about it" and "It 
happens to be exactly sufficient for this function to operate."
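
A reduced sketch of that reasoning, with a hypothetical apply_pragmas()
standing in for the loop in _parse plus _pragma, and built-in exceptions
standing in for QAPISemError:

```python
from typing import Dict


def apply_pragmas(pragmas: Dict[str, object]) -> Dict[str, bool]:
    """Hypothetical, reduced version of the pragma handling."""
    result: Dict[str, bool] = {}
    for name, value in pragmas.items():
        # All that is known here is `value: object`. That is enough,
        # because each branch validates the type before using it.
        if name == 'doc-required':
            if not isinstance(value, bool):
                raise TypeError("pragma 'doc-required' must be boolean")
            result[name] = value
        else:
            raise ValueError("unknown pragma '%s'" % name)
    return result
```

Declaring the parameter as object forces every use to go through an
isinstance() check, which is the validation this code has to do anyway.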

--js




* Re: [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer
  2021-04-26 23:14     ` John Snow
@ 2021-04-27  6:07       ` Markus Armbruster
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  6:07 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/24/21 2:38 AM, Markus Armbruster wrote:
>> Mixing f-string and % interpolation.  I doubt we'd write it this way
>> from scratch.  I recommend to either stick to % for now (leave
>> conversion to f-strings for later), or convert the column formatting,
>> too, even though it's not related to the patch's purpose.
>
> True. Two thoughts:
>
> 1. I don't like using % formatting because it behaves differently from
> .format() and f-strings. My overwhelming desire is to never use it for 
> this reason.
>
> Example: {foo} will call foo's __format__ method, whereas "%s" % foo
> will simply add str(foo). They are not always the same, not even for 
> built-in Python objects.

I only care for readability, which profits from local consistency.
Maybe I'll sing a different tune once I actually get bitten by the
difference between interpolation and f-strings.

> 2. Cleaning up the formatting here without cleaning it up everywhere
> is a great way to get the patch NACKed. You have in the past been
> fairly reluctant to "While we're here" cleanups, so I am trying to cut
> back on them.

Yes, I've been pushing back on such cleanups.  But it's really a
case-by-case issue.  When a patch fits on a page, squashing in a bit of
loosely related cleanup is usually fine.  When it's long, keep it focused
on a single purpose.

> This is why my habit for f-strings keeps trickling in: whenever I have
> to rewrite any interpolation, I reach for the one that behaves most 
> idiomatically for Python 3. I am trying to balance that against churn
> that's not in the stated goals of the patch.
>
> In this case: I'll clean the rest of the method to match; and add a
> note to the commit message that explains why.

Okay.

>                                               I will get around to
> removing all of the f-strings,

The opposite, I presume.

>                                but I want to hit the clean linter
> baseline first to help guide the testing for such a series. I regret
> the awkward transitional period.

I'd leave converting interpolation to f-strings for later.

I can tolerate early, partial conversion, since I trust complete
conversion will happen, and as long as the resulting local inconsistency
isn't too grating.  Subjective, I know.




* Re: [PATCH 07/22] qapi/parser: assert object keys are strings
  2021-04-26 17:46     ` John Snow
@ 2021-04-27  6:13       ` Markus Armbruster
  2021-04-27 14:15         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  6:13 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 3:27 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> The single quote token implies the value is a string. Assert this to be
>>> the case.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/parser.py | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index 6b443b1247e..8d1fe0ddda5 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -246,6 +246,8 @@ def get_members(self):
>>>               raise QAPIParseError(self, "expected string or '}'")
>>>           while True:
>>>               key = self.val
>>> +            assert isinstance(key, str)  # Guaranteed by tok == "'"
>>> +
>>>               self.accept()
>>>               if self.tok != ':':
>>>                   raise QAPIParseError(self, "expected ':'")
>> 
>> The assertion is correct, but I wonder why mypy needs it.  Can you help?
>> 
>
> The lexer value can also be True/False (Maybe None? I forget) based on 

Yes, None for tokens like '{'.

> the Token returned. Here, since the token was the single quote, we know 
> that value must be a string.
>
> Mypy has no insight into the correlation between the Token itself and 
> the token value, because that relationship is not expressed via the type 
> system.

I understand that mypy can't prove implications like if self.tok == "'",
then self.val is a str.

What I'm curious about is why key needs to be known to be str here.
Hmm, is it so return expr type-checks once you add -> OrderedDict[str,
object] to the function?
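
A small sketch of the narrowing in question.  The function member_key
and its parameters are made up for illustration; only the
Optional[Union[bool, str]] type of the lexer value is taken from the
series.  Without the assertion, mypy would be expected to reject the
str return annotation, which is the same situation as using key in a
mapping declared with str keys.

```python
from typing import Optional, Union


def member_key(tok: Optional[str], val: Optional[Union[bool, str]]) -> str:
    """Hypothetical helper: return the key of an object member."""
    if tok != "'":
        raise ValueError("expected string or '}'")
    # mypy cannot connect tok == "'" to the type of val, so narrow by hand.
    assert isinstance(val, str)
    return val
```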



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 10/22] qapi/parser: Fix typing of token membership tests
  2021-04-26 17:51     ` John Snow
@ 2021-04-27  7:00       ` Markus Armbruster
  2021-05-04  1:01         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  7:00 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 3:59 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> When the token can be None, we can't use 'x in "abc"' style membership
>>> tests to group types of tokens together, because 'None in "abc"' is a
>>> TypeError.
>>>
>>> Easy enough to fix, if not a little ugly.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/parser.py | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index 7f3c009f64b..16fd36f8391 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -272,7 +272,7 @@ def get_values(self):
>>>           if self.tok == ']':
>>>               self.accept()
>>>               return expr
>>> -        if self.tok not in "{['tf":
>>> +        if self.tok is None or self.tok not in "{['tf":
>>>               raise QAPIParseError(
>>>                   self, "expected '{', '[', ']', string, or boolean")
>>>           while True:
>>> @@ -294,7 +294,8 @@ def get_expr(self, nested):
>>>           elif self.tok == '[':
>>>               self.accept()
>>>               expr = self.get_values()
>>> -        elif self.tok in "'tf":
>>> +        elif self.tok and self.tok in "'tf":
>>> +            assert isinstance(self.val, (str, bool))
>>>               expr = self.val
>>>               self.accept()
>>>           else:
>> 
>> How can self.tok be None?
>> 
>> I suspect this is an artifact of PATCH 04.  Before, self.tok is
>> initialized to the first token, then set to subsequent tokens (all str)
>> in turn.  After, it's initialized to None, then set to tokens in turn.
>> 
>
> Actually, it's set to None to represent EOF. See here:
>
>              elif self.tok == '\n':
>                  if self.cursor == len(self.src):
>                      self.tok = None
>                      return

Alright, then this is actually a bug fix:

    $ echo -n "{'key': " | python3 scripts/qapi-gen.py /dev/stdin
    Traceback (most recent call last):
      File "scripts/qapi-gen.py", line 19, in <module>
        sys.exit(main.main())
      File "/work/armbru/qemu/scripts/qapi/main.py", line 93, in main
        generate(args.schema,
      File "/work/armbru/qemu/scripts/qapi/main.py", line 50, in generate
        schema = QAPISchema(schema_file)
      File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
        parser = QAPISchemaParser(fname)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 59, in __init__
        self._parse()
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 81, in _parse
        expr = self.get_expr(False)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 293, in get_expr
        expr = self.get_members()
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 260, in get_members
        expr[key] = self.get_expr(True)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 297, in get_expr
        elif self.tok in "'tf":
    TypeError: 'in <string>' requires string as left operand, not NoneType

Likewise, the other hunk:

    $ echo -n "{'key': [" | python3 scripts/qapi-gen.py /dev/stdin
    Traceback (most recent call last):
      File "scripts/qapi-gen.py", line 19, in <module>
        sys.exit(main.main())
      File "/work/armbru/qemu/scripts/qapi/main.py", line 89, in main
        generate(args.schema,
      File "/work/armbru/qemu/scripts/qapi/main.py", line 51, in generate
        schema = QAPISchema(schema_file)
      File "/work/armbru/qemu/scripts/qapi/schema.py", line 860, in __init__
        parser = QAPISchemaParser(fname)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 71, in __init__
        expr = self.get_expr(False)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 270, in get_expr
        expr = self.get_members()
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 238, in get_members
        expr[key] = self.get_expr(True)
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 273, in get_expr
        expr = self.get_values()
      File "/work/armbru/qemu/scripts/qapi/parser.py", line 253, in get_values
        if self.tok not in "{['tf":
    TypeError: 'in <string>' requires string as left operand, not NoneType

Please add test cases.  I recommend adding them in a separate patch, so
this one's diff shows clearly what's being fixed.

There's a similar one in accept(), but it's safe:

            self.tok = self.src[self.cursor]
            ...
            elif self.tok in '{}:,[]':
                return

Regarding "if not a little ugly": instead of

    self.tok is None or self.tok not in "{['tf"

we could use

    self.tok not in tuple("{['tf")
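
A quick sketch of how the two forms behave, outside the parser.  The
variable names are illustrative only.  Note that the tuple form is not
a pure drop-in: membership in a str is a substring test, membership in
a tuple compares whole elements.

```python
tok = None  # how the lexer represents EOF

# Membership in a str requires a str on the left-hand side.
try:
    tok in "{['tf"
    str_test_raised = False
except TypeError:
    str_test_raised = True

# Membership in a tuple compares element by element and accepts None.
tuple_test = tok not in tuple("{['tf")

# Side effect worth knowing: str membership matches substrings
# (including the empty string), tuple membership does not.
substring_in_str = "tf" in "{['tf"
substring_in_tuple = "tf" in tuple("{['tf")
empty_in_str = "" in "{['tf"
```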

> A more pythonic idiom would be to create a lexer class that behaves as 
> an iterator, yielding Token class objects, and eventually, raising 
> StopIteration.
>
> (Not suggesting I do that now. I have thought about it though, yes.)

Yes, let's resist the temptation to improve things in too many
directions at once.

Aside: using exceptions for perfectly unexceptional things like loop
termination is in questionable taste, but we gotta go with the Python
flow.
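
For reference, the iterator idea could look roughly like this.  It is a
sketch only: Token and tokenize are hypothetical names, a generator
stands in for a full lexer class, and string escapes, comments and
source locations are left out.  The token kinds reuse the
single-character convention of the existing parser.

```python
from typing import Iterator, NamedTuple, Union


class Token(NamedTuple):
    """Hypothetical token: kind is one character, value is optional."""
    kind: str
    value: Union[str, bool, None] = None


def tokenize(src: str) -> Iterator[Token]:
    """Yield tokens; exhaustion (StopIteration) replaces tok = None."""
    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch in "{}:,[]":
            yield Token(ch)
            pos += 1
        elif ch == "'":
            # No escape handling; index() raises ValueError if unterminated.
            end = src.index("'", pos + 1)
            yield Token("'", src[pos + 1:end])
            pos = end + 1
        elif src.startswith("true", pos):
            yield Token("t", True)
            pos += 4
        elif src.startswith("false", pos):
            yield Token("f", False)
            pos += 5
        elif ch.isspace():
            pos += 1
        else:
            raise ValueError(f"stray {ch!r} at offset {pos}")


tokens = list(tokenize("{'key': [true, false]}"))
```

With this shape, the parser would never see a None token; running out
of input shows up as the iterator being exhausted.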



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard
  2021-04-26 23:48     ` John Snow
@ 2021-04-27  7:15       ` Markus Armbruster
  2021-05-05 19:09         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  7:15 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 8:32 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> TypeGuards won't exist in Python proper until 3.10. Ah well. We can hack
>>> up our own by declaring this function to return the type we claim it
>>> checks for and using this to safely downcast object -> List[str].
>>>
>>> In so doing, I bring this function in-line under _pragma so it can use
>>> the 'info' object in its closure. Having done this, _pragma also now
>>> no longer needs to take a 'self' parameter, so drop it.
>>>
>>> Rename it to just _check(), to help us out with the line-length -- and
>>> now that it's contained within _pragma, it is contextually easier to see
>>> how it's used anyway -- especially with types.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>
>>> ---
>>>
>>> I left (name, value) as args to avoid creating a fully magic "macro",
>>> though, I thought this was too weird:
>>>
>>>      info.pragma.foobar = _check()
>>>
>>> and it looked more reasonable as:
>>>
>>>      info.pragma.foobar = _check(name, value)
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/parser.py | 26 +++++++++++++-------------
>>>   1 file changed, 13 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index 16fd36f8391..d02a134aae9 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -17,6 +17,7 @@
>>>   from collections import OrderedDict
>>>   import os
>>>   import re
>>> +from typing import List
>>>   
>>>   from .common import match_nofail
>>>   from .error import QAPISemError, QAPISourceError
>>> @@ -151,28 +152,27 @@ def _include(include, info, incl_fname, previously_included):
>>>               ) from err
>>>   
>>>       @staticmethod
>>> -    def _check_pragma_list_of_str(name, value, info):
>>> -        if (not isinstance(value, list)
>>> -                or any([not isinstance(elt, str) for elt in value])):
>>> -            raise QAPISemError(
>>> -                info,
>>> -                "pragma %s must be a list of strings" % name)
>>> +    def _pragma(name, value, info):
>>> +
>>> +        def _check(name, value) -> List[str]:
>>> +            if (not isinstance(value, list) or
>>> +                    any([not isinstance(elt, str) for elt in value])):
>>> +                raise QAPISemError(
>>> +                    info,
>>> +                    "pragma %s must be a list of strings" % name)
>>> +            return value
>>>   
>>> -    def _pragma(self, name, value, info):
>>>           if name == 'doc-required':
>>>               if not isinstance(value, bool):
>>>                   raise QAPISemError(info,
>>>                                      "pragma 'doc-required' must be boolean")
>>>               info.pragma.doc_required = value
>>>           elif name == 'command-name-exceptions':
>>> -            self._check_pragma_list_of_str(name, value, info)
>>> -            info.pragma.command_name_exceptions = value
>>> +            info.pragma.command_name_exceptions = _check(name, value)
>>>           elif name == 'command-returns-exceptions':
>>> -            self._check_pragma_list_of_str(name, value, info)
>>> -            info.pragma.command_returns_exceptions = value
>>> +            info.pragma.command_returns_exceptions = _check(name, value)
>>>           elif name == 'member-name-exceptions':
>>> -            self._check_pragma_list_of_str(name, value, info)
>>> -            info.pragma.member_name_exceptions = value
>>> +            info.pragma.member_name_exceptions = _check(name, value)
>>>           else:
>>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
>> 
>> While I appreciate the terseness, I'm not sure I like the generic name
>> _check() for checking one of two special cases, namely "list of string".
>> The other case being "boolean".  We could acquire more cases later.
>> 
>
> Yeah, sorry, just trying to make the line fit ...

I understand!

> The important thing is that we need to make sure this routine returns 
> some known type. It's just that the block down here has very long lines.
>
> Recommendations?

Moving the helper into _pragma() lets us shorten its name.  Still
too long to fit the line:

            info.pragma.command_returns_exceptions = check_list_str(name, value)

We could break the line in the argument list:

            info.pragma.command_returns_exceptions = check_list_str(name,
                                                                    value)

or

            info.pragma.command_returns_exceptions = check_list_str(
                                                            name, value)

Not exactly pretty.

We could shorten the assignment's target:

            pragma.command_returns_exceptions = check_list_str(name, value)

with

        pragma = info.pragma

before the conditional.  I'm not too fond of creating aliases, but this
one looks decent to me.  What do you think?
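
Put together, the alias variant might look like the sketch below.  It
assumes the alias is meant to be bound as pragma = info.pragma.
SimpleNamespace is a stand-in for QAPISourceInfo, ValueError is a
stand-in for QAPISemError, and 'query-foo' is a made-up command name.

```python
from types import SimpleNamespace
from typing import List


def check_list_str(name: str, value: object) -> List[str]:
    """Validate value and return it typed as a list of strings."""
    if (not isinstance(value, list)
            or any(not isinstance(elt, str) for elt in value)):
        # The real code raises QAPISemError(info, ...); stand-in here.
        raise ValueError("pragma %s must be a list of strings" % name)
    return value


# Stand-in for QAPISourceInfo; only the .pragma attribute matters here.
info = SimpleNamespace(pragma=SimpleNamespace())
pragma = info.pragma  # alias that shortens the assignment target

pragma.command_returns_exceptions = check_list_str(
    'command-returns-exceptions', ['query-foo'])
```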



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-26 18:00     ` John Snow
@ 2021-04-27  8:21       ` Markus Armbruster
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  8:21 UTC (permalink / raw)
  To: John Snow
  Cc: Michael Roth, Cleber Rosa, Markus Armbruster, Eduardo Habkost,
	qemu-devel

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 8:34 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> Annotations do not change runtime behavior.
>>> This commit *only* adds annotations.
>>>
>>> (Annotations for QAPIDoc are in a later commit.)
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/parser.py | 61 ++++++++++++++++++++++++++++--------------
>>>   1 file changed, 41 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index d02a134aae9..f2b57d5642a 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -17,16 +17,29 @@
>>>   from collections import OrderedDict
>>>   import os
>>>   import re
>>> -from typing import List
>>> +from typing import (
>>> +    Dict,
>>> +    List,
>>> +    Optional,
>>> +    Set,
>>> +    Union,
>>> +)
>>>   
>>>   from .common import match_nofail
>>>   from .error import QAPISemError, QAPISourceError
>>>   from .source import QAPISourceInfo
>>>   
>>>   
>>> +#: Represents a parsed JSON object; semantically: one QAPI schema expression.
>>> +Expression = Dict[str, object]
>> 
>> I believe you use this for what qapi-code-gen.txt calls a top-level
>> expression.  TopLevelExpression is rather long, but it's used just once,
>> and once more in RFC PATCH 13.  What do you think?
>> 
>
> Yeah, I left a comment on gitlab about this -- Sorry for splitting the 
> stream, I didn't expect you to reply on-list without at least clicking 
> the link ;)

I should have; sorry about that.  I need to avoid distractions to stay
productive, and web browsers are basically gatling guns firing
armor-piercing distraction rounds at me.

> You're right, this is TOP LEVEL EXPR. I actually do mean to start using 
> it in expr.py as well too, in what will become (I think) pt5c: 
> non-immediately-necessary parser cleanups.
>
> I can use TopLevelExpression as the type name if you'd like, but if you 
> have a suggestion for something shorter I am open to suggestions if 
> "Expression" is way too overloaded/confusing.

TopLevelExpr?  Matches qapi-code-gen.txt's grammar:

    SCHEMA = TOP-LEVEL-EXPR...

    TOP-LEVEL-EXPR = DIRECTIVE | DEFINITION

    DIRECTIVE = INCLUDE | PRAGMA
    DEFINITION = ENUM | STRUCT | UNION | ALTERNATE | COMMAND | EVENT

>>> +
>>> +# Return value alias for get_expr().
>>> +_ExprValue = Union[List[object], Dict[str, object], str, bool]
>> 
>> This is essentially a node in our pidgin-JSON parser's abstract syntax
>> tree.  Tree roots use the Dict branch of this Union.
>> 
>> See also my review of PATCH 06.
>> 
>
> OK, I skimmed that one for now but I'll get back to it.
>
>>> +
>>> +
>>>   class QAPIParseError(QAPISourceError):
>>>       """Error class for all QAPI schema parsing errors."""
>>> -    def __init__(self, parser, msg):
>>> +    def __init__(self, parser: 'QAPISchemaParser', msg: str):
>> 
>> Forward reference needs quotes.  Can't be helped.
>> 
>>>           col = 1
>>>           for ch in parser.src[parser.line_pos:parser.pos]:
>>>               if ch == '\t':
>>> @@ -38,7 +51,10 @@ def __init__(self, parser, msg):
>>>   
>>>   class QAPISchemaParser:
>>>   
>>> -    def __init__(self, fname, previously_included=None, incl_info=None):
>>> +    def __init__(self,
>>> +                 fname: str,
>>> +                 previously_included: Optional[Set[str]] = None,
>> 
>> This needs to be Optional[] because using the empty set as default
>> parameter value would be a dangerous trap.  Python's choice to evaluate
>> the default parameter value just once has always been iffy.  Stirring
>> static typing into the language makes it iffier.  Can't be helped.
>> 
>
> We could force it to accept a tuple and convert it into a set 
> internally. It's just that we seem to use it for sets now.

Another candidate: frozenset.
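
A sketch of the trap and of both ways around it.  The three function
names are hypothetical; the "or set()" idiom mirrors what the patch
does with previously_included.

```python
from typing import FrozenSet, Optional, Set


def trap(fname: str, included: Set[str] = set()) -> Set[str]:
    # The default set is created once, at definition time, then shared
    # by every call that omits the argument.
    included.add(fname)
    return included


def safe(fname: str, included: Optional[Set[str]] = None) -> Set[str]:
    # None as default, fresh set per call.
    included = included or set()
    included.add(fname)
    return included


def frozen(fname: str, included: FrozenSet[str] = frozenset()) -> Set[str]:
    # An immutable default cannot be polluted; build a new set from it.
    return set(included) | {fname}


trap("a.json")
leaked = trap("b.json")       # still remembers a.json
safe("a.json")
clean = safe("b.json")
frozen("a.json")
also_clean = frozen("b.json")
```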

> Or ... in pt5c I float the idea of just passing the parent parser in, 
> and I reach up and grab the previously-included stuff directly.
>
>>> +                 incl_info: Optional[QAPISourceInfo] = None):
>>>           self._fname = fname
>>>           self._included = previously_included or set()
>>>           self._included.add(os.path.abspath(self._fname))
>>> @@ -46,20 +62,20 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>>   
>>>           # Lexer state (see `accept` for details):
>>>           self.info = QAPISourceInfo(self._fname, incl_info)
>>> -        self.tok = None
>>> +        self.tok: Optional[str] = None
>> 
>> Would
>> 
>>             self.tok: str
>> 
>> work?
>> 
>
> Not without modifications, because the Token being None is used to 
> represent EOF.

True.  I missed that, and thought we'd need None just as an initial
value here.

>>>           self.pos = 0
>>>           self.cursor = 0
>>> -        self.val = None
>>> +        self.val: Optional[Union[bool, str]] = None
>>>           self.line_pos = 0

[...]



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-26 23:55     ` John Snow
@ 2021-04-27  8:43       ` Markus Armbruster
  2021-05-06  1:49         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  8:43 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 8:34 AM, Markus Armbruster wrote:
>> value: object isn't wrong, but why not _ExprValue?
>> 
>
> Updated excuse:
>
> because all the way back outside in _parse, we know that:
>
> 1. expr is a dict (because of get_expr(False))
> 2. expr['pragma'] is also a dict, because we explicitly check it there.

Yes:

                pragma = expr['pragma']
-->             if not isinstance(pragma, dict):
-->                 raise QAPISemError(
-->                     info, "value of 'pragma' must be an object")
                for name, value in pragma.items():
                    self._pragma(name, value, info)

> 3. We iterate over the keys; all we know so far is that the values are
> ... something.

Actually, *we* know more about the values.  get_expr() returns a tree
whose inner nodes are dict or list, and whose leaves are str or bool.
Therefore, the values are dict, list, str, or bool.

It's *mypy* that doesn't know, because it lacks recursive types.

I know that you're probably using "we" in the sense of "the toolchain".
I'm annoying you with the difference between "the toolchain" and "we"
(you, me, and other humans) because I'm concerned about us humans
dumbing ourselves down to mypy's level of understanding.

To be honest, I'm less and less sure typing these trees without the
necessary typing tools is worth the bother.  The notational overhead is
more oppressive than elsewhere, and yet the typing remains weak.  The
result fails to satisfy, and that's a constant source of discussions
(between us as well as just in my head) on how to best mitigate.

> 4. _pragma()'s job is to validate the type(s) anyway.

_pragma() can safely assume @value is dict, list, str, or bool.  It just
happens not to rely on this assumption.

> More or less, the _ExprValue type union isn't remembered here -- even
> though it was once upon a time something returned by get_expr, it 
> happened in a nested call that is now opaque to mypy in this context.

Understand.

> So, it's some combination of "That's all we know about it" and "It
> happens to be exactly sufficient for this function to operate."

I can accept "it's all mypy can figure out by itself, and it's good
enough to get the static checking we want".
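
To illustrate what the weak typing costs in practice: without recursive
types, nested values are just object, and every consumer has to narrow
with isinstance() again.  A sketch, with count_leaves being a made-up
function and the alias copied from the patch:

```python
from typing import Dict, List, Union

# Alias from the patch: nested values are only known as object.
_ExprValue = Union[List[object], Dict[str, object], str, bool]


def count_leaves(node: object) -> int:
    """Count str/bool leaves, re-checking the type at every level."""
    if isinstance(node, (str, bool)):
        return 1
    if isinstance(node, list):
        return sum(count_leaves(elt) for elt in node)
    if isinstance(node, dict):
        return sum(count_leaves(val) for val in node.values())
    # Humans know get_expr() never produces this; the type checker doesn't.
    raise TypeError("not a parse tree node: %r" % (node,))


tree: _ExprValue = {'pragma': {'doc-required': True,
                               'command-name-exceptions': ['a', 'b']}}
leaves = count_leaves(tree)
```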



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 16/22] qapi/parser: add docstrings
  2021-04-26 18:26     ` John Snow
@ 2021-04-27  9:03       ` Markus Armbruster
  2021-05-06  2:08         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  9:03 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 9:27 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>
>>> ---
>>>
>>> My hubris is infinite.
>> 
>> Score one of the three principal virtues of a programmer ;)
>> 
>
> It was written before the prior review, but I promise I am slowing down 
> on adding these. I just genuinely left them to help remind myself how 
> these modules are actually structured and work so that I will be able to 
> "pop in" quickly in the future and make a tactical, informed edit.
>
>>> OK, I only added a few -- to help me remember how the parser works at a glance.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 66 insertions(+)
>>>
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index dbbd0fcbc2f..8fc77808ace 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>>>   
>>>   
>>>   class QAPISchemaParser:
>>> +    """
>>> +    Performs parsing of a QAPI schema source file.
>> 
>> Actually, this parses one of two layers, see qapi-code-gen.txt section
>> "Schema syntax".  Pointing there might help.
>> 
>
> It sort of parses one-and-a-half layers, but yes ... I know the 
> distinction you're drawing here. This is *mostly* the JSON/AST level.
>
> (With some upper-level or mid-level parsing for Pragmas and Includes.)

True.  I chose simplicity over purity.

>>>   
>>> +    :param fname: Path to the source file
>> 
>> Either "Source file name" or "Source pathname", please.  I prefer "file
>> name" for additional distance to "path" in the sense of a search path,
>> i.e. a list of directory names.
>> 
>
> OK, I am not sure I have any ... prejudice about when to use which kind 
> of description for these sorts of things. I'm happy to defer to you, but 
> if there's some kind of existing standard vocabulary I'm trampling all 
> over, feel free to point me to your preferred hacker dictionary.
>
> Anyway, happy to adopt your phrasing here.
>
>>> +    :param previously_included:
>>> +        The absolute paths of previously included source files.
>> 
>> Either "absolute file name" or "absolute pathname".
>> 
>
> OK.
>
>>> +        Only used by recursive calls to avoid re-parsing files.
>> 
>> Feels like detail, not sure it's needed here.
>> 
>
> You're probably right, but I suppose I wanted to hint/suggest that it 
> was not necessary to feed it this argument for the root schema, but it 
> was crucial for the recursive calls.

To me "if root schema, then nothing was previously included" feels
obvious enough :)  But if you want to spell out proper use of the
parameter, I recommend to stick to the interface, i.e. when to pass it,
not what the function does with it (in the hope that the reader can
then guess when to pass it).

> (Earlier I mentioned possibly just passing the parent parser in: that 
> helps eliminate some of this ambiguity, too.)
>
>>> +    :param incl_info:
>>> +       `QAPISourceInfo` for the parent document.
>>> +       This may be None if this is the root schema document.
>> 
>> Recommend s/This may be //.
>> 
>> qapi-code-gen.txt calls a QAPI schema that uses include directives
>> "modular", and the included files "sub-modules".  s/root schema
>> document/root module/?
>> 
>
> Sure. All in favor of phrasing consistency.
>
> (By the way: I did write up a draft for converting qapi-code-gen.txt to 
> ReST format, and if I had finished that, it might be nice to hotlink to 
> it here. I stopped for now because I wanted to solidify some conventions 
> on how to markup certain constructs first, and wanted ... not to 
> overwhelm you with more doc-wrangling.)

Appreciated :)

>>> +
>>> +    :ivar exprs: Resulting parsed expressions.
>>> +    :ivar docs: Resulting parsed documentation blocks.
>> 
>> Uh, why are these here?  A doc string is interface documentation...
>> 
>
> These *are* interface. It is how callers are expected to get the results 
> of parsing.

You're right, but is the constructor the right place to document
attributes?

> We could change that, of course, but that is absolutely how this class 
> works today.
>
>>> +
>>> +    :raise OSError: For problems opening the root schema document.
>>> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
>>> +    :raise QAPISemError: For various semantic issues with the schema.
>> 
>> Should callers care for the difference between QAPIParseError and
>> QAPISemError?
>> 
>
> That's up to the caller, I suppose. I just dutifully reported the truth 
> of the matter here.
>
> (That's a real non-answer, I know.)
>
> I could always document QAPISourceError instead, with a note about the 
> subclasses used for completeness.
>
> (The intent is that QAPIError is always assumed/implied to be sufficient 
> for capturing absolutely everything raised directly by this package, if 
> you want to ignore the meanings behind them.)

I honestly can't think of a reason for catching anything but QAPIError.
The other classes exist only to give us more convenient ways to
construct instances of QAPIError.  We could replace them all by
functions returning QAPIError.
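
From a caller's point of view, that amounts to something like the
sketch below.  The classes are local stand-ins that only copy the names
and the inheritance from scripts/qapi/error.py and parser.py; the real
constructors take source information, and load() is hypothetical.

```python
class QAPIError(Exception):
    """Stand-in base class for everything the package raises."""


class QAPISourceError(QAPIError):
    """Stand-in: an error tied to a source location."""


class QAPIParseError(QAPISourceError):
    """Stand-in: syntax problem."""


class QAPISemError(QAPISourceError):
    """Stand-in: semantic problem."""


def load(kind: str) -> None:
    # Hypothetical function that fails in one of two ways.
    if kind == "syntax":
        raise QAPIParseError("expected ':'")
    raise QAPISemError("unknown pragma 'foo'")


caught = []
for kind in ("syntax", "semantic"):
    try:
        load(kind)
    except QAPIError as err:  # one handler covers both subclasses
        caught.append(type(err).__name__)
```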

>>> +    """
>>>       def __init__(self,
>>>                    fname: str,
>>>                    previously_included: Optional[Set[str]] = None,
>>> @@ -77,6 +94,11 @@ def __init__(self,
>>>           self._parse()
>>>   
>>>       def _parse(self) -> None:
>>> +        """
>>> +        Parse the QAPI schema document.
>>> +
>>> +        :return: None; results are stored in ``exprs`` and ``docs``.
>> 
>> Another ignorant doc string markup question...  how am I supposed to see
>> that exprs and docs are attributes, and not global variables?
>> 
>
> I don't know, it's an unsolved mystery for me too. I need more time in 
> the Sphinx dungeon to figure out how this stuff is supposed to work. 
> You're right to wonder.

Use self.exprs and self.docs meanwhile?

>>> +        """
>>>           cur_doc = None
>>>   
>>>           with open(self._fname, 'r', encoding='utf-8') as fp:
>>> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
>>>   
>>>       def accept(self, skip_comment: bool = True) -> None:
>>> +        """
>>> +        Read the next lexeme and process it into a token.
>>> +
>>> +        :Object state:
>>> +          :tok: represents the token type. See below for values.
>>> +          :pos: is the position of the first character in the lexeme.
>>> +          :cursor: is the position of the next character.
>> 
>> Define "position" :)  It's an index in self.src.
>> 
>
> Good call.
>
>> self.cursor and self.pos are not used outside accept().  Not sure they
>> belong into interface documentation.
>> 
>
> Fair point, though I was on a mission to document exactly how the parser 
> works even at the internal level, because accept(), despite being 
> "public", is really more of an internal function here.
>
> I am somewhat partial to documenting these state variables for my own 
> sake so that I can remember the way this lexer behaves.

I understand why you want to document how they work.  Since they're
internal to accept(), a comment in accept() seems more proper than
accept() doc string.  Admittedly doesn't matter that much, as accept()
is internal to the class.

>>> +          :val: is the variable value of the token, if any.
>> 
>> Missing: self.info, which *is* used outside accept().
>> 
>
> Oh, yes.
>
>>> +
>>> +        Single-character tokens:
>>> +
>>> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
>>> +        ``LSQB``, and ``RSQB``.
>> 
>> "These include ..." is misleading.  This is the complete list of
>> single-character tokens.
>> 
>
> I'm just testing your ability to recognize the difference between proper 
> and improper subsets.
>
> (Joking. I'll reword to avoid that ambiguity.)
>
>>> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
>>> +        lexeme.  ``val`` is ``None``.
>>> +
>>> +        Multi-character tokens:
>>> +
>>> +        - ``COMMENT``:
>>> +
>>> +          - This token is not normally yielded by the lexer, but it
>>> +            can be when ``skip_comment`` is False.
>>> +          - ``tok`` is the value ``"#"``.
>>> +          - ``val`` is a string including all chars until end-of-line.
>>> +
>>> +        - ``STRING``:
>>> +
>>> +          - ``tok`` is the ``"'"``, the single quote.
>>> +          - ``val`` is the string, *excluding* the quotes.
>>> +
>>> +        - ``TRUE`` and ``FALSE``:
>>> +
>>> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
>>> +          - ``val`` is either ``True`` or ``False`` accordingly.
>>> +
>>> +        - ``NEWLINE`` and ``SPACE``:
>>> +
>>> +          - These are consumed by the lexer directly. ``line_pos`` and
>>> +            ``info`` are advanced when ``NEWLINE`` is encountered.
>>> +            ``tok`` is set to ``None`` upon reaching EOF.
>>> +
>>> +        :param skip_comment:
>>> +            When false, return ``COMMENT`` tokens.
>>> +            This is used when reading documentation blocks.
>> 
>> The doc string mostly describes possible state on return of accept().
>> *Within* accept(), self.tok may be any character.
>> 
>> "Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
>> happens within accept().
>> 
>
> Almost kinda-sorta. The value of "tok" is important there, too.

--verbose?

>> Perhaps phrasing it as a postcondition would be clearer:
>> 
>>      Read and store the next token.
>> 
>>      On return, self.tok is the token type, self.info describes its
>>      source location, and self.val is the token's value.
>> 
>>      The possible token types and their values are
>> 
>>      ...
>> 
>
> OK, I will play with this suggestion while I try to clean up the docs.
>
>>> +        """
>>>           while True:
>>>               self.tok = self.src[self.cursor]
>>>               self.pos = self.cursor
>
> Thanks for taking a look at this one.

Thank *you* for documenting my[*] code!


[*] Some of it mine in the sense I wrote it, some of it mine in the
sense I maintain it.



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/22] qapi/source: [RFC] add "with_column" contextmanager
  2021-04-22  3:07 ` [PATCH 02/22] qapi/source: [RFC] add "with_column" contextmanager John Snow
@ 2021-04-27  9:33   ` Markus Armbruster
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27  9:33 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> This is a silly one, but... it's important to have fun.
>
> This patch isn't *needed*, it's here as an RFC. In trying to experiment
> with different ways to solve the problem addressed by the previous
> commit, I kept getting confused at how the "source location" string with
> line and column number was built across two different classes.
>
> (i.e. QAPISourceError appends the column, but QAPISourceInfo does not
> track column information natively.)
>
> I was afraid to try and fully implement column number directly in
> QAPISourceInfo on the chance that it might have undesirable effects, so
> I came up with a quick "hack" to centralize the 'location' information
> generation.
>
> It's a little goofy, but it works :')
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/error.py  |  8 +++-----
>  scripts/qapi/source.py | 23 ++++++++++++++++++++++-
>  2 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/scripts/qapi/error.py b/scripts/qapi/error.py
> index e35e4ddb26a..6b04f56f8a2 100644
> --- a/scripts/qapi/error.py
> +++ b/scripts/qapi/error.py
> @@ -39,11 +39,9 @@ def __init__(self,
>  
>      def __str__(self) -> str:
>          assert self.info is not None
> -        loc = str(self.info)
> -        if self.col is not None:
> -            assert self.info.line is not None
> -            loc += ':%s' % self.col
> -        return loc + ': ' + self.msg
> +        with self.info.at_column(self.col):
> +            loc = str(self.info)
> +        return f"{loc}: {self.msg}"
>  
>  
>  class QAPISemError(QAPISourceError):
> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
> index 1ade864d7b9..21090b9fe78 100644
> --- a/scripts/qapi/source.py
> +++ b/scripts/qapi/source.py
> @@ -9,8 +9,14 @@
>  # This work is licensed under the terms of the GNU GPL, version 2.
>  # See the COPYING file in the top-level directory.
>  
> +from contextlib import contextmanager
>  import copy
> -from typing import List, Optional, TypeVar
> +from typing import (
> +    Iterator,
> +    List,
> +    Optional,
> +    TypeVar,
> +)
>  
>  
>  class QAPISchemaPragma:
> @@ -35,6 +41,7 @@ def __init__(self, fname: str, line: int,
>                   parent: Optional['QAPISourceInfo']):
>          self.fname = fname
>          self.line = line
> +        self._column: Optional[int] = None
>          self.parent = parent
>          self.pragma: QAPISchemaPragma = (
>              parent.pragma if parent else QAPISchemaPragma()
> @@ -52,9 +59,14 @@ def next_line(self: T) -> T:
>          return info
>  
>      def loc(self) -> str:
> +        # column cannot be provided meaningfully when line is absent.
> +        assert self.line or self._column is None
> +
>          ret = self.fname
>          if self.line is not None:
>              ret += ':%d' % self.line
> +        if self._column is not None:
> +            ret += ':%d' % self._column
>          return ret
>  
>      def in_defn(self) -> str:
> @@ -71,5 +83,14 @@ def include_path(self) -> str:
>              parent = parent.parent
>          return ret
>  
> +    @contextmanager
> +    def at_column(self, column: Optional[int]) -> Iterator[None]:
> +        current_column = self._column
> +        try:
> +            self._column = column
> +            yield
> +        finally:
> +            self._column = current_column
> +
>      def __str__(self) -> str:
>          return self.include_path() + self.in_defn() + self.loc()

Uh...

We create one QAPISourceInfo instance per source line.  First line in
QAPISchemaParser.__init__()

        self.info = QAPISourceInfo(fname, 1, incl_info)

and subsequent ones in .accept()

                self.info = self.info.next_line()

These instances get shared by everything on their line.

Your patch adds a _column attribute to these objects.  Because it
applies to everything that shares the object, setting it is kind of
wrong.  You therefore only ever set it temporarily, in
QAPISourceError.__str__().

This works in the absence of concurrency (which means it just works,
this being Python), but *ugh*!

The obvious extension of QAPISourceInfo to columns would create one
instance per source character.  Too egregiously wasteful for my taste.

Let's start over with the *why*: what is it exactly that bothers you in
the code before this patch?  Is it the spatial distance between the
formatting of "file:line:col: msg" in QAPISourceError.__str__() and the
formatting of the file:line part in QAPISourceInfo.loc()?
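
For illustration only: if the spatial distance is the issue, one way to
keep the formatting in a single place without mutating the shared
per-line object is to pass the column in as an argument.  This is a
minimal sketch, not the real QAPISourceInfo; the SourceInfo class and
the column parameter of loc() are hypothetical names.

```python
from typing import Optional


class SourceInfo:
    """Hypothetical stand-in for QAPISourceInfo (one per source line)."""

    def __init__(self, fname: str, line: Optional[int] = None):
        self.fname = fname
        self.line = line

    def loc(self, column: Optional[int] = None) -> str:
        # A column is only meaningful when a line number is known.
        assert self.line is not None or column is None
        ret = self.fname
        if self.line is not None:
            ret += ':%d' % self.line
        if column is not None:
            # The column is an argument, so the shared object is untouched.
            ret += ':%d' % column
        return ret


info = SourceInfo('schema.json', 12)
print(info.loc())           # schema.json:12
print(info.loc(column=5))   # schema.json:12:5
```

Nothing is stored on the instance, so everything that shares the object
keeps seeing the same state before, during and after formatting.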




* Re: [PATCH 05/22] qapi/parser: Assert lexer value is a string
  2021-04-26 17:43     ` John Snow
@ 2021-04-27 12:30       ` Markus Armbruster
  2021-04-27 13:58         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27 12:30 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/24/21 4:33 AM, Markus Armbruster wrote:
>> The second operand of assert provides no additional information.  Please
>> drop it.
>
> I don't agree with "no additional information", strictly.
>
> I left you a comment on gitlab before you started reviewing on-list. 
> What I wrote there:
>
> "Markus: I know you're not a fan of these, but I wanted a suggestion on 
> how to explain why this must be true in case it wasn't obvious to 
> someone else in the future."

But the second operand doesn't explain anything.  Look:

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index f519518075e..c75434e75a5 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -303,6 +303,7 @@ def get_doc(self, info):
         cur_doc = QAPIDoc(self, info)
         self.accept(False)
         while self.tok == '#':
+            assert isinstance(self.val, str), "Expected str value"
             if self.val.startswith('##'):
                 # End of doc comment
                 if self.val != '##':

The second operand paraphrases the first one in prose rather than code.
An actual *explanation* would instead tell me why the first operand must
be true.

To do that, I'd point to self.accept()'s postcondition.  Which
(informally) is

    self.tok in ('#', '{', ... )
    and self.tok == '#' implies self.val is a str
    and self.tok == '{' implies self.val is None
    ...

I believe this is required working knowledge for understanding the
parser.  Your PATCH 16 puts it in a doc string, so readers don't have to
extract it from code.  Makes sense.

It's not going to fit into a workable second operand here, I'm afraid.

I assume you need this assertion for mypy.  If yes, let's get the job
done with minimal fuss.  If no, please drop the assertion entirely.
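
To make the mypy angle concrete, here is a minimal sketch of the
narrowing in question.  The Lexer class, accept_comment() and
is_doc_end() are hypothetical and much smaller than the real parser;
only the typing of tok and val is meant to match.

```python
from typing import Optional, Union


class Lexer:
    """Hypothetical miniature lexer; only the tok/val typing matters."""

    def __init__(self) -> None:
        self.tok: Optional[str] = None
        self.val: Union[str, bool, None] = None

    def accept_comment(self, text: str) -> None:
        # Postcondition: tok == '#' implies val is a str.
        self.tok = '#'
        self.val = text


def is_doc_end(lexer: Lexer) -> bool:
    assert lexer.tok == '#'
    # mypy cannot derive the type of val from the value of tok, so the
    # bare assertion narrows Union[str, bool, None] down to str.
    assert isinstance(lexer.val, str)
    return lexer.val.startswith('##')
```

Without the isinstance() assertion, mypy would be expected to reject
the .startswith() call because bool and None have no such attribute.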




* Re: [PATCH 01/22] qapi/parser: Don't try to handle file errors
  2021-04-23 19:20     ` John Snow
@ 2021-04-27 13:47       ` Markus Armbruster
  2021-04-27 17:58         ` John Snow
  0 siblings, 1 reply; 67+ messages in thread
From: Markus Armbruster @ 2021-04-27 13:47 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/23/21 11:46 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> The short-ish version of what motivates this patch is:
>>>
>>> - The parser initializer does not possess adequate context to write a
>>>    good error message -- It tries to determine the caller's semantic
>>>    context.
>> 
>> I'm not sure I get what you're trying to say here.
>> 
>
> I mean: this __init__ method does not *know* who is calling it or why. 
> Of course, *we* do, because the code base is finite and nobody else but 
> us is calling into it.
>
> I mean to point out that the initializer has to do extra work (Just a 
> little) to determine what the calling context is and raise an error 
> accordingly.
>
> Example: If we have a parent info context, we raise an error in the 
> context of the caller. If we don't, we have to create a new presumed 
> context (using the weird None SourceInfo object).

I guess you mean

            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),

I can't see other instances of messing with context.

> So I just mean to say:
>
> "Let the caller, who unambiguously always has the exactly correct 
> context worry about what the error message ought to be."
>
>>> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
>>> - Errors made using such an object are currently incorrect.
>>> - It's not technically a semantic error if we cannot open the schema
>>> - There are various typing constraints that make mixing these two cases
>>>    undesirable for a single special case.
>> 
>> These I understand.
>> 
>>> - The current open block in parser's initializer will leak file
>>>    pointers, because it isn't using a with statement.
>> 
>> Uh, isn't the value returned by open() reference-counted?  @fp is the
>> only reference...
>> 
>
> Yeah, eventually. O:-)
>
> Whenever the GC runs. OK, it's not really an apocalypse error, but it 
> felt strange to rewrite a try/except and then write it using bad hygiene 
> on purpose in the name of a more isolated commit.

I agree use of with is an improvement (it's idiomatic).  We shouldn't
call it a leak fix, though.

>>> Here's the details in why this got written the way it did, and why a few
>>> disparate issues are rolled into one commit. (They're hard to fix
>>> separately without writing really weird stuff that'd be harder to
>>> review.)
>>>
>>> The error message string here is incorrect:
>>>
>>>> python3 qapi-gen.py 'fake.json'
>>> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>> 
>> Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
>> generation" (v5.2.0).
>> 
>
> Mea Culpa. Didn't realize it wasn't tested, and I didn't realize at the 
> time that the two kinds of errors here were treated differently.

Our tests cover the schema language, not qapi-gen's CLI language.  The
gap feels tolerable.

>> Before commit c615550df3 "qapi: Improve source file read error handling"
>> (v4.2.0), it was differently bad (uncaught exception).
>> 
>> Commit c615550df3 explains why the funny QAPISourceInfo exists:
>> 
>>      Reporting open or read failure for the main schema file needs a
>>      QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
>>      with fname=None.
>> 
>
> I am apparently not the first or the last person to dream of wanting a 
> QAPISourceInfo that represents "Actually, there's no source location!"
>
>> The commit turned QAPISourceInfo into the equivalent of a disjoint union
>> of
>> 
>> 1. A position in a source file (.fname is a str)
>> 
>> 2. "Not in any source file" (.fname is None)
>> 
>> This is somewhat similar to struct Location in C, which has
>> 
>> 1. LOC_FILE: a position in a source file
>> 
>> 2. LOC_CMDLINE: a range of command line arguments
>> 
>> 3. LOC_NONE: no location information
>> 
>> Abstracting locations this way lets error_report() do the right thing
>> whether it's complaining about the command line, a monitor command, or a
>> configuration file read with -readconfig.
>> 
>> Your patch demonstrates that qapi-gen has much less need for abstracting
>> sources: we use 2. "Not in any source file" only for reading the main
>> schema file.
>> 
>
> Yes. I got the impression that you didn't want to pursue more abstract 
> QSI constructs based on earlier work, so going the other way and 
> *removing* them seemed like the faster way to achieve a clean type 
> system here.
>
>>> In pursuing it, we find that QAPISourceInfo has a special accommodation
>>> for when there's no filename.
>> 
>> Yes:
>> 
>>      def loc(self) -> str:
>> -->     if self.fname is None:
>> -->         return sys.argv[0]
>>          ret = self.fname
>>          if self.line is not None:
>>              ret += ':%d' % self.line
>>          return ret
>> 
>>>                                Meanwhile, we intend to type info.fname as
>>> str; something we always have.
>> 
>> Do you mean "as non-optional str"?
>> 
>
> Yeah. I typed it originally as `str`, but the analyzer missed that we 
> check the field to see if it's None, which is misleading.
>
>>> To remove this, we need to not have a "fake" QAPISourceInfo object. We
>> 
>> We may well want to, but I doubt we *need* to.  There are almost
>> certainly other ways to fix the bug.  I don't see a need to explore
>> them, though.
>> 
>
> Either we build out the fake QSI into a proper subtype, or we remove it 
> -- those are the two obvious options. Building it out is almost 
> certainly more work than this patch.
>
>>> also don't want to explicitly begin accommodating QAPISourceInfo being
>>> None, because we actually want to eventually prove that this can never
>>> happen -- We don't want to confuse "The file isn't open yet" with "This
>>> error stems from a definition that wasn't defined in any file".
>> 
>> Yes, encoding both "poisoned source info not to be used with actual
>> errors" and "'fake' source info not pointing to a source file" as None
>> would be a mistake.
>> 
>
> :)
>
>>> (An earlier series tried to create an official dummy object, but it was
>>> tough to prove in review that it worked correctly without creating new
>>> regressions. This patch avoids trying to re-litigate that discussion.
>>>
>>> We would like to first prove that we never raise QAPISemError for any
>>> built-in object before we relent and add "special" info objects. We
>>> aren't ready to do that yet, so crashing is preferred.)
>>>
>>> So, how to solve this mess?
>>>
>>> Here's one way: Don't try to handle errors at a level with "mixed"
>>> semantic levels; i.e. don't try to handle inclusion errors (should
>>> report a source line where the include was triggered) with command line
>>> errors (where we specified a file we couldn't read).
>>>
>>> Simply remove the error handling from the initializer of the
>>> parser. Pythonic! Now it's the caller's job to figure out what to do
>>> about it. Handle the error in QAPISchemaParser._include() instead, where
>>> we do have the correct semantic context to not need to play games with
>>> the error message generation.
>>>
>>> Next, to re-gain a nice error at the top level, add a new try/except
>>> into qapi/main.generate(). Now the error looks sensible:
>> 
>> Missing "again" after "sensible" ;-P
>> 
>
> okayokayokayfine
>
>>>
>>>> python3 qapi-gen.py 'fake.json'
>>> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>>
>>> Lastly, with this usage gone, we can remove the special type violation
>>> from QAPISourceInfo, and all is well with the world.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/main.py   |  8 +++++++-
>>>   scripts/qapi/parser.py | 18 +++++++++---------
>>>   scripts/qapi/source.py |  3 ---
>>>   3 files changed, 16 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
>>> index 703e7ed1ed5..70f8aa86f37 100644
>>> --- a/scripts/qapi/main.py
>>> +++ b/scripts/qapi/main.py
>>> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>>>       """
>>>       assert invalid_prefix_char(prefix) is None
>>>   
>>> -    schema = QAPISchema(schema_file)
>>> +    try:
>>> +        schema = QAPISchema(schema_file)
>>> +    except OSError as err:
>>> +        raise QAPIError(
>>> +            f"can't read schema file '{schema_file}': {err.strerror}"
>>> +        ) from err
>>> +
>>>       gen_types(schema, output_dir, prefix, builtins)
>>>       gen_visit(schema, output_dir, prefix, builtins)
>>>       gen_commands(schema, output_dir, prefix)
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index ca5e8e18e00..b378fa33807 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>>           previously_included = previously_included or set()
>>>           previously_included.add(os.path.abspath(fname))
>>>   
>>> -        try:
>>> -            fp = open(fname, 'r', encoding='utf-8')
>>> +        # Allow the caller to catch this error.
>> 
>> "this error"?  I understand what you mean now, but I'm not sure I will
>> in three months, when I won't have the context I have now.
>> 
>
> Yep, OK.
>
> # May raise OSError, allow the caller to handle it.

Okay.

>>> +        with open(fname, 'r', encoding='utf-8') as fp:
>>>               self.src = fp.read()
>>> -        except IOError as e:
>>> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>>> -                               "can't read %s file '%s': %s"
>>> -                               % ("include" if incl_info else "schema",
>>> -                                  fname,
>>> -                                  e.strerror))
>>>   
>>>           if self.src == '' or self.src[-1] != '\n':
>>>               self.src += '\n'
>>> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>>>           if incl_abs_fname in previously_included:
>>>               return None
>>>   
>>> -        return QAPISchemaParser(incl_fname, previously_included, info)
>>> +        try:
>>> +            return QAPISchemaParser(incl_fname, previously_included, info)
>>> +        except OSError as err:
>>> +            raise QAPISemError(
>>> +                info,
>>> +                f"can't read include file '{incl_fname}': {err.strerror}"
>>> +            ) from err
>>>   
>>>       def _check_pragma_list_of_str(self, name, value, info):
>>>           if (not isinstance(value, list)
>> 
>> Before the patch, only IOError from open() and .read() get converted to
>> QAPISemError, and therefore caught by main().
>> 
>> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
>> 
>
> "Changed in version 3.3: EnvironmentError, IOError, WindowsError, 
> socket.error, select.error and mmap.error have been merged into OSError, 
> and the constructor may return a subclass."
>
>  >>> OSError == IOError
> True
>
> (No, I didn't know this before I wrote it. I just intentionally wanted 
> to catch everything that open() might return, which I had simply assumed 
> was not fully captured by IOError. Better to leave it as OSError now to 
> avoid misleading anyone into thinking it's more narrow than it really is.)

Good to know.

However, I was talking about the code covered by try ... except OSError
(or IOError, or whatever).  Before the patch, it's just open() and
.read().  Afterwards it's all of .__init__().

Could anything else in .__init__() possibly raise OSError?  Probably
not, but it's not trivially obvious.  Which makes me go "hmm."

"Hmm" isn't "no", it's just "hmm".
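
A sketch of one way to keep the covered region as narrow as it was
before the patch, for illustration only.  read_source(), Parser and
load() are hypothetical names, and RuntimeError stands in for QAPIError
so the example is self-contained.

```python
def read_source(fname: str) -> str:
    """Hypothetical helper: the only code that touches the file."""
    # May raise OSError; the caller decides how to report it.
    with open(fname, 'r', encoding='utf-8') as fp:
        return fp.read()


class Parser:
    """Hypothetical stand-in for QAPISchemaParser."""

    def __init__(self, src: str):
        if src == '' or src[-1] != '\n':
            src += '\n'
        self.src = src


def load(fname: str) -> Parser:
    try:
        src = read_source(fname)
    except OSError as err:
        raise RuntimeError(
            f"can't read schema file '{fname}': {err.strerror}"
        ) from err
    # Parsing happens outside the try block, so an OSError raised here
    # (however unlikely) is not reported as a read failure.
    return Parser(src)
```

The try block covers open() and .read() only, which is the same scope
the old code had inside __init__().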

>>> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
>>> index 03b6ede0828..1ade864d7b9 100644
>>> --- a/scripts/qapi/source.py
>>> +++ b/scripts/qapi/source.py
>>> @@ -10,7 +10,6 @@
>>>   # See the COPYING file in the top-level directory.
>>>   
>>>   import copy
>>> -import sys
>>>   from typing import List, Optional, TypeVar
>>>   
>>>   
>>> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>>>           return info
>>>   
>>>       def loc(self) -> str:
>>> -        if self.fname is None:
>>> -            return sys.argv[0]
>>>           ret = self.fname
>>>           if self.line is not None:
>>>               ret += ':%d' % self.line
>> 
>> tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:
>> 
>>      $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
>>      tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory
>> 
>> After:
>> 
>>      Traceback (most recent call last):
>>        File "tests/qapi-schema/test-qapi.py", line 207, in <module>
>>          main(sys.argv)
>>        File "tests/qapi-schema/test-qapi.py", line 201, in main
>>          status |= test_and_diff(test_name, dir_name, args.update)
>>        File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
>>          test_frontend(os.path.join(dir_name, test_name + '.json'))
>>        File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
>>          schema = QAPISchema(fname)
>>        File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>>          parser = QAPISchemaParser(fname)
>>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
>>          with open(fname, 'r', encoding='utf-8') as fp:
>>      FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'
>> 
>
> Probably something that should be added to the actual battery of tests 
> somehow, yeah? I can't prevent regressions in invocations that don't get 
> run O:-)

I can, by reviewing patches ;-P

You're welcome to contribute automated tests covering qapi-gen and
test-qapi invocations.  I never bothered automating this, and I wouldn't
bother now.




* Re: [PATCH 05/22] qapi/parser: Assert lexer value is a string
  2021-04-27 12:30       ` Markus Armbruster
@ 2021-04-27 13:58         ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-27 13:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

On 4/27/21 8:30 AM, Markus Armbruster wrote:
> I assume you need this assertion for mypy.  If yes, let's get the job
> done with minimal fuss.  If no, please drop the assertion entirely.

Yep, needed for mypy. You are right that these assertions are for 
clarifying postconditions of accept() that tie together the value of 
.tok with the type of .val.

I'll replace the message with a better comment, but we do still need either:

(1) A way to make the return from accept() statically type-safe, or
(2) The assertion.

As with most of the patches after part one of this series, I've opted 
for the quicker thing to speed us along to a clean mypy baseline.

(Though I have spent some time prototyping solutions for #1...)
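
As a rough illustration of what option (1) could look like (this is a
sketch under assumptions, not one of the prototypes mentioned above):
give each kind of token its own class, so that the token kind and the
value type are tied together in the type system.  StringToken,
BoolToken, PunctToken and describe() are all hypothetical names.

```python
from typing import NamedTuple, Union


class StringToken(NamedTuple):
    value: str


class BoolToken(NamedTuple):
    value: bool


class PunctToken(NamedTuple):
    char: str  # e.g. '{', '}', '[', ']', ':' or ','


Token = Union[StringToken, BoolToken, PunctToken]


def describe(token: Token) -> str:
    if isinstance(token, StringToken):
        # mypy knows token.value is a str here; no assertion needed.
        return 'string ' + token.value.upper()
    if isinstance(token, BoolToken):
        return 'bool ' + str(token.value)
    return 'punctuation ' + token.char
```

The isinstance() checks replace the tok comparisons, and the narrowing
falls out of them for free.  The cost is a larger change to the lexer
than a handful of assertions.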

--js




* Re: [PATCH 07/22] qapi/parser: assert object keys are strings
  2021-04-27  6:13       ` Markus Armbruster
@ 2021-04-27 14:15         ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-27 14:15 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

On 4/27/21 2:13 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/25/21 3:27 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> The single quote token implies the value is a string. Assert this to be
>>>> the case.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/parser.py | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index 6b443b1247e..8d1fe0ddda5 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -246,6 +246,8 @@ def get_members(self):
>>>>                raise QAPIParseError(self, "expected string or '}'")
>>>>            while True:
>>>>                key = self.val
>>>> +            assert isinstance(key, str)  # Guaranteed by tok == "'"
>>>> +
>>>>                self.accept()
>>>>                if self.tok != ':':
>>>>                    raise QAPIParseError(self, "expected ':'")
>>>
>>> The assertion is correct, but I wonder why mypy needs it.  Can you help?
>>>
>>
>> The lexer value can also be True/False (Maybe None? I forget) based on
> 
> Yes, None for tokens like '{'.
> 
>> the Token returned. Here, since the token was the single quote, we know
>> that value must be a string.
>>
>> Mypy has no insight into the correlation between the Token itself and
>> the token value, because that relationship is not expressed via the type
>> system.
> 
> I understand that mypy can't prove implications like if self.tok == "'",
> then self.val is a str.
> 
> What I'm curious about is why key needs to be known to be str here.
> Hmm, is it so return expr type-checks once you add -> OrderedDict[str,
> object] to the function?
> 

Oh, yes.




* Re: [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop
  2021-04-25  7:23   ` Markus Armbruster
@ 2021-04-27 15:03     ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-04-27 15:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 3:23 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> get_expr can return many things, depending on where it is used. In the
>> outer parsing loop, we expect and require it to return a dict.
>>
>> (It's (maybe) a bit involved to teach mypy that when nested is False,
>> this is already always True. I'll look into it later, maybe.)
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index c75434e75a5..6b443b1247e 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -78,6 +78,8 @@ def _parse(self):
>>                   continue
>>   
>>               expr = self.get_expr(False)
>> +            assert isinstance(expr, dict)  # Guaranteed when nested=False
>> +
>>               if 'include' in expr:
>>                   self.reject_expr_doc(cur_doc)
>>                   if len(expr) != 1:
>> @@ -278,6 +280,7 @@ def get_values(self):
>>               self.accept()
>>   
>>       def get_expr(self, nested):
>> +        # TODO: Teach mypy that nested=False means the retval is a Dict.
>>           if self.tok != '{' and not nested:
>>               raise QAPIParseError(self, "expected '{'")
>>           if self.tok == '{':
> 
> The better place to assert a post condition would be ...
> 
>                  self.accept()
>                  expr = self.get_members()
>              elif self.tok == '[':
>                  self.accept()
>                  expr = self.get_values()
>              elif self.tok in "'tf":
>                  expr = self.val
>                  self.accept()
>              else:
>                  raise QAPIParseError(
>                      self, "expected '{', '[', string, or boolean")
> 
> ... here.
> 
>              return expr
> 
> But then it may not help mypy over the hump, which is the whole point of
> the patch.
> 

Right, the problem is that 'expr' here actually doesn't have to be a 
Dict. It can be a List, str, or bool too.

The type narrowing occurs only when you pass nested=False.
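
For reference, teaching mypy about that narrowing would look roughly
like the sketch below.  It assumes typing.Literal, which needs Python
3.8+ (or typing_extensions on older versions), and it uses a toy
get_expr() that takes an already-built value instead of reading
tokens; the Value alias is a hypothetical name.

```python
from typing import Dict, List, Literal, Union, overload

Value = Union[Dict[str, object], List[object], str, bool]


@overload
def get_expr(value: Value, nested: Literal[False]) -> Dict[str, object]:
    ...


@overload
def get_expr(value: Value, nested: bool) -> Value:
    ...


def get_expr(value: Value, nested: bool) -> Value:
    # Toy implementation: 'value' stands in for what the parser built.
    if not nested and not isinstance(value, dict):
        raise ValueError("expected '{'")
    return value


top = get_expr({'include': 'foo.json'}, False)
print('include' in top)   # True
```

With the first overload, a call that passes a literal False is typed as
returning a dict, so the caller needs no assertion.  It is noticeably
more machinery than the one-line assert.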

> Alternative ways to skin this cat:
> 
> * Split get_object() off get_expr().
> 
>    diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>    index ca5e8e18e0..c79b3c7d08 100644
>    --- a/scripts/qapi/parser.py
>    +++ b/scripts/qapi/parser.py
>    @@ -262,9 +262,12 @@ def get_values(self):
>                     raise QAPIParseError(self, "expected ',' or ']'")
>                 self.accept()
> 
>    -    def get_expr(self, nested):
>    -        if self.tok != '{' and not nested:
>    +    def get_object(self):
>    +        if self.tok != '{':
>                 raise QAPIParseError(self, "expected '{'")
>    +        return self.get_expr()
>    +
>    +    def get_expr(self):
>             if self.tok == '{':
>                 self.accept()
>                 expr = self.get_members()
> 

That'd work well. no @overload.

> * Shift "top-level expression must be dict" up:
> 
>      diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>      index ca5e8e18e0..ee8cbf3531 100644
>      --- a/scripts/qapi/parser.py
>      +++ b/scripts/qapi/parser.py
>      @@ -68,7 +68,10 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>                           self.docs.append(cur_doc)
>                       continue
> 
>      -            expr = self.get_expr(False)
>      +            expr = self.get_expr()
>      +            if not isinstance(expr, OrderedDict):
>      +                raise QAPISemError(
>      +                    info, "top-level expression must be an object")

Also works. As a benefit (to both previous suggestions), it leaves 
get_expr completely generic and expresses the grammatical constraint up 
here in the parseloop. It leaves the JSON parsing more generic and 
further consolidates QAPI Schema specific stuff to this region.

>                   if 'include' in expr:
>                       self.reject_expr_doc(cur_doc)
>                       if len(expr) != 1:
>      @@ -262,9 +265,7 @@ def get_values(self):
>                       raise QAPIParseError(self, "expected ',' or ']'")
>                   self.accept()
> 
>      -    def get_expr(self, nested):
>      -        if self.tok != '{' and not nested:
>      -            raise QAPIParseError(self, "expected '{'")
>      +    def get_expr(self):
>               if self.tok == '{':
>                   self.accept()
>                   expr = self.get_members()
> 
> * Shift it further, into expr.py:
> 
>     diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
>     index 496f7e0333..0a83c493a0 100644
>     --- a/scripts/qapi/expr.py
>     +++ b/scripts/qapi/expr.py
>     @@ -600,7 +600,10 @@ def check_exprs(exprs: List[_JSONObject]) -> List[_JSONObject]:
>          """
>          for expr_elem in exprs:
>              # Expression
>     -        assert isinstance(expr_elem['expr'], dict)
>     +        if not isinstance(expr_elem['expr'], dict):
>     +            raise QAPISemError(
>     +                info, "top-level expression must be an object")
>     +
>              for key in expr_elem['expr'].keys():
>                  assert isinstance(key, str)
>              expr: _JSONObject = expr_elem['expr']
> 
> Shifting it up would be closer to qapi-code-gen.txt than what we have
> now.
> 

This is also pretty nice, as it furthers the splitting of the JSON 
syntax from the abstract QAPI syntax, which is a distinct end-goal I have.

A slight downside is that the type of a value now needs to follow 
outside of parser.py, which will warrant a type name.

> All observations, no demands.
> 




* Re: [PATCH 01/22] qapi/parser: Don't try to handle file errors
  2021-04-27 13:47       ` Markus Armbruster
@ 2021-04-27 17:58         ` John Snow
  2021-04-28  5:48           ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-04-27 17:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

On 4/27/21 9:47 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/23/21 11:46 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> The short-ish version of what motivates this patch is:
>>>>
>>>> - The parser initializer does not possess adequate context to write a
>>>>     good error message -- It tries to determine the caller's semantic
>>>>     context.
>>>
>>> I'm not sure I get what you're trying to say here.
>>>
>>
>> I mean: this __init__ method does not *know* who is calling it or why.
>> Of course, *we* do, because the code base is finite and nobody else but
>> us is calling into it.
>>
>> I mean to point out that the initializer has to do extra work (Just a
>> little) to determine what the calling context is and raise an error
>> accordingly.
>>
>> Example: If we have a parent info context, we raise an error in the
>> context of the caller. If we don't, we have to create a new presumed
>> context (using the weird None SourceInfo object).
> 
> I guess you mean
> 
>              raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
> 
> I can't see other instances of messing with context.
> 

Yes, and the string construction that follows, too. It's all about 
trying to understand who our caller is and raising an error appropriate 
for them on their behalf.

>> So I just mean to say:
>>
>> "Let the caller, who unambiguously always has the exactly correct
>> context worry about what the error message ought to be."
>>
>>>> - We don't want to allow QAPISourceInfo(None, None, None) to exist.
>>>> - Errors made using such an object are currently incorrect.
>>>> - It's not technically a semantic error if we cannot open the schema
>>>> - There are various typing constraints that make mixing these two cases
>>>>     undesirable for a single special case.
>>>
>>> These I understand.
>>>
>>>> - The current open block in parser's initializer will leak file
>>>>     pointers, because it isn't using a with statement.
>>>
>>> Uh, isn't the value returned by open() reference-counted?  @fp is the
>>> only reference...
>>>
>>
>> Yeah, eventually. O:-)
>>
>> Whenever the GC runs. OK, it's not really an apocalypse error, but it
>> felt strange to rewrite a try/except and then write it using bad hygiene
>> on purpose in the name of a more isolated commit.
> 
> I agree use of with is an improvement (it's idiomatic).  We shouldn't
> call it a leak fix, though.
> 

OK. I'll reword it.
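
For the record, a minimal sketch of what the with statement buys us. It uses a throwaway temp file rather than a real schema, and read_text() is a made-up helper, not the parser's code. The point is only that the file is closed at a well-defined place instead of whenever the object happens to be reclaimed:

```python
import os
import tempfile


def read_text(fname: str) -> str:
    # Hypothetical helper, not taken from parser.py.
    with open(fname, 'r', encoding='utf-8') as fp:
        src = fp.read()
    # Leaving the with block closes the file, even if read() had raised.
    assert fp.closed
    return src


# Create a throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w', encoding='utf-8') as out:
    out.write("{'command': 'x'}\n")

text = read_text(path)
os.unlink(path)
```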

>>>> Here are the details on why this got written the way it did, and why a few
>>>> disparate issues are rolled into one commit. (They're hard to fix
>>>> separately without writing really weird stuff that'd be harder to
>>>> review.)
>>>>
>>>> The error message string here is incorrect:
>>>>
>>>>> python3 qapi-gen.py 'fake.json'
>>>> qapi-gen.py: qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>>
>>> Regressed in commit 52a474180a "qapi-gen: Separate arg-parsing from
>>> generation" (v5.2.0).
>>>
>>
>> Mea Culpa. Didn't realize it wasn't tested, and I didn't realize at the
>> time that the two kinds of errors here were treated differently.
> 
> Our tests cover the schema language, not qapi-gen's CLI language.  The
> gap feels tolerable.
> 
>>> Before commit c615550df3 "qapi: Improve source file read error handling"
>>> (v4.2.0), it was differently bad (uncaught exception).
>>>
>>> Commit c615550df3 explains why the funny QAPISourceInfo exists:
>>>
>>>       Reporting open or read failure for the main schema file needs a
>>>       QAPISourceInfo representing "no source".  Make QAPISourceInfo cope
>>>       with fname=None.
>>>
>>
>> I am apparently not the first or the last person to dream of wanting a
>> QAPISourceInfo that represents "Actually, there's no source location!"
>>
>>> The commit turned QAPISourceInfo into the equivalent of a disjoint union
>>> of
>>>
>>> 1. A position in a source file (.fname is a str)
>>>
>>> 2. "Not in any source file" (.fname is None)
>>>
>>> This is somewhat similar to struct Location in C, which has
>>>
>>> 1. LOC_FILE: a position in a source file
>>>
>>> 2. LOC_CMDLINE: a range of command line arguments
>>>
>>> 3. LOC_NONE: no location information
>>>
>>> Abstracting locations this way lets error_report() do the right thing
>>> whether it's complaining about the command line, a monitor command, or a
>>> configuration file read with -readconfig.
>>>
>>> Your patch demonstrates that qapi-gen has much less need for abstracting
>>> sources: we use 2. "Not in any source file" only for reading the main
>>> schema file.
>>>
>>
>> Yes. I got the impression that you didn't want to pursue more abstract
>> QSI constructs based on earlier work, so going the other way and
>> *removing* them seemed like the faster way to achieve a clean type
>> system here.
>>
>>>> In pursuing it, we find that QAPISourceInfo has a special accommodation
>>>> for when there's no filename.
>>>
>>> Yes:
>>>
>>>       def loc(self) -> str:
>>> -->     if self.fname is None:
>>> -->         return sys.argv[0]
>>>           ret = self.fname
>>>           if self.line is not None:
>>>               ret += ':%d' % self.line
>>>           return ret
>>>
>>>>                                 Meanwhile, we intend to type info.fname as
>>>> str; something we always have.
>>>
>>> Do you mean "as non-optional str"?
>>>
>>
>> Yeah. I typed it originally as `str`, but the analyzer missed that we
>> check the field to see if it's None, which is misleading.
>>
>>>> To remove this, we need to not have a "fake" QAPISourceInfo object. We
>>>
>>> We may well want to, but I doubt we *need* to.  There are almost
>>> certainly other ways to fix the bug.  I don't see a need to explore
>>> them, though.
>>>
>>
>> Either we build out the fake QSI into a proper subtype, or we remove it
>> -- those are the two obvious options. Building it out is almost
>> certainly more work than this patch.
>>
>>>> also don't want to explicitly begin accommodating QAPISourceInfo being
>>>> None, because we actually want to eventually prove that this can never
>>>> happen -- We don't want to confuse "The file isn't open yet" with "This
>>>> error stems from a definition that wasn't defined in any file".
>>>
>>> Yes, encoding both "poisoned source info not to be used with actual
>>> errors" and "'fake' source info not pointing to a source file" as None
>>> would be a mistake.
>>>
>>
>> :)
>>
>>>> (An earlier series tried to create an official dummy object, but it was
>>>> tough to prove in review that it worked correctly without creating new
>>>> regressions. This patch avoids trying to re-litigate that discussion.
>>>>
>>>> We would like to first prove that we never raise QAPISemError for any
>>>> built-in object before we relent and add "special" info objects. We
>>>> aren't ready to do that yet, so crashing is preferred.)
>>>>
>>>> So, how to solve this mess?
>>>>
>>>> Here's one way: Don't try to handle errors at a level with "mixed"
>>>> semantic levels; i.e. don't try to handle inclusion errors (should
>>>> report a source line where the include was triggered) with command line
>>>> errors (where we specified a file we couldn't read).
>>>>
>>>> Simply remove the error handling from the initializer of the
>>>> parser. Pythonic! Now it's the caller's job to figure out what to do
>>>> about it. Handle the error in QAPISchemaParser._include() instead, where
>>>> we do have the correct semantic context to not need to play games with
>>>> the error message generation.
>>>>
>>>> Next, to re-gain a nice error at the top level, add a new try/except
>>>> into qapi/main.generate(). Now the error looks sensible:
>>>
>>> Missing "again" after "sensible" ;-P
>>>
>>
>> okayokayokayfine
>>
>>>>
>>>>> python3 qapi-gen.py 'fake.json'
>>>> qapi-gen.py: can't read schema file 'fake.json': No such file or directory
>>>>
>>>> Lastly, with this usage gone, we can remove the special type violation
>>>> from QAPISourceInfo, and all is well with the world.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/main.py   |  8 +++++++-
>>>>    scripts/qapi/parser.py | 18 +++++++++---------
>>>>    scripts/qapi/source.py |  3 ---
>>>>    3 files changed, 16 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/scripts/qapi/main.py b/scripts/qapi/main.py
>>>> index 703e7ed1ed5..70f8aa86f37 100644
>>>> --- a/scripts/qapi/main.py
>>>> +++ b/scripts/qapi/main.py
>>>> @@ -48,7 +48,13 @@ def generate(schema_file: str,
>>>>        """
>>>>        assert invalid_prefix_char(prefix) is None
>>>>    
>>>> -    schema = QAPISchema(schema_file)
>>>> +    try:
>>>> +        schema = QAPISchema(schema_file)
>>>> +    except OSError as err:
>>>> +        raise QAPIError(
>>>> +            f"can't read schema file '{schema_file}': {err.strerror}"
>>>> +        ) from err
>>>> +
>>>>        gen_types(schema, output_dir, prefix, builtins)
>>>>        gen_visit(schema, output_dir, prefix, builtins)
>>>>        gen_commands(schema, output_dir, prefix)
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index ca5e8e18e00..b378fa33807 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -40,15 +40,9 @@ def __init__(self, fname, previously_included=None, incl_info=None):
>>>>            previously_included = previously_included or set()
>>>>            previously_included.add(os.path.abspath(fname))
>>>>    
>>>> -        try:
>>>> -            fp = open(fname, 'r', encoding='utf-8')
>>>> +        # Allow the caller to catch this error.
>>>
>>> "this error"?  I understand what you mean now, but I'm not sure I will
>>> in three months, when I won't have the context I have now.
>>>
>>
>> Yep, OK.
>>
>> # May raise OSError, allow the caller to handle it.
> 
> Okay.
> 
>>>> +        with open(fname, 'r', encoding='utf-8') as fp:
>>>>                self.src = fp.read()
>>>> -        except IOError as e:
>>>> -            raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>>>> -                               "can't read %s file '%s': %s"
>>>> -                               % ("include" if incl_info else "schema",
>>>> -                                  fname,
>>>> -                                  e.strerror))
>>>>    
>>>>            if self.src == '' or self.src[-1] != '\n':
>>>>                self.src += '\n'
>>>> @@ -129,7 +123,13 @@ def _include(self, include, info, incl_fname, previously_included):
>>>>            if incl_abs_fname in previously_included:
>>>>                return None
>>>>    
>>>> -        return QAPISchemaParser(incl_fname, previously_included, info)
>>>> +        try:
>>>> +            return QAPISchemaParser(incl_fname, previously_included, info)
>>>> +        except OSError as err:
>>>> +            raise QAPISemError(
>>>> +                info,
>>>> +                f"can't read include file '{incl_fname}': {err.strerror}"
>>>> +            ) from err
>>>>    
>>>>        def _check_pragma_list_of_str(self, name, value, info):
>>>>            if (not isinstance(value, list)
>>>
>>> Before the patch, only IOError from open() and .read() get converted to
>>> QAPISemError, and therefore caught by main().
>>>
>>> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
>>>
>>
>> "Changed in version 3.3: EnvironmentError, IOError, WindowsError,
>> socket.error, select.error and mmap.error have been merged into OSError,
>> and the constructor may return a subclass."
>>
>>   >>> OSError == IOError
>> True
>>
>> (No, I didn't know this before I wrote it. I just intentionally wanted
>> to catch everything that open() might raise, which I had simply assumed
>> was not fully captured by IOError. Better to leave it as OSError now to
>> avoid misleading anyone into thinking it's more narrow than it really is.)
> 
> Good to know.
> 
> However, I was talking about the code covered by try ... except OSError
> (or IOError, or whatever).  Before the patch, it's just open() and
> .read().  Afterwards it's all of .__init__().
> 

Apologies, I misread.

> Could anything else in .__init__() possibly raise OSError?  Probably
> not, but it's not trivially obvious.  Which makes me go "hmm."
> 
> "Hmm" isn't "no", it's just "hmm".
> 

Yeah, it is rather broad. That is one of the perils of doing *so much* 
at init() time, in my opinion.

We don't make any other syscalls in the parser though, so it should be 
fine. The docstring patch later documents the errors we expect to see 
here, so it becomes a visible part of the interface.
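
A minimal sketch of the shape this gives us, with stand-in names: SchemaError and Parser below are hypothetical, not the real QAPIError and QAPISchemaParser. The initializer lets OSError propagate, and the caller, which knows it was handed a schema file on the command line, turns it into a message:

```python
class SchemaError(Exception):
    """Hypothetical stand-in for QAPIError."""


class Parser:
    """Hypothetical stand-in for the parser; it handles no file errors."""

    def __init__(self, fname: str):
        # May raise OSError; the caller decides how to report it.
        with open(fname, 'r', encoding='utf-8') as fp:
            self.src = fp.read()


def generate(schema_file: str) -> Parser:
    try:
        return Parser(schema_file)
    except OSError as err:
        # The caller knows this is the main schema file, so it can say so.
        raise SchemaError(
            f"can't read schema file '{schema_file}': {err.strerror}"
        ) from err


try:
    generate('definitely-missing-dir/fake.json')
except SchemaError as err:
    message = str(err)
    cause = err.__cause__
```

Note that the try block still covers everything Parser.__init__() does, which is exactly the breadth being discussed here.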

>>>> diff --git a/scripts/qapi/source.py b/scripts/qapi/source.py
>>>> index 03b6ede0828..1ade864d7b9 100644
>>>> --- a/scripts/qapi/source.py
>>>> +++ b/scripts/qapi/source.py
>>>> @@ -10,7 +10,6 @@
>>>>    # See the COPYING file in the top-level directory.
>>>>    
>>>>    import copy
>>>> -import sys
>>>>    from typing import List, Optional, TypeVar
>>>>    
>>>>    
>>>> @@ -53,8 +52,6 @@ def next_line(self: T) -> T:
>>>>            return info
>>>>    
>>>>        def loc(self) -> str:
>>>> -        if self.fname is None:
>>>> -            return sys.argv[0]
>>>>            ret = self.fname
>>>>            if self.line is not None:
>>>>                ret += ':%d' % self.line
>>>
>>> tests/qapi-schema/test-qapi.py also needs an update.  Before the patch:
>>>
>>>       $ PYTHONPATH=scripts python3 tests/qapi-schema/test-qapi.py nonexistent
>>>       tests/qapi-schema/test-qapi.py: can't read schema file 'nonexistent.json': No such file or directory
>>>
>>> After:
>>>
>>>       Traceback (most recent call last):
>>>         File "tests/qapi-schema/test-qapi.py", line 207, in <module>
>>>           main(sys.argv)
>>>         File "tests/qapi-schema/test-qapi.py", line 201, in main
>>>           status |= test_and_diff(test_name, dir_name, args.update)
>>>         File "tests/qapi-schema/test-qapi.py", line 129, in test_and_diff
>>>           test_frontend(os.path.join(dir_name, test_name + '.json'))
>>>         File "tests/qapi-schema/test-qapi.py", line 109, in test_frontend
>>>           schema = QAPISchema(fname)
>>>         File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>>>           parser = QAPISchemaParser(fname)
>>>         File "/work/armbru/qemu/scripts/qapi/parser.py", line 44, in __init__
>>>           with open(fname, 'r', encoding='utf-8') as fp:
>>>       FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.json'
>>>
>>
>> Probably something that should be added to the actual battery of tests
>> somehow, yeah? I can't prevent regressions in invocations that don't get
>> run O:-)
> 
> I can, by reviewing patches ;-P
> 
> You're welcome to contribute automated tests covering qapi-gen and
> test-qapi invocations.  I never bothered automating this, and I wouldn't
> bother now.
> 

Yep, I am going to add a not-found test thanks to Paolo's help.




* Re: [PATCH 01/22] qapi/parser: Don't try to handle file errors
  2021-04-27 17:58         ` John Snow
@ 2021-04-28  5:48           ` Markus Armbruster
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-04-28  5:48 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> On 4/27/21 9:47 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> On 4/23/21 11:46 AM, Markus Armbruster wrote:
>>>> John Snow <jsnow@redhat.com> writes:
>>>>
>>>>> The short-ish version of what motivates this patch is:
>>>>>
>>>>> - The parser initializer does not possess adequate context to write a
>>>>>     good error message -- It tries to determine the caller's semantic
>>>>>     context.
>>>>
>>>> I'm not sure I get what you're trying to say here.
>>>>
>>>
>>> I mean: this __init__ method does not *know* who is calling it or why.
>>> Of course, *we* do, because the code base is finite and nobody else but
>>> us is calling into it.
>>>
>>> I mean to point out that the initializer has to do extra work (Just a
>>> little) to determine what the calling context is and raise an error
>>> accordingly.
>>>
>>> Example: If we have a parent info context, we raise an error in the
>>> context of the caller. If we don't, we have to create a new presumed
>>> context (using the weird None SourceInfo object).
>> 
>> I guess you mean
>> 
>>              raise QAPISemError(incl_info or QAPISourceInfo(None, None, None),
>> 
>> I can't see other instances of messing with context.
>> 
>
> Yes, and the string construction that follows, too. It's all about 
> trying to understand who our caller is and raising an error appropriate 
> for them on their behalf.

I guess you can view it that way.  I never did.  My thinking was

    @fname either comes from a schema file (@incl_info is not None) or
    somewhere else.  If schema file, make the exception's __str__()
    start with "SCHEMA-FILE:LINE: ", because that's how compilers report
    errors in source files.  Else, make it start with just "PROGNAME: ",
    because that's how compilers report errors unrelated to source
    files.
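
    As a rough sketch of that rule (format_error() and its parameters
    are made up for illustration; the real code goes through
    QAPISourceInfo.loc()):

```python
from typing import Optional, Tuple


def format_error(msg: str, progname: str,
                 where: Optional[Tuple[str, int]] = None) -> str:
    # Hypothetical helper: 'where' is (schema file, line) when the
    # error relates to a source file, and None when it does not.
    if where is not None:
        fname, line = where
        return f"{fname}:{line}: {msg}"
    return f"{progname}: {msg}"


in_schema = format_error("can't read include file 'sub.json'",
                         'qapi-gen.py', ('main.json', 12))
no_source = format_error("can't read schema file 'fake.json'",
                         'qapi-gen.py')
```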

This assumes "incl_info is None implies unrelated to source file".  I
think that's fair.  I don't think it rises to the level of
"understanding who our caller is".

>>> So I just mean to say:
>>>
>>> "Let the caller, who unambiguously always has the exactly correct
>>> context worry about what the error message ought to be."

[...]

>>>> Before the patch, only IOError from open() and .read() get converted to
>>>> QAPISemError, and therefore caught by main().
>>>>
>>>> The patch widens this to anywhere in QAPISchemaParser.__init__().  Hmm.
>>>>
>>>
>>> "Changed in version 3.3: EnvironmentError, IOError, WindowsError,
>>> socket.error, select.error and mmap.error have been merged into OSError,
>>> and the constructor may return a subclass."
>>>
>>>   >>> OSError == IOError
>>> True
>>>
>>> (No, I didn't know this before I wrote it. I just intentionally wanted
>>> to catch everything that open() might raise, which I had simply assumed
>>> was not fully captured by IOError. Better to leave it as OSError now to
>>> avoid misleading anyone into thinking it's more narrow than it really is.)
>> 
>> Good to know.
>> 
>> However, I was talking about the code covered by try ... except OSError
>> (or IOError, or whatever).  Before the patch, it's just open() and
>> .read().  Afterwards it's all of .__init__().
>> 
>
> Apologies, I misread.
>
>> Could anything else in .__init__() possibly raise OSError?  Probably
>> not, but it's not trivially obvious.  Which makes me go "hmm."
>> 
>> "Hmm" isn't "no", it's just "hmm".
>> 
>
> Yeah, it is rather broad. That is one of the perils of doing *so much* 
> at init() time, in my opinion.

It's not ideal, but then having to write something like

    parser = QAPISchemaParser(fname).parse()

or

    parser = QAPISchemaParser().parse(fname)

instead of just

    parser = QAPISchemaParser(fname)

would be less than ideal, too.

> We don't make any other syscalls in the parser though, so it should be 
> fine. The docstring patch later documents the errors we expect to see 
> here, so it becomes a visible part of the interface.

[...]




* Re: [PATCH 10/22] qapi/parser: Fix typing of token membership tests
  2021-04-27  7:00       ` Markus Armbruster
@ 2021-05-04  1:01         ` John Snow
  2021-05-05  6:29           ` Markus Armbruster
  0 siblings, 1 reply; 67+ messages in thread
From: John Snow @ 2021-05-04  1:01 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

On 4/27/21 3:00 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/25/21 3:59 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> When the token can be None, we can't use 'x in "abc"' style membership
>>>> tests to group types of tokens together, because 'None in "abc"' is a
>>>> TypeError.
>>>>
>>>> Easy enough to fix, if a little ugly.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/parser.py | 5 +++--
>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index 7f3c009f64b..16fd36f8391 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -272,7 +272,7 @@ def get_values(self):
>>>>            if self.tok == ']':
>>>>                self.accept()
>>>>                return expr
>>>> -        if self.tok not in "{['tf":
>>>> +        if self.tok is None or self.tok not in "{['tf":
>>>>                raise QAPIParseError(
>>>>                    self, "expected '{', '[', ']', string, or boolean")
>>>>            while True:
>>>> @@ -294,7 +294,8 @@ def get_expr(self, nested):
>>>>            elif self.tok == '[':
>>>>                self.accept()
>>>>                expr = self.get_values()
>>>> -        elif self.tok in "'tf":
>>>> +        elif self.tok and self.tok in "'tf":
>>>> +            assert isinstance(self.val, (str, bool))
>>>>                expr = self.val
>>>>                self.accept()
>>>>            else:
>>>
>>> How can self.tok be None?
>>>
>>> I suspect this is an artifact of PATCH 04.  Before, self.tok is
>>> initialized to the first token, then set to subsequent tokens (all str)
>>> in turn.  After, it's initialized to None, then set to tokens in turn.
>>>
>>
>> Actually, it's set to None to represent EOF. See here:
>>
>>               elif self.tok == '\n':
>>                   if self.cursor == len(self.src):
>>                       self.tok = None
>>                       return
> 
> Alright, then this is actually a bug fix:
> 
>      $ echo -n "{'key': " | python3 scripts/qapi-gen.py /dev/stdin
>      Traceback (most recent call last):
>        File "scripts/qapi-gen.py", line 19, in <module>
>          sys.exit(main.main())
>        File "/work/armbru/qemu/scripts/qapi/main.py", line 93, in main
>          generate(args.schema,
>        File "/work/armbru/qemu/scripts/qapi/main.py", line 50, in generate
>          schema = QAPISchema(schema_file)
>        File "/work/armbru/qemu/scripts/qapi/schema.py", line 852, in __init__
>          parser = QAPISchemaParser(fname)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 59, in __init__
>          self._parse()
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 81, in _parse
>          expr = self.get_expr(False)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 293, in get_expr
>          expr = self.get_members()
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 260, in get_members
>          expr[key] = self.get_expr(True)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 297, in get_expr
>          elif self.tok in "'tf":
>      TypeError: 'in <string>' requires string as left operand, not NoneType
> 
> Likewise, the other hunk:
> 
>      $ echo -n "{'key': [" | python3 scripts/qapi-gen.py /dev/stdin
>      Traceback (most recent call last):
>        File "scripts/qapi-gen.py", line 19, in <module>
>          sys.exit(main.main())
>        File "/work/armbru/qemu/scripts/qapi/main.py", line 89, in main
>          generate(args.schema,
>        File "/work/armbru/qemu/scripts/qapi/main.py", line 51, in generate
>          schema = QAPISchema(schema_file)
>        File "/work/armbru/qemu/scripts/qapi/schema.py", line 860, in __init__
>          parser = QAPISchemaParser(fname)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 71, in __init__
>          expr = self.get_expr(False)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 270, in get_expr
>          expr = self.get_members()
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 238, in get_members
>          expr[key] = self.get_expr(True)
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 273, in get_expr
>          expr = self.get_values()
>        File "/work/armbru/qemu/scripts/qapi/parser.py", line 253, in get_values
>          if self.tok not in "{['tf":
>      TypeError: 'in <string>' requires string as left operand, not NoneType
> 
> Please add test cases.  I recommend adding them in a separate patch, so
> this one's diff shows clearly what's being fixed.
> 

Can't, again: because it's a crash, the test runner explodes.

Two choices, because I won't finish respinning this tonight:

(1) Amend the test runner to print generic exceptions using str(err), 
without the stack trace -- so we can check for crashes using the diffs 
-- again in its own commit.

(2) Just squish the tests and error messages into this commit like I did 
for the other crash fix I checked in.

I'd normally leap for #1, but you seem to have some affinity for 
allowing unpredictable things to explode very violently, so I am not sure.
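
For anyone following along, a minimal standalone sketch of the failure and the guard. is_scalar_start() is a hypothetical helper, not the parser's code, and it assumes tokens are single characters or None for EOF:

```python
from typing import Optional


def is_scalar_start(tok: Optional[str]) -> bool:
    # Guard against None (EOF) before the membership test.
    # Assumes tok is a single character when it is not None.
    return tok is not None and tok in "'tf"


eof_tok = None
try:
    # This is the unguarded test: a str is required on the left.
    eof_tok in "'tf"
    raised = False
except TypeError:
    raised = True
```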

--js




* Re: [PATCH 10/22] qapi/parser: Fix typing of token membership tests
  2021-05-04  1:01         ` John Snow
@ 2021-05-05  6:29           ` Markus Armbruster
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-05-05  6:29 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

John Snow <jsnow@redhat.com> writes:

> On 4/27/21 3:00 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> On 4/25/21 3:59 AM, Markus Armbruster wrote:

[...]

>> Please add test cases.  I recommend adding them in a separate patch, so
>> this one's diff shows clearly what's being fixed.
>> 
>
> Can't, again: because it's a crash, the test runner explodes.
>
> Two choices, because I won't finish respinning this tonight:
>
> (1) Amend the test runner to print generic exceptions using str(err), 
> without the stack trace -- so we can check for crashes using the diffs 
> -- again in its own commit.
>
> (2) Just squish the tests and error messages into this commit like I did 
> for the other crash fix I checked in.
>
> I'd normally leap for #1, but you seem to have some affinity for 
> allowing unpredictable things to explode very violently, so I am not sure.

I love violent explosions.  Don't we all, as long as they're just bits?

(2) is fine.

If you'd like to provide for committing tests that currently explode:
the issue preventing it is insufficiently normalized output of
test-qapi.py.  test-qapi.py normalizes error messages (see except
QAPIError in test_and_diff()), but not tracebacks.

Omitting the tracebacks is an obvious and easy way to normalize.  But it
makes getting at the traceback harder: I need to know / remember how to
run the test by hand, without the normalization.  The cure seems worse
than the disease here.

To avoid the drawback, we'd need a simple and obvious way to run the
test so it shows the traceback.
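
One possible shape, as a sketch only: run_normalized() and its show_traceback switch are hypothetical and are not how test-qapi.py works today. By default a crash is reduced to one stable line that can be diffed; the switch brings the traceback back for debugging:

```python
import io
import traceback
from typing import Callable, TextIO


def run_normalized(test: Callable[[], None], out: TextIO,
                   show_traceback: bool = False) -> int:
    try:
        test()
    except Exception as err:  # deliberately broad: we report crashes
        if show_traceback:
            traceback.print_exc(file=out)
        else:
            # One normalized line, free of paths and line numbers.
            print(f"{type(err).__name__}: {err}", file=out)
        return 1
    return 0


def exploding_test() -> None:
    tok = None
    _ = tok in "'tf"  # TypeError, like the EOF crash in this thread


quiet = io.StringIO()
status = run_normalized(exploding_test, quiet)

loud = io.StringIO()
run_normalized(exploding_test, loud, show_traceback=True)
```

How the switch would be exposed (command line option, environment variable, make variable) is left open here.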

Again, (2) is fine.




* Re: [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard
  2021-04-27  7:15       ` Markus Armbruster
@ 2021-05-05 19:09         ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-05-05 19:09 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

On 4/27/21 3:15 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/25/21 8:32 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> TypeGuards won't exist in Python proper until 3.10. Ah well. We can hack
>>>> up our own by declaring this function to return the type we claim it
>>>> checks for and using this to safely downcast object -> List[str].
>>>>
>>>> In so doing, I bring this function in-line under _pragma so it can use
>>>> the 'info' object in its closure. Having done this, _pragma also now
>>>> no longer needs to take a 'self' parameter, so drop it.
>>>>
>>>> Rename it to just _check(), to help us out with the line-length -- and
>>>> now that it's contained within _pragma, it is contextually easier to see
>>>> how it's used anyway -- especially with types.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>
>>>> ---
>>>>
>>>> I left (name, value) as args to avoid creating a fully magic "macro",
>>>> though, I thought this was too weird:
>>>>
>>>>       info.pragma.foobar = _check()
>>>>
>>>> and it looked more reasonable as:
>>>>
>>>>       info.pragma.foobar = _check(name, value)
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/parser.py | 26 +++++++++++++-------------
>>>>    1 file changed, 13 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index 16fd36f8391..d02a134aae9 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -17,6 +17,7 @@
>>>>    from collections import OrderedDict
>>>>    import os
>>>>    import re
>>>> +from typing import List
>>>>    
>>>>    from .common import match_nofail
>>>>    from .error import QAPISemError, QAPISourceError
>>>> @@ -151,28 +152,27 @@ def _include(include, info, incl_fname, previously_included):
>>>>                ) from err
>>>>    
>>>>        @staticmethod
>>>> -    def _check_pragma_list_of_str(name, value, info):
>>>> -        if (not isinstance(value, list)
>>>> -                or any([not isinstance(elt, str) for elt in value])):
>>>> -            raise QAPISemError(
>>>> -                info,
>>>> -                "pragma %s must be a list of strings" % name)
>>>> +    def _pragma(name, value, info):
>>>> +
>>>> +        def _check(name, value) -> List[str]:
>>>> +            if (not isinstance(value, list) or
>>>> +                    any([not isinstance(elt, str) for elt in value])):
>>>> +                raise QAPISemError(
>>>> +                    info,
>>>> +                    "pragma %s must be a list of strings" % name)
>>>> +            return value
>>>>    
>>>> -    def _pragma(self, name, value, info):
>>>>            if name == 'doc-required':
>>>>                if not isinstance(value, bool):
>>>>                    raise QAPISemError(info,
>>>>                                       "pragma 'doc-required' must be boolean")
>>>>                info.pragma.doc_required = value
>>>>            elif name == 'command-name-exceptions':
>>>> -            self._check_pragma_list_of_str(name, value, info)
>>>> -            info.pragma.command_name_exceptions = value
>>>> +            info.pragma.command_name_exceptions = _check(name, value)
>>>>            elif name == 'command-returns-exceptions':
>>>> -            self._check_pragma_list_of_str(name, value, info)
>>>> -            info.pragma.command_returns_exceptions = value
>>>> +            info.pragma.command_returns_exceptions = _check(name, value)
>>>>            elif name == 'member-name-exceptions':
>>>> -            self._check_pragma_list_of_str(name, value, info)
>>>> -            info.pragma.member_name_exceptions = value
>>>> +            info.pragma.member_name_exceptions = _check(name, value)
>>>>            else:
>>>>                raise QAPISemError(info, "unknown pragma '%s'" % name)
>>>
>>> While I appreciate the terseness, I'm not sure I like the generic name
>>> _check() for checking one of two special cases, namely "list of string".
>>> The other case being "boolean".  We could acquire more cases later.
>>>
>>
>> Yeah, sorry, just trying to make the line fit ...
> 
> I understand!
> 
>> The important thing is that we need to make sure this routine returns
>> some known type. It's just that the block down here has very long lines.
>>
>> Recommendations?
> 
> Moving the helper into _pragma() lets us shorten its name.  Still
> too long to fit the line:
> 
>              info.pragma.command_returns_exceptions = check_list_str(name, value)
> 
> We could break the line in the argument list:
> 
>              info.pragma.command_returns_exceptions = check_list_str(name,
>                                                                      value)
> 
> or
> 
>              info.pragma.command_returns_exceptions = check_list_str(
>                                                              name, value)
> 
> Not exactly pretty.
> 
> We could shorten the assignment's target:
> 
>              pragma.command_returns_exceptions = check_list_str(name, value)
> 
> with
> 
>          pragma = info.pragma
> 

🙃

> before the conditional.  I'm not too fond of creating aliases, but this
> one looks decent to me.  What do you think?
> 

If it works for you, it works for me!
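
Roughly what I have in mind, as a standalone sketch. The Pragma class and set_pragma() are stand-ins invented for this mail, and ValueError stands in for QAPISemError so the example runs on its own:

```python
from typing import List


class Pragma:
    """Hypothetical stand-in for the per-file pragma settings."""

    def __init__(self) -> None:
        self.command_name_exceptions: List[str] = []
        self.command_returns_exceptions: List[str] = []


def set_pragma(pragma: Pragma, name: str, value: object) -> None:
    # 'pragma' plays the role of the short alias for info.pragma.

    def check_list_str(name: str, value: object) -> List[str]:
        # Validate at runtime, then hand the value back under the
        # type we just checked for: a home-grown TypeGuard.
        if (not isinstance(value, list)
                or any(not isinstance(elt, str) for elt in value)):
            raise ValueError("pragma %s must be a list of strings" % name)
        return value

    if name == 'command-name-exceptions':
        pragma.command_name_exceptions = check_list_str(name, value)
    elif name == 'command-returns-exceptions':
        pragma.command_returns_exceptions = check_list_str(name, value)
    else:
        raise ValueError("unknown pragma '%s'" % name)


settings = Pragma()
set_pragma(settings, 'command-returns-exceptions', ['query-foo'])
```

With the alias, every assignment fits the line without breaking the argument list.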

--js



* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-22  3:07 ` [PATCH 12/22] qapi/parser: add type hint annotations John Snow
  2021-04-25 12:34   ` Markus Armbruster
@ 2021-05-06  1:27   ` John Snow
  1 sibling, 0 replies; 67+ messages in thread
From: John Snow @ 2021-05-06  1:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Roth, Markus Armbruster, Eduardo Habkost, Cleber Rosa

On 4/21/21 11:07 PM, John Snow wrote:
> +        self.exprs: List[Expression] = []

I did indeed intend to use Expression to mean TopLevelExpr ... However, 
in this case, that's not what actually gets stored here.

I tricked myself!

This stores the dict that associates 'expr', 'doc' and 'info'.

Fixing it to be the generic Dict[str, object] removes the last usage of 
TopLevelExpr from parser.py ... for now.

(pt5c, optional parser cleanups, uses a stronger type for parser's 
return type and sees the reintroduction of that type.)

--js



* Re: [PATCH 12/22] qapi/parser: add type hint annotations
  2021-04-27  8:43       ` Markus Armbruster
@ 2021-05-06  1:49         ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-05-06  1:49 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

On 4/27/21 4:43 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/25/21 8:34 AM, Markus Armbruster wrote:
>>> value: object isn't wrong, but why not _ExprValue?
>>>
>>
>> Updated excuse:
>>
>> because all the way back outside in _parse, we know that:
>>
>> 1. expr is a dict (because of get_expr(False))
>> 2. expr['pragma'] is also a dict, because we explicitly check it there.
> 
> Yes:
> 
>                  pragma = expr['pragma']
> -->             if not isinstance(pragma, dict):
> -->                 raise QAPISemError(
> -->                     info, "value of 'pragma' must be an object")
>                  for name, value in pragma.items():
>                      self._pragma(name, value, info)
> 
>> 3. We iterate over the keys; all we know so far is that the values are
>> ... something.
> 
> Actually, *we* know more about the values.  get_expr() returns a tree
> whose inner nodes are dict or list, and whose leaves are str or bool.
> Therefore, the values are dict, list, str, or bool.
> 
> It's *mypy* that doesn't know, because it lacks recursive types.
> 
> I know that you're probably using "we" in the sense of "the toolchain".
> I'm annoying you with the difference between "the toolchain" and "we"
> (you, me, and other humans) because I'm concerned about us humans
> dumbing ourselves down to mypy's level of understanding.
> 

Put in a gentler way: The risk is that type annotations that assume less 
because they *must* assume less will potentially miscommunicate the 
reality of the interface to future developers.

I agree, that is a genuine risk.

but ...

> To be honest, I'm less and less sure typing these trees without the
> necessary typing tools is worth the bother.  The notational overhead is
> more oppressive than elsewhere, and yet the typing remains weak.  The
> result fails to satisfy, and that's a constant source of discussions
> (between us as well as just in my head) on how to best mitigate.
> 

... What's the alternative? I still think strict typing has strong 
benefits -- it's found a few bugs, albeit small. It offers good 
refactoring assurance and can help communicate the expected types in an 
interface *very* quickly.

Whenever I type something as Dict[str, object] that is my genuine 
attempt at just cutting my losses and saying "It gets ... something. 
Figure it out, like you did before Python 3.6."

I could use 'Any', but that really just effectively shuts the checker 
off. You could pass <Lasagna> to the interface and mypy won't flinch.

Dict[str, object] at least enforces:

- It must be a dict
- 100% of its keys must be strings
- You cannot safely do anything with its values until you interrogate 
them at runtime

...And I think that's perfectly accurate. I tried too hard to accurately 
type introspect.py, and I am avoiding repeating that mistake.
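
To make the difference concrete, here is a small sketch. The function
names are made up for illustration and nothing here is taken from
parser.py; it only contrasts what the two annotations let you write:

```python
from typing import Any, Dict, List


def names_checked(pragma: Dict[str, object]) -> List[str]:
    """With object values, each value must be narrowed before use."""
    result: List[str] = []
    for name, value in pragma.items():
        # mypy would reject value.upper() at this point, because
        # "object" has no such attribute; isinstance() narrows the type.
        if isinstance(value, str):
            result.append(name + '=' + value.upper())
        elif isinstance(value, bool):
            result.append(name + '=' + str(value).lower())
    return result


def names_unchecked(pragma: Dict[str, Any]) -> List[str]:
    """With Any values, mypy accepts this even though it can fail."""
    return [name + '=' + value.upper() for name, value in pragma.items()]
```

The second function type-checks cleanly but raises AttributeError at
runtime as soon as a value is not a string, which is the "shuts the
checker off" effect described above.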

>> 4. _pragma()'s job is to validate the type(s) anyway.
> 
> _pragma() can safely assume @value is dict, list, str, or bool.  It just
> happens not to rely on this assumption.
> 

Correct. Though, there's not too many operations that dict/list/str/bool 
all share, so you're going to be interrogating these types at runtime 
anyway.

Really, just about everything they share as an interface is probably 
perfectly summed up by the python object type.

So ... I dunno. I share your frustrations at the lack of expressiveness 
in recursive types, and it has been a major bummer while working on ... 
a recursive expression parser.

* abandons series *

/s

>> More or less, the _ExprValue type union isn't remembered here -- even
>> though it was once upon a time something returned by get_expr, it
>> happened in a nested call that is now opaque to mypy in this context.
> 
> Understand.
> 
>> So, it's some combination of "That's all we know about it" and "It
>> happens to be exactly sufficient for this function to operate."
> 
> I can accept "it's all mypy can figure out by itself, and it's good
> enough to get the static checking we want".
> 

Yep. I think the typing of this particular interface is as good as it 
can be for the moment, so I recommend leaving it as Dict[str, object].

--js



* Re: [PATCH 16/22] qapi/parser: add docstrings
  2021-04-27  9:03       ` Markus Armbruster
@ 2021-05-06  2:08         ` John Snow
  0 siblings, 0 replies; 67+ messages in thread
From: John Snow @ 2021-05-06  2:08 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/27/21 5:03 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/25/21 9:27 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>
>>>> ---
>>>>
>>>> My hubris is infinite.
>>>
>>> Score one of the three principal virtues of a programmer ;)
>>>
>>
>> It was written before the prior review, but I promise I am slowing down
>> on adding these. I just genuinely left them to help remind myself how
>> these modules are actually structured and work so that I will be able to
>> "pop in" quickly in the future and make a tactical, informed edit.
>>
>>>> OK, I only added a few -- to help me remember how the parser works at a glance.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 66 insertions(+)
>>>>
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index dbbd0fcbc2f..8fc77808ace 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>>>>    
>>>>    
>>>>    class QAPISchemaParser:
>>>> +    """
>>>> +    Performs parsing of a QAPI schema source file.
>>>
>>> Actually, this parses one of two layers, see qapi-code-gen.txt section
>>> "Schema syntax".  Pointing there might help.
>>>
>>
>> It sort of parses one-and-a-half layers, but yes ... I know the
>> distinction you're drawing here. This is *mostly* the JSON/AST level.
>>
>> (With some upper-level or mid-level parsing for Pragmas and Includes.)
> 
> True.  I chose simplicity over purity.
> 
>>>>    
>>>> +    :param fname: Path to the source file
>>>
>>> Either "Source file name" or "Source pathname", please.  I prefer "file
>>> name" for additional distance to "path" in the sense of a search path,
>>> i.e. a list of directory names.
>>>
>>
>> OK, I am not sure I have any ... prejudice about when to use which kind
>> of description for these sorts of things. I'm happy to defer to you, but
>> if there's some kind of existing standard vocabulary I'm trampling all
>> over, feel free to point me to your preferred hacker dictionary.
>>
>> Anyway, happy to adopt your phrasing here.
>>
>>>> +    :param previously_included:
>>>> +        The absolute paths of previously included source files.
>>>
>>> Either "absolute file name" or "absolute pathname".
>>>
>>
>> OK.
>>
>>>> +        Only used by recursive calls to avoid re-parsing files.
>>>
>>> Feels like detail, not sure it's needed here.
>>>
>>
>> You're probably right, but I suppose I wanted to hint/suggest that it
>> was not necessary to feed it this argument for the root schema, but it
>> was crucial for the recursive calls.
> 
> To me "if root schema, then nothing was previously included" feels
> obvious enough :)  But if you want to spell out proper use of the
> parameter, I recommend to stick to the interface, i.e. when to pass it,
> not what the function does with it (in the hope that the reader can
> then guess when to pass it).
> 
>> (Earlier I mentioned possibly just passing the parent parser in: that
>> helps eliminate some of this ambiguity, too.)
>>
>>>> +    :param incl_info:
>>>> +       `QAPISourceInfo` for the parent document.
>>>> +       This may be None if this is the root schema document.
>>>
>>> Recommend s/This may be //.
>>>
>>> qapi-code-gen.txt calls a QAPI schema that uses include directives
>>> "modular", and the included files "sub-modules".  s/root schema
>>> document/root module/?
>>>
>>
>> Sure. All in favor of phrasing consistency.
>>
>> (By the way: I did write up a draft for converting qapi-code-gen.txt to
>> ReST format, and if I had finished that, it might be nice to hotlink to
>> it here. I stopped for now because I wanted to solidify some conventions
>> on how to markup certain constructs first, and wanted ... not to
>> overwhelm you with more doc-wrangling.)
> 
> Appreciated :)
> 
>>>> +
>>>> +    :ivar exprs: Resulting parsed expressions.
>>>> +    :ivar docs: Resulting parsed documentation blocks.
>>>
>>> Uh, why are these here?  A doc string is interface documentation...
>>>
>>
>> These *are* interface. It is how callers are expected to get the results
>> of parsing.
> 
> You're right, but is the constructor the right place to document
> attributes?
> 

This is the docstring for the class, actually.

https://www.python.org/dev/peps/pep-0257/ says:

"The docstring for a class should summarize its behavior and list the 
public methods and instance variables. If the class is intended to be 
subclassed, and has an additional interface for subclasses, this 
interface should be listed separately (in the docstring). The class 
constructor should be documented in the docstring for its __init__ 
method. Individual methods should be documented by their own docstring."

So that's where parameters for the init method go, as well as class 
and instance variables.

One-stop shop for interface documentation.
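
A sketch of the layout I mean, with constructor parameters and instance
variables both listed in the class docstring. SchemaParserSketch is a
hypothetical stand-in and does no parsing; only the docstring shape is
the point:

```python
from typing import Dict, List, Optional, Set


class SchemaParserSketch:
    """
    Hold the results of parsing a schema source file (stand-in class).

    :param fname: Source file name.
    :param previously_included:
        Absolute file names of previously included source files.

    :ivar exprs: Resulting parsed expressions.
    :ivar docs: Resulting parsed documentation blocks.
    """
    def __init__(self,
                 fname: str,
                 previously_included: Optional[Set[str]] = None):
        self._fname = fname
        # The root module has nothing previously included but itself.
        self._included = previously_included or {fname}
        self.exprs: List[Dict[str, object]] = []
        self.docs: List[object] = []
```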

>> We could change that, of course, but that is absolutely how this class
>> works today.
>>
>>>> +
>>>> +    :raise OSError: For problems opening the root schema document.
>>>> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
>>>> +    :raise QAPISemError: For various semantic issues with the schema.
>>>
>>> Should callers care for the difference between QAPIParseError and
>>> QAPISemError?
>>>
>>
>> That's up to the caller, I suppose. I just dutifully reported the truth
>> of the matter here.
>>
>> (That's a real non-answer, I know.)
>>
>> I could always document QAPISourceError instead, with a note about the
>> subclasses used for completeness.
>>
>> (The intent is that QAPIError is always assumed/implied to be sufficient
>> for capturing absolutely everything raised directly by this package, if
>> you want to ignore the meanings behind them.)
> 
> I honestly can't think of a reason for catching anything but QAPIError.
> The other classes exist only to give us more convenient ways to
> construct instances of QAPIError.  We could replace them all by
> functions returning QAPIError.
> 

Summary it is.
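
For reference, a sketch of what "catch only QAPIError" looks like from
the caller's side. The class bodies are simplified stand-ins that only
mirror the subclass relationships mentioned above, and report() is a
hypothetical helper:

```python
from typing import Callable


class QAPIError(Exception):
    """Base class for everything raised by the package (stand-in)."""


class QAPISourceError(QAPIError):
    """Stand-in: an error tied to a source location."""
    def __init__(self, info: str, msg: str):
        super().__init__(msg)
        self.info = info
        self.msg = msg

    def __str__(self) -> str:
        return "%s: %s" % (self.info, self.msg)


class QAPIParseError(QAPISourceError):
    """Stand-in for syntax problems."""


class QAPISemError(QAPISourceError):
    """Stand-in for semantic problems."""


def report(action: Callable[[], None]) -> str:
    """Run action; the caller only needs to know the base class."""
    try:
        action()
    except QAPIError as err:
        return "error: %s" % err
    return "ok"
```

Documenting the base class in the `:raise:` field keeps the docstring
short while still covering whichever subclass is actually raised.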

>>>> +    """
>>>>        def __init__(self,
>>>>                     fname: str,
>>>>                     previously_included: Optional[Set[str]] = None,
>>>> @@ -77,6 +94,11 @@ def __init__(self,
>>>>            self._parse()
>>>>    
>>>>        def _parse(self) -> None:
>>>> +        """
>>>> +        Parse the QAPI schema document.
>>>> +
>>>> +        :return: None; results are stored in ``exprs`` and ``docs``.
>>>
>>> Another ignorant doc string markup question...  how am I supposed to see
>>> that exprs and docs are attributes, and not global variables?
>>>
>>
>> I don't know, it's an unsolved mystery for me too. I need more time in
>> the Sphinx dungeon to figure out how this stuff is supposed to work.
>> You're right to wonder.
> 
> Use self.exprs and self.docs meanwhile?
> 

If I don't accidentally trip and fall and decide to care more about it 
by the time I finish revising the docs tomorrow, yes.

>>>> +        """
>>>>            cur_doc = None
>>>>    
>>>>            with open(self._fname, 'r', encoding='utf-8') as fp:
>>>> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>>>>                raise QAPISemError(info, "unknown pragma '%s'" % name)
>>>>    
>>>>        def accept(self, skip_comment: bool = True) -> None:
>>>> +        """
>>>> +        Read the next lexeme and process it into a token.
>>>> +
>>>> +        :Object state:
>>>> +          :tok: represents the token type. See below for values.
>>>> +          :pos: is the position of the first character in the lexeme.
>>>> +          :cursor: is the position of the next character.
>>>
>>> Define "position" :)  It's an index in self.src.
>>>
>>
>> Good call.
>>
>>> self.cursor and self.pos are not used outside accept().  Not sure they
>>> belong in interface documentation.
>>>
>>
>> Fair point, though I was on a mission to document exactly how the parser
>> works even at the internal level, because accept(), despite being
>> "public", is really more of an internal function here.
>>
>> I am somewhat partial to documenting these state variables for my own
>> sake so that I can remember the way this lexer behaves.
> 
> I understand why you want to document how they work.  Since they're
> internal to accept(), a comment in accept() seems more proper than
> accept() doc string.  Admittedly doesn't matter that much, as accept()
> is internal to the class.
> 

OK, I'll take it into consideration and see what subjectively looks and 
feels the nicest.

>>>> +          :val: is the variable value of the token, if any.
>>>
>>> Missing: self.info, which *is* used outside accept().
>>>
>>
>> Oh, yes.
>>
>>>> +
>>>> +        Single-character tokens:
>>>> +
>>>> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
>>>> +        ``LSQB``, and ``RSQB``.
>>>
>>> "These include ..." is misleading.  This is the complete list of
>>> single-character tokens.
>>>
>>
>> I'm just testing your ability to recognize the difference between proper
>> and improper subsets.
>>
>> (Joking. I'll reword to avoid that ambiguity.)
>>
>>>> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
>>>> +        lexeme.  ``val`` is ``None``.
>>>> +
>>>> +        Multi-character tokens:
>>>> +
>>>> +        - ``COMMENT``:
>>>> +
>>>> +          - This token is not normally yielded by the lexer, but it
>>>> +            can be when ``skip_comment`` is False.
>>>> +          - ``tok`` is the value ``"#"``.
>>>> +          - ``val`` is a string including all chars until end-of-line.
>>>> +
>>>> +        - ``STRING``:
>>>> +
>>>> +          - ``tok`` is the ``"'"``, the single quote.
>>>> +          - ``value`` is the string, *excluding* the quotes.
>>>> +
>>>> +        - ``TRUE`` and ``FALSE``:
>>>> +
>>>> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
>>>> +          - ``val`` is either ``True`` or ``False`` accordingly.
>>>> +
>>>> +        - ``NEWLINE`` and ``SPACE``:
>>>> +
>>>> +          - These are consumed by the lexer directly. ``line_pos`` and
>>>> +            ``info`` are advanced when ``NEWLINE`` is encountered.
>>>> +            ``tok`` is set to ``None`` upon reaching EOF.
>>>> +
>>>> +        :param skip_comment:
>>>> +            When false, return ``COMMENT`` tokens.
>>>> +            This is used when reading documentation blocks.
>>>
>>> The doc string mostly describes possible state on return of accept().
>>> *Within* accept(), self.tok may be any character.
>>>
>>> "Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
>>> happens within accept().
>>>
>>
>> Almost kinda-sorta. The value of "tok" is important there, too.
> 
> --verbose?
> 

Fair enough. I'll trim it down. There is some future bleed from some 
experimental stuff I cut out here.

(It's been banished to some realm even further beyond pt5c, the 
oft-feared but seldom-mentioned pt7. Spoken of in frightened whispers, 
leading QAPI scholars are as of yet unable to confirm it truly exists.)
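
In the meantime, a toy illustration of the tok/val convention the
docstring is trying to describe. ToyLexer is a hypothetical, heavily
simplified lexer written for this mail: it handles no string escapes,
tracks no line information, and is not the code in parser.py:

```python
from typing import Optional, Union


class ToyLexer:
    def __init__(self, src: str):
        self.src = src
        self.cursor = 0   # index in src of the next character
        self.pos = 0      # index in src of the current lexeme's start
        self.tok: Optional[str] = None
        self.val: Union[None, bool, str] = None

    def accept(self, skip_comment: bool = True) -> None:
        """Read and store the next token in self.tok and self.val."""
        while True:
            if self.cursor >= len(self.src):
                self.tok, self.val = None, None   # end of input
                return
            self.tok = self.src[self.cursor]
            self.pos = self.cursor
            self.cursor += 1
            self.val = None

            if self.tok == '#':
                # Comment: runs to end of line (or end of input).
                end = self.src.find('\n', self.cursor)
                if end == -1:
                    end = len(self.src)
                self.cursor = end
                if not skip_comment:
                    self.val = self.src[self.pos:end]
                    return
            elif self.tok in '{}:,[]':
                return                  # single-character token, val is None
            elif self.tok == "'":
                end = self.src.index("'", self.cursor)
                self.val = self.src[self.cursor:end]   # quotes excluded
                self.cursor = end + 1
                return
            elif self.src.startswith('true', self.pos):
                self.val = True
                self.cursor = self.pos + 4
                return
            elif self.src.startswith('false', self.pos):
                self.val = False
                self.cursor = self.pos + 5
                return
            elif not self.tok.isspace():
                raise ValueError(
                    "stray %r at index %d" % (self.tok, self.pos))
            # Whitespace falls through and the loop reads the next lexeme.
```

On return, tok is the first character of the lexeme (or None at end of
input) and val carries the value for comments, strings and booleans.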

>>> Perhaps phrasing it as a postcondition would be clearer:
>>>
>>>       Read and store the next token.
>>>
>>>       On return, self.tok is the token type, self.info describes its
>>>       source location, and self.value is the token's value.
>>>
>>>       The possible token types and their values are
>>>
>>>       ...
>>>
>>
>> OK, I will play with this suggestion while I try to clean up the docs.
>>
>>>> +        """
>>>>            while True:
>>>>                self.tok = self.src[self.cursor]
>>>>                self.pos = self.cursor
>>
>> Thanks for taking a look at this one.
> 
> Thank *you* for documenting my[*] code!
> 
> 
> [*] Some of it mine in the sense I wrote it, some of it mine in the
> sense I maintain it.
> 
> 

I assure you it's entirely selfish. I have the memory of a goldfish and 
the docs I wrote myself here have *already* come in handy for reminding 
myself what's going on in here.

--js



* Re: [PATCH 16/22] qapi/parser: add docstrings
  2021-04-25 13:27   ` Markus Armbruster
  2021-04-26 18:26     ` John Snow
@ 2021-05-07  1:34     ` John Snow
  2021-05-07  8:25       ` Markus Armbruster
  1 sibling, 1 reply; 67+ messages in thread
From: John Snow @ 2021-05-07  1:34 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Michael Roth, qemu-devel, Eduardo Habkost, Cleber Rosa

On 4/25/21 9:27 AM, Markus Armbruster wrote:
> Another ignorant doc string markup question...  how am I supposed to see
> that exprs and docs are attributes, and not global variables?
> 

The syntax is apparently supposed to be :py:attr:`MyClass.attr`. Though, 
it doesn't seem to be working for me. I can write :py:attr:`bzbxglkdsgl` 
and the build succeeds. I gotta hunch:

Sphinx was designed to parse ReST written by hand. The " .. py:method::" 
directives are ones you'd use when using sphinx in that style. Those 
directives are what create an object in Sphinx's cross-reference system. 
Later, if you use :py:meth:`foo`, it references that specific object.

Sphinx autodoc is a system that parses your code and automatically 
generates py:method:: and py:class:: directives for you, allowing the 
reference syntax to work.

MY HUNCH is that for field list markup within a docstring -- things like 
:ivar: -- that there is not any corresponding object being created, 
rendering cross-references for things at that scope when using autodoc 
ineffective.

BOO, BOO, A THOUSAND TIMES BOO TO THIS.

Argh, yep.

If I use:

     .. py:attribute:: exprs

         Resulting parsed expressions.


instead of

:ivar exprs: Resulting parsed expressions

then the syntax :attr:`qapi.parser.QAPISchemaParser.exprs` does resolve 
into a clickable hyperlink on the rendered output.

  ____   ___   ___   ___  _
| __ ) / _ \ / _ \ / _ \| |
|  _ \| | | | | | | | | | |
| |_) | |_| | |_| | |_| |_|
|____/ \___/ \___/ \___/(_)


Sigh. Well, while I'm here doing the research and talking to myself, the 
syntax :attr:`exprs` also works when you have the target defined. It 
doesn't have to be as verbose. With my testing setup of using the 
default role of "any", even just `exprs` works.
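
For the record, this is roughly the docstring shape that produced a
working link in my experiment. ParserSketch is a hypothetical class
name, and I have only tried this with my own autodoc setup, so treat
it as a sketch of the workaround and not as established practice:

```python
from typing import Dict, List


class ParserSketch:
    """
    Stand-in class showing an explicit attribute directive.

    .. py:attribute:: exprs

        Resulting parsed expressions.
    """
    def __init__(self) -> None:
        self.exprs: List[Dict[str, object]] = []
```

Elsewhere in the docs, :attr:`exprs` (or the fully qualified form) can
then point at that target.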

I wonder if there's the possibility of having sphinx enhance :ivar: and 
:cvar: to automatically create the same kind of reference target as 
py:attribute:: does.

Problems for later.

For now ...

``.exprs`` and ``.docs``?

--js



* Re: [PATCH 16/22] qapi/parser: add docstrings
  2021-05-07  1:34     ` John Snow
@ 2021-05-07  8:25       ` Markus Armbruster
  0 siblings, 0 replies; 67+ messages in thread
From: Markus Armbruster @ 2021-05-07  8:25 UTC (permalink / raw)
  To: John Snow; +Cc: Michael Roth, Cleber Rosa, qemu-devel, Eduardo Habkost

John Snow <jsnow@redhat.com> writes:

> On 4/25/21 9:27 AM, Markus Armbruster wrote:
>> Another ignorant doc string markup question...  how am I supposed to see
>> that exprs and docs are attributes, and not global variables?
>> 
>
> The syntax is apparently supposed to be :py:attr:`MyClass.attr`. Though, 
> it doesn't seem to be working for me. I can write :py:attr:`bzbxglkdsgl` 
> and the build succeeds. I gotta hunch:

[Problems...]

>
>   ____   ___   ___   ___  _
> | __ ) / _ \ / _ \ / _ \| |
> |  _ \| | | | | | | | | | |
> | |_) | |_| | |_| | |_| |_|
> |____/ \___/ \___/ \___/(_)
>
>
> Sigh. Well, while I'm here doing the research and talking to myself, the 
> syntax :attr:`exprs` also works when you have the target defined. It 
> doesn't have to be as verbose. With my testing setup of using the 
> default role of "any", even just `exprs` works.
>
> I wonder if there's the possibility of having sphinx enhance :ivar: and 
> :cvar: to automatically create the same kind of reference target as 
> py:attribute:: does.
>
> Problems for later.
>
> For now ...
>
> ``.exprs`` and ``.docs``?

Works for me.



end of thread, other threads:[~2021-05-07  8:26 UTC | newest]

Thread overview: 67+ messages
2021-04-22  3:06 [PATCH 00/22] qapi: static typing conversion, pt5a John Snow
2021-04-22  3:06 ` [PATCH 01/22] qapi/parser: Don't try to handle file errors John Snow
2021-04-23 15:46   ` Markus Armbruster
2021-04-23 19:20     ` John Snow
2021-04-27 13:47       ` Markus Armbruster
2021-04-27 17:58         ` John Snow
2021-04-28  5:48           ` Markus Armbruster
2021-04-22  3:07 ` [PATCH 02/22] qapi/source: [RFC] add "with_column" contextmanager John Snow
2021-04-27  9:33   ` Markus Armbruster
2021-04-22  3:07 ` [PATCH 03/22] qapi/source: Remove line number from QAPISourceInfo initializer John Snow
2021-04-24  6:38   ` Markus Armbruster
2021-04-26 17:39     ` John Snow
2021-04-26 23:14     ` John Snow
2021-04-27  6:07       ` Markus Armbruster
2021-04-22  3:07 ` [PATCH 04/22] qapi/parser: factor parsing routine into method John Snow
2021-04-22  3:07 ` [PATCH 05/22] qapi/parser: Assert lexer value is a string John Snow
2021-04-24  8:33   ` Markus Armbruster
2021-04-26 17:43     ` John Snow
2021-04-27 12:30       ` Markus Armbruster
2021-04-27 13:58         ` John Snow
2021-04-22  3:07 ` [PATCH 06/22] qapi/parser: assert get_expr returns object in outer loop John Snow
2021-04-25  7:23   ` Markus Armbruster
2021-04-27 15:03     ` John Snow
2021-04-22  3:07 ` [PATCH 07/22] qapi/parser: assert object keys are strings John Snow
2021-04-25  7:27   ` Markus Armbruster
2021-04-26 17:46     ` John Snow
2021-04-27  6:13       ` Markus Armbruster
2021-04-27 14:15         ` John Snow
2021-04-22  3:07 ` [PATCH 08/22] qapi/parser: Use @staticmethod where appropriate John Snow
2021-04-22  3:07 ` [PATCH 09/22] qapi: add match_nofail helper John Snow
2021-04-25  7:54   ` Markus Armbruster
2021-04-26 17:48     ` John Snow
2021-04-22  3:07 ` [PATCH 10/22] qapi/parser: Fix typing of token membership tests John Snow
2021-04-25  7:59   ` Markus Armbruster
2021-04-26 17:51     ` John Snow
2021-04-27  7:00       ` Markus Armbruster
2021-05-04  1:01         ` John Snow
2021-05-05  6:29           ` Markus Armbruster
2021-04-22  3:07 ` [PATCH 11/22] qapi/parser: Rework _check_pragma_list_of_str as a TypeGuard John Snow
2021-04-25 12:32   ` Markus Armbruster
2021-04-26 23:48     ` John Snow
2021-04-27  7:15       ` Markus Armbruster
2021-05-05 19:09         ` John Snow
2021-04-22  3:07 ` [PATCH 12/22] qapi/parser: add type hint annotations John Snow
2021-04-25 12:34   ` Markus Armbruster
2021-04-26 18:00     ` John Snow
2021-04-27  8:21       ` Markus Armbruster
2021-04-26 23:55     ` John Snow
2021-04-27  8:43       ` Markus Armbruster
2021-05-06  1:49         ` John Snow
2021-05-06  1:27   ` John Snow
2021-04-22  3:07 ` [PATCH 13/22] qapi/parser: [RFC] overload the return type of get_expr John Snow
2021-04-22  3:07 ` [PATCH 14/22] qapi/parser: Remove superfluous list constructor John Snow
2021-04-22  3:07 ` [PATCH 15/22] qapi/parser: allow 'ch' variable name John Snow
2021-04-22  3:07 ` [PATCH 16/22] qapi/parser: add docstrings John Snow
2021-04-25 13:27   ` Markus Armbruster
2021-04-26 18:26     ` John Snow
2021-04-27  9:03       ` Markus Armbruster
2021-05-06  2:08         ` John Snow
2021-05-07  1:34     ` John Snow
2021-05-07  8:25       ` Markus Armbruster
2021-04-22  3:07 ` [PATCH 17/22] CHECKPOINT John Snow
2021-04-22  3:07 ` [PATCH 18/22] qapi: [WIP] Rip QAPIDoc out of parser.py John Snow
2021-04-22  3:07 ` [PATCH 19/22] qapi: [WIP] Add type ignores for qapidoc.py John Snow
2021-04-22  3:07 ` [PATCH 20/22] qapi: [WIP] Import QAPIDoc from qapidoc Signed-off-by: John Snow <jsnow@redhat.com> John Snow
2021-04-22  3:07 ` [PATCH 21/22] qapi: [WIP] Add QAPIDocError John Snow
2021-04-22  3:07 ` [PATCH 22/22] qapi: [WIP] Enable linters on parser.py John Snow
