All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anton Johansson via <qemu-devel@nongnu.org>
To: qemu-devel@nongnu.org
Cc: ale@rev.ng, tsimpson@quicinc.com, bcain@quicinc.com,
	babush@rev.ng, nizzo@rev.ng, richard.henderson@linaro.org
Subject: [PATCH v7 02/13] target/hexagon: import README for idef-parser
Date: Fri, 17 Dec 2021 10:01:18 +0100	[thread overview]
Message-ID: <20211217090129.23242-3-anjo@rev.ng> (raw)
In-Reply-To: <20211217090129.23242-1-anjo@rev.ng>

From: Alessandro Di Federico <ale@rev.ng>

Signed-off-by: Alessandro Di Federico <ale@rev.ng>
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
 target/hexagon/README                 |   5 +
 target/hexagon/idef-parser/README.rst | 722 ++++++++++++++++++++++++++
 2 files changed, 727 insertions(+)
 create mode 100644 target/hexagon/idef-parser/README.rst

diff --git a/target/hexagon/README b/target/hexagon/README
index 372e24747c..6cb5affddb 100644
--- a/target/hexagon/README
+++ b/target/hexagon/README
@@ -27,6 +27,10 @@ Hexagon-specific code are
         encode*.def             Encoding patterns for each instruction
         iclass.def              Instruction class definitions used to determine
                                 legal VLIW slots for each instruction
+    qemu/target/hexagon/idef-parser
+        Parser that, given the high-level definitions of an instruction,
+        produces a C function generating equivalent tiny code instructions.
+        See README.rst.
     qemu/linux-user/hexagon
         Helpers for loading the ELF file and making Linux system calls,
         signals, etc
@@ -47,6 +51,7 @@ header files in <BUILD_DIR>/target/hexagon
         gen_tcg_funcs.py                -> tcg_funcs_generated.c.inc
         gen_tcg_func_table.py           -> tcg_func_table_generated.c.inc
         gen_helper_funcs.py             -> helper_funcs_generated.c.inc
+        gen_idef_parser_funcs.py        -> idef_parser_input.h
 
 Qemu helper functions have 3 parts
     DEF_HELPER declaration indicates the signature of the helper
diff --git a/target/hexagon/idef-parser/README.rst b/target/hexagon/idef-parser/README.rst
new file mode 100644
index 0000000000..65e6bf4ee5
--- /dev/null
+++ b/target/hexagon/idef-parser/README.rst
@@ -0,0 +1,722 @@
+Hexagon ISA instruction definitions to tinycode generator compiler
+------------------------------------------------------------------
+
+idef-parser is a small compiler able to translate the Hexagon ISA description
+language into tinycode generator code, that can be easily integrated into QEMU.
+
+Compilation Example
+-------------------
+
+To better understand the scope of the idef-parser, we'll explore an applicative
+example. Let's start by one of the simplest Hexagon instruction: the ``add``.
+
+The ISA description language represents the ``add`` instruction as
+follows:
+
+.. code:: c
+
+   A2_add(RdV, in RsV, in RtV) {
+       { RdV=RsV+RtV;}
+   }
+
+idef-parser will compile the above code into the following code:
+
+.. code:: c
+
+   /* A2_add */
+   void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
+                    TCGv_i32 RsV, TCGv_i32 RtV)
+   /*  { RdV=RsV+RtV;} */
+   {
+       TCGv_i32 tmp_0 = tcg_temp_new_i32();
+       tcg_gen_add_i32(tmp_0, RsV, RtV);
+       tcg_gen_mov_i32(RdV, tmp_0);
+       tcg_temp_free_i32(tmp_0);
+   }
+
+The output of the compilation process will be a function, containing the
+tinycode generator code, implementing the correct semantics. That function will
+not access any global variable, because all the accessed data structures will be
+passed explicitly as function parameters. Among the passed parameters we will
+have TCGv (tinycode variables) representing the input and output registers of
+the architecture, integers representing the immediates that come from the code,
+and other data structures which hold information about the disassemblation
+context (``DisasContext`` struct).
+
+Let's begin by describing the input code. The ``add`` instruction is associated
+with a unique identifier, in this case ``A2_add``, which allows to distinguish
+variants of the same instruction, and expresses the class to which the
+instruction belongs, in this case ``A2`` corresponds to the Hexagon
+``ALU32/ALU`` instruction subclass.
+
+After the instruction identifier, we have a series of parameters that represents
+TCG variables that will be passed to the generated function. Parameters marked
+with ``in`` are already initialized, while the others are output parameters.
+
+We will leverage this information to infer several information:
+
+-  Fill in the output function signature with the correct TCGv registers
+-  Fill in the output function signature with the immediate integers
+-  Keep track of which registers, among the declared one, have been
+   initialized
+
+Let's now observe the actual instruction description code, in this case:
+
+.. code:: c
+
+   { RdV=RsV+RtV;}
+
+This code is composed by a subset of the C syntax, and is the result of the
+application of some macro definitions contained in the ``macros.h`` file.
+
+This file is used to reduce the complexity of the input language where complex
+variants of similar constructs can be mapped to a unique primitive, so that the
+idef-parser has to handle a lower number of computation primitives.
+
+As you may notice, the description code modifies the registers which have been
+declared by the declaration statements. In this case all the three registers
+will be declared, ``RsV`` and ``RtV`` will also be read and ``RdV`` will be
+written.
+
+Now let's have a quick look at the generated code, line by line.
+
+::
+
+   TCGv_i32 tmp_0 = tcg_temp_new_i32();
+
+This code starts by declaring a temporary TCGv to hold the result from the sum
+operation.
+
+::
+
+   tcg_gen_add_i32(tmp_0, RsV, RtV);
+
+Then, we are generating the sum tinycode operator between the selected
+registers, storing the result in the just declared temporary.
+
+::
+
+   tcg_gen_mov_i32(RdV, tmp_0);
+
+The result of the addition is now stored in the temporary, we move it into the
+correct destination register. This code may seem inefficient, but QEMU will
+perform some optimizations on the tinycode, reducing the unnecessary copy.
+
+::
+
+   tcg_temp_free_i32(tmp_0);
+
+Finally, we free the temporary we used to hold the addition result.
+
+Parser Input
+------------
+
+Before moving on to the structure of idef-parser itself, let us spend some words
+on its' input. There are two preprocessing steps applied to the generated
+instruction semantics in ``semantics_generated.pyinc`` that we need to consider.
+Firstly,
+
+::
+
+    gen_idef_parser_funcs.py
+
+which takes instruction semantics in ``semantics_generated.pyinc`` to C-like
+pseudo code, output into ``idef_parser_input.h.inc``. For instance, the
+``J2_jumpr`` instruction which jumps to an address stored in a register
+argument. This is instruction is defined as
+
+::
+
+    SEMANTICS( \
+        "J2_jumpr", \
+        "jumpr Rs32", \
+        """{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}""" \
+    )
+
+in ``semantics_generated.pyinc``. Running ``gen_idef_parser_funcs.py``
+we obtain the pseudo code
+
+::
+
+    J2_jumpr(in RsV) {
+        {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}
+    }
+
+with macros such as ``fJUMPR`` intact.
+
+The second step is to expand macros into a form suitable for our parser.
+These macros are defined in ``idef-parser/macros.inc`` and the step is
+carried out by the ``prepare`` script which runs the C preprocessor on
+``idef_parser_input.h.inc`` to produce
+``idef_parser_input.preprocessed.h.inc``.
+
+To finish the above example, after preprocessing ``J2_jumpr`` we obtain
+
+::
+
+    J2_jumpr(in RsV) {
+        {(PC = RsV);}
+    }
+
+where ``fJUMPR(RsN,RsV,COF_TYPE_JUMPR);`` was expanded to ``(PC = RsV)``,
+signifying a write to the Program Counter ``PC``.  Note, that ``PC`` in
+this expression is not a variable in the strict C sense since it is not
+declared anywhere, but rather a symbol which is easy to match in
+idef-parser later on.
+
+Parser Structure
+----------------
+
+The idef-parser is built using the ``flex`` and ``bison``.
+
+``flex`` is used to split the input string into tokens, each described using a
+regular expression. The token description is contained in the
+``idef-parser.lex`` source file. The flex-generated scanner takes care also to
+extract from the input text other meaningful information, e.g., the numerical
+value in case of an immediate constant, and decorates the token with the
+extracted information.
+
+``bison`` is used to generate the actual parser, starting from the parsing
+description contained in the ``idef-parser.y`` file. The generated parser
+executes the ``main`` function at the end of the ``idef-parser.y`` file, which
+opens input and output files, creates the parsing context, and eventually calls
+the ``yyparse()`` function, which starts the execution of the LALR(1) parser
+(see `Wikipedia <https://en.wikipedia.org/wiki/LALR_parser>`__ for more
+information about LALR parsing techniques). The LALR(1) parser, whenever it has
+to shift a token, calls the ``yylex()`` function, which is defined by the
+flex-generated code, and reads the input file returning the next scanned token.
+
+The tokens are mapped on the source language grammar, defined in the
+``idef-parser.y`` file to build a unique syntactic tree, according to the
+specified operator precedences and associativity rules.
+
+The grammar describes the whole file which contains the Hexagon instruction
+descriptions, therefore it starts from the ``input`` nonterminal, which is a
+list of instructions, each instruction is represented by the following grammar
+rule, representing the structure of the input file shown above:
+
+::
+
+   instruction : INAME arguments code
+               | error
+
+   arguments : '(' ')'
+             | '(' argument_list ')';
+
+   argument_list : argument_decl ',' argument_list
+                 | argument_decl
+
+   argument_decl : REG
+                 | PRED
+                 | IN REG
+                 | IN PRED
+                 | IMM
+                 | var
+                 ;
+
+   code        : '{' statements '}'
+
+   statements  : statements statement
+               | statement
+
+   statement   : control_statement
+               | var_decl ';'
+               | rvalue ';'
+               | code_block
+               | ';'
+
+   code_block  : '{' statements '}'
+               | '{' '}'
+
+With this initial portion of the grammar we are defining the instruction, its'
+arguments, and its' statements. Each argument is defined by the
+``argument_decl`` rule, and can be either
+
+::
+
+    Description                  Example
+    ----------------------------------------
+    output register              RsV
+    output predicate register    P0
+    input register               in RsV
+    input predicate register     in P0
+    immediate value              1234
+    local variable               EA
+
+Note, the only local variable allowed to be used as an argument is the effective
+address ``EA``. Similarly, each statement can be a ``control_statement``, a
+variable declaration such as ``int a;``, a code block, which is just a
+bracket-enclosed list of statements, a ``';'``, which is a ``nop`` instruction,
+and an ``rvalue ';'``.
+
+Expressions
+~~~~~~~~~~~
+
+Allowed in the input code are C language expressions with a few exceptions
+to simplify parsing. For instance, variable names such as ``RdV``, ``RssV``,
+``PdV``, ``CsV``, and other idiomatic register names from Hexagon, are
+reserved specifically for register arguments. These arguments then map to
+``TCGv_i32`` or ``TCGv_i64`` depending on the register size. Similarly, ``UiV``,
+``riV``, etc. refer to immediate arguments and will map to C integers.
+
+Also, as mentioned earlier, the names ``PC``, ``SP``, ``FP``, etc. are used to
+refer to Hexagon registers such as the program counter, stack pointer, and frame
+pointer seen here. Writes to these registers then correspond to assignments
+``PC = ...``, and reads correspond to uses of the variable ``PC``.
+
+Moreover, another example of one such exception is the selective expansion of
+macros present in ``macros.h``. As an example, consider the ``fABS`` macro which
+in plain C is defined as
+
+::
+
+    #define fABS(A) (((A) < 0) ? (-(A)) : (A))
+
+and returns the absolute value of the argument ``A``. This macro is not included
+in ``idef-parser/macros.inc`` and as such is not expanded and kept as a "call"
+``fABS(...)``. Reason being, that ``fABS`` is easier to match and map to
+``tcg_gen_abs_<width>``, compared to the full ternary expression above. Loads of
+macros in ``macros.h`` are kept unexpanded to aid in parsing, as seen in the
+example above, for more information see ``idef-parser/idef-parser.lex``.
+
+Finally, in mapping these input expressions to tinycode generators, idef-parser
+tries to perform as much as possible in plain C. Such as, performing binary
+operations in C instead of tinycode generators, thus effectively constant
+folding the expression.
+
+Variables and Variable Declarations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Similarly to C, variables in the input code must be explicitly declared, such as
+``int var1;`` which declares an uninitialized variable ``var1``. Initialization
+``int var2 = 0;`` is also allowed and behaves as expected. In tinycode
+generators the previous declarations are mapped to
+
+::
+
+    int var1;           ->      TCGv_i32 var1 = tcg_temp_local_new_i32();
+
+    int var2 = 0;       ->      TCGv_i32 var1 = tcg_temp_local_new_i32();
+                                tcg_gen_movi_i32(j, ((int64_t) 0ULL));
+
+which are later automatically freed at the end of the function they're declared
+in. Contrary to C, we only allow variables to be declared with an integer type
+specified in the following table (without permutation of keywords)
+
+::
+
+    type                        bit-width    signedness
+    ----------------------------------------------------------
+    int                         32           signed
+    signed
+    signed int
+
+    unsigned                    32           unsigned
+    unsigned int
+
+    long                        64           signed
+    long int
+    signed long
+    signed long int
+
+    unsigned long               64           unsigned
+    unsigned long int
+
+    long long                   64           signed
+    long long int
+    signed long long
+    signed long long int
+
+    unsigned long long          64           unsigned
+    unsigned long long int
+
+    size[1,2,4,8][s,u]_t        8-64         signed or unsigned
+
+In idef-parser, variable names are matched by a generic ``VARID`` token,
+which will feature the variable name as a decoration. For a variable declaration
+idef-parser calls ``gen_varid_allocate`` with the ``VARID`` token to save the
+name, size, and bit width of the newly declared variable. In addition, this
+function also ensures that variables aren't declared multiple times, and prints
+and error message if that is the case. Upon use of a variable, the ``VARID``
+token is used to lookup the size and bit width of the variable.
+
+Type System
+~~~~~~~~~~~
+
+idef-parser features a simple type system which is used to correctly implement
+the signedness and bit width of the operations.
+
+The type of each ``rvalue`` is determined by two attributes: its bit width
+(``unsigned bit_width``) and its signedness (``HexSignedness signedness``).
+
+For each operation, the type of ``rvalue``\ s influence the way in which the
+operands are handled and emitted. For example a right shift between signed
+operators will be an arithmetic shift, while one between unsigned operators
+will be a logical shift. If one of the two operands is signed, and the other
+is unsigned, the operation will be signed.
+
+The bit width also influences the outcome of the operations, in particular while
+the input languages features a fine granularity type system, with types of 8,
+16, 32, 64 (and more for vectorial instructions) bits, the tinycode only
+features 32 and 64 bit widths. We propagate as much as possible the fine
+granularity type, until the value has to be used inside an operation between
+``rvalue``\ s; in that case if one of the two operands is greater than 32 bits
+we promote the whole operation to 64 bit, taking care of properly extending the
+two operands. Fortunately, the most critical instructions already feature
+explicit casts and zero/sign extensions which are properly propagated down to
+our parser.
+
+The combination of ``rvalue``\ s are handled through the use of the
+``gen_bin_op`` and ``gen_bin_cmp`` helper functions. These two functions handle
+the appropriate compile-time or run-time emission of operations to perform the
+required computation.
+
+Control Statements
+~~~~~~~~~~~~~~~~~~
+
+``control_statement``\ s are all the statements which modify the order of
+execution of the generated code according to input parameters. They are expanded
+by the following grammar rule:
+
+::
+
+   control_statement : frame_check
+                     | cancel_statement
+                     | if_statement
+                     | for_statement
+                     | fpart1_statement
+
+``if_statement``\ s require the emission of labels and branch instructions which
+effectively perform conditional jumps (``tcg_gen_brcondi``) according to the
+value of an expression. Note, the tinycode generators we produce for conditional
+statements do not perfectly mirror what would be expected in C, for instance we
+do not reproduce short-circuiting of the ``&&`` operator, and use of the ``||``
+operator is disallowed. All the predicated instructions, and in general all the
+instructions where there could be alternative values assigned to an ``lvalue``,
+like C-style ternary expressions:
+
+::
+
+   rvalue            : rvalue QMARK rvalue COLON rvalue
+
+are handled using the conditional move tinycode instruction
+(``tcg_gen_movcond``), which avoids the additional complexity of managing labels
+and jumps.
+
+Instead, regarding the ``for`` loops, exploiting the fact that they always
+iterate on immediate values, therefore their iteration ranges are always known
+at compile time, we implemented those emitting plain C ``for`` loops. This is
+possible because the loops will be executed in the QEMU code, leading to the
+consequential unrolling of the for loop, since the tinycode generator
+instructions will be executed multiple times, and the respective generated
+tinycode will represent the unrolled execution of the loop.
+
+Parsing Context
+~~~~~~~~~~~~~~~
+
+All the helper functions in ``idef-parser.y`` carry two fixed parameters, which
+are the parsing context ``c`` and the ``YYLLOC`` location information. The
+context is explicitly passed to all the functions because the parser we generate
+is a reentrant one, meaning that it does not have any global variable, and
+therefore the instruction compilation could easily be parallelized in the
+future. Finally for each rule we propagate information about the location of the
+involved tokens to generate pretty error reporting, able to highlight the
+portion of the input code which generated each error.
+
+Debugging
+---------
+
+Developing the idef-parser can lead to two types of errors: compile-time errors
+and parsing errors.
+
+Compile-time errors in Bison-generated parsers are usually due to conflicts in
+the described grammar. Conflicts forbid the grammar to produce a unique
+derivation tree, thus must be solved (except for the dangling else problem,
+which is marked as expected through the ``%expect 1`` Bison option).
+
+For solving conflicts you need a basic understanding of `shift-reduce conflicts
+<https://www.gnu.org/software/Bison/manual/html_node/Shift_002fReduce.html>`__
+and `reduce-reduce conflicts
+<https://www.gnu.org/software/Bison/manual/html_node/Reduce_002fReduce.html>`__,
+then, if you are using a Bison version > 3.7.1 you can ask Bison to generate
+some counterexamples which highlight ambiguous derivations, passing the
+``-Wcex`` option to Bison. In general shift/reduce conflicts are solved by
+redesigning the grammar in an unambiguous way or by setting the token priority
+correctly, while reduce/reduce conflicts are solved by redesigning the
+interested part of the grammar.
+
+Run-time errors can be divided between lexing and parsing errors, lexing errors
+are hard to detect, since the ``var`` token will catch everything which is not
+catched by other tokens, but easy to fix, because most of the time a simple
+regex editing will be enough.
+
+idef-parser features a fancy parsing error reporting scheme, which for each
+parsing error reports the fragment of the input text which was involved in the
+parsing rule that generated an error.
+
+Implementing an instruction goes through several sequential steps, here are some
+suggestions to make each instruction proceed to the next step.
+
+-  not-emitted
+
+   Means that the parsing of the input code relative to that instruction failed,
+   this could be due to a lexical error or to some mismatch between the order of
+   valid tokens and a parser rule. You should check that tokens are correctly
+   identified and mapped, and that there is a rule matching the token sequence
+   that you need to parse.
+
+-  emitted
+
+   This instruction class contains all the instructions which are emitted but
+   fail to compile when included in QEMU. The compilation errors are shown by
+   the QEMU building process and will lead to fixing the bug.  Most common
+   errors regard the mismatch of parameters for tinycode generator functions,
+   which boil down to errors in the idef-parser type system.
+
+-  compiled
+
+   These instruction generate valid tinycode generator code, which however fail
+   the QEMU or the harness tests, these cases must be handled manually by
+   looking into the failing tests and looking at the generated tinycode
+   generator instruction and at the generated tinycode itself. Tip: handle the
+   failing harness tests first, because they usually feature only a single
+   instruction, thus will require less execution trace navigation. If a
+   multi-threaded test fail, fixing all the other tests will be the easier
+   option, hoping that the multi-threaded one will be indirectly fixed.
+
+   An example of debugging this type of failure is provided in the following
+   section.
+
+-  tests-passed
+
+   This is the final goal for each instruction, meaning that the instruction
+   passes the test suite.
+
+Another approach to fix QEMU system test, where many instructions might fail, is
+to compare the execution trace of your implementation with the reference
+implementations already present in QEMU. To do so you should obtain a QEMU build
+where the instruction pass the test, and run it with the following command:
+
+::
+
+   sudo unshare -p sudo -u <USER> bash -c \
+   'env -i <qemu-hexagon full path> -d cpu <TEST>'
+
+And do the same for your implementation, the generated execution traces will be
+inherently aligned and can be inspected for behavioral differences using the
+``diff`` tool.
+
+Example of debugging erroneous tinycode generator code
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The goal of this section is to provide a complete example of debugging
+incorrectly emitted tinycode generator for a single instruction.
+
+Let's first introduce a bug in the tinycode generator of the ``A2_add``
+instruction,
+
+::
+
+    void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
+                     TCGv_i32 RsV, TCGv_i32 RtV)
+    /*  RdV=RsV+RtV;} */
+    {
+        TCGv_i32 tmp_0 = tcg_temp_new_i32();
+        tcg_gen_add_i32(tmp_0, RsV, RsV);
+        tcg_gen_mov_i32(RdV, tmp_0);
+        tcg_temp_free_i32(tmp_0);
+    }
+
+Here the bug, albeit hard to spot, is in ``tcg_gen_add_i32(tmp_0, RsV, RsV);``
+where we compute ``RsV + RsV`` instead of ``RsV + RtV``, as would be expected.
+This particular bug is a bit tricky to pinpoint when debugging, since the
+``A2_add`` instruction is so ubiquitous. As a result, pretty much all tests will
+fail and therefore not provide a lot of information about the bug.
+
+For example, let's run the ``check-tcg`` tests
+
+::
+
+    make check-tcg TIMEOUT=1200 \
+                   DOCKER_IMAGE=debian-hexagon-cross \
+                   ENGINE=podman V=1 \
+                   DOCKER_CROSS_CC_GUEST=hexagon-unknown-linux-musl-clang
+
+In the output, we find a failure in the very first test case ``float_convs``
+due to a segmentation fault. Similarly, all harness and libc tests will fail as
+well. At this point we have no clue where the actual bug lies, and need to start
+ruling out instructions. As such a good starting point is to utilize the debug
+options ``-d in_asm,cpu`` of QEMU to inspect the Hexagon instructions being run,
+alongside the CPU state. We additionally need a working version of the emulator
+to compare our buggy CPU state against, running
+
+::
+
+    meson configure -Dhexagon_idef_parser_enabled=false
+
+will disable the idef-parser for all instructions and fallback on manual
+tinycode generator overrides, or on helper function implementations. Recompiling
+gives us ``qemu-hexagon`` which passes all tests. If ``qemu-heaxgon-buggy`` is
+our binary with the incorrect tinycode generators, we can compare the CPU state
+between the two versions
+
+::
+
+    ./qemu-hexagon-buggy -d in_asm,cpu float_convs &> out_buggy
+    ./qemu-hexagon       -d in_asm,cpu float_convs &> out_working
+
+Looking at ``diff -u out_buggy out_working`` shows us that the CPU state begins
+to diverge on line 141, with an incorrect value in the ``R1`` register
+
+::
+
+    @@ -138,7 +138,7 @@
+
+     General Purpose Registers = {
+       r0 = 0x4100f9c0
+    -  r1 = 0x00042108
+    +  r1 = 0x00000000
+       r2 = 0x00021084
+       r3 = 0x00000000
+       r4 = 0x00000000
+
+If we also look into ``out_buggy`` directly we can inspect the input assembly
+which the caused the incorrect CPU state, around line 141 we find
+
+::
+
+    116 |  ----------------
+    117 |  IN: _start_c
+    118 |  0x000210b0:  0xa09dc002	{	allocframe(R29,#0x10):raw }
+    ... |  ...
+    137 |  0x000210fc:  0x5a00c4aa	{	call PC+2388 }
+    138 |
+    139 |  General Purpose Registers = {
+    140 |    r0 = 0x4100fa70
+    141 |    r1 = 0x00042108
+    142 |    r2 = 0x00021084
+    143 |    r3 = 0x00000000
+
+Importantly, we see some Hexagon assembly followed by a dump of the CPU state,
+now the CPU state is actually dumped before the input assembly above is ran.
+As such, we are actually interested in the instructions ran before this.
+
+Scrolling up a bit, we find
+
+::
+
+    54 |  ----------------
+    55 |  IN: _start
+    56 |  0x00021088:  0x6a09c002	{	R2 = C9/pc }
+    57 |  0x0002108c:  0xbfe2ff82	{	R2 = add(R2,#0xfffffffc) }
+    58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
+    59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
+    60 |  0x00021098:  0x7800c01e	{	R30 = #0x0 }
+    61 |  0x0002109c:  0x707dc000	{	R0 = R29 }
+    62 |  0x000210a0:  0x763dfe1d	{	R29 = and(R29,#0xfffffff0) }
+    63 |  0x000210a4:  0xa79dfdfe	{	memw(R29+#0xfffffff8) = R29 }
+    64 |  0x000210a8:  0xbffdff1d	{	R29 = add(R29,#0xfffffff8) }
+    65 |  0x000210ac:  0x5a00c002	{	call PC+4 }
+    66 |
+    67 |  General Purpose Registers = {
+    68 |    r0 = 0x00000000
+    69 |    r1 = 0x00000000
+    70 |    r2 = 0x00000000
+    71 |    r3 = 0x00000000
+
+Remember, the instructions on lines 56-65 are ran on the CPU state shown below
+instructions, and as the CPU state has not diverged at this point, we know the
+starting state is accurate. The bug must then lie within the instructions shown
+here. Next we may notice that ``R1`` is only touched by lines 57 and 58, that is
+by
+
+::
+
+    58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
+    59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
+
+Therefore, we are either dealing with an correct load instruction
+``R1 = memw(R2+#0x0)`` or with an incorrect add ``R1 = add(R2,R1)``. At this
+point it might be easy enough to go directly to the emitted code for the
+instructions mentioned and look for bugs, but we could also run
+``./qemu-heaxgon -d op,in_asm float_conv`` where we find for the following
+tinycode for the Hexagon ``add`` instruction
+
+::
+
+   ---- 00021094
+   mov_i32 pkt_has_store_s1,$0x0
+   add_i32 tmp0,r2,r2
+   mov_i32 loc2,tmp0
+   mov_i32 new_r1,loc2
+   mov_i32 r1,new_r1
+
+Here we have finally located our bug ``add_i32 tmp0,r2,r2``.
+
+Limitations and Future Development
+----------------------------------
+
+The main limitation of the current parser is given by the syntax-driven nature
+of the Bison-generated parsers. This has the severe implication of only being
+able to generate code in the order of evaluation of the various rules, without,
+in any case, being able to backtrack and alter the generated code.
+
+An example limitation is highlighted by this statement of the input language:
+
+::
+
+   { (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); }
+
+This ternary assignment, when written in this form requires us to emit some
+proper control flow statements, which emit a jump to the first or to the second
+code block, whose implementation is extremely convoluted, because when matching
+the ternary assignment, the code evaluating the two assignments will be already
+generated.
+
+Instead we pre-process that statement, making it become:
+
+::
+
+   { PdV = ((PsV==0xff)) ? 0xff : 0x00; }
+
+Which can be easily matched by the following parser rules:
+
+::
+
+   statement             | rvalue ';'
+
+   rvalue                : rvalue QMARK rvalue COLON rvalue
+                         | rvalue EQ rvalue
+                         | LPAR rvalue RPAR
+                         | assign_statement
+                         | IMM
+
+   assign_statement      : pred ASSIGN rvalue
+
+Another example that highlight the limitation of the flex/bison parser can be
+found even in the add operation we already saw:
+
+::
+
+   TCGv_i32 tmp_0 = tcg_temp_new_i32();
+   tcg_gen_add_i32(tmp_0, RsV, RtV);
+   tcg_gen_mov_i32(RdV, tmp_0);
+
+The fact that we cannot directly use ``RdV`` as the destination of the sum is a
+consequence of the syntax-driven nature of the parser. In fact when we parse the
+assignment, the ``rvalue`` token, representing the sum has already been reduced,
+and thus its code emitted and unchangeable. We rely on the fact that QEMU will
+optimize our code reducing the useless move operations and the relative
+temporaries.
+
+A possible improvement of the parser regards the support for vectorial
+instructions and floating point instructions, which will require the extension
+of the scanner, the parser, and a partial re-design of the type system, allowing
+to build the vectorial semantics over the available vectorial tinycode generator
+primitives.
+
+A more radical improvement will use the parser, not to generate directly the
+tinycode generator code, but to generate an intermediate representation like the
+LLVM IR, which in turn could be compiled using the clang TCG backend. That code
+could be furtherly optimized, overcoming the limitations of the syntax-driven
+parsing and could lead to a more optimized generated code.
-- 
2.33.1



  parent reply	other threads:[~2021-12-17  9:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-17  9:01 [PATCH v7 00/13] target/hexagon: introduce idef-parser Anton Johansson via
2021-12-17  9:01 ` [PATCH v7 01/13] target/hexagon: update MAINTAINERS for idef-parser Anton Johansson via
2021-12-20 22:50   ` Taylor Simpson
2021-12-17  9:01 ` Anton Johansson via [this message]
2021-12-22 21:05   ` [PATCH v7 02/13] target/hexagon: import README " Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 03/13] target/hexagon: make slot number an unsigned Anton Johansson via
2021-12-22 21:47   ` Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 04/13] target/hexagon: make helper functions non-static Anton Johansson via
2021-12-17  9:01 ` [PATCH v7 05/13] target/hexagon: introduce new helper functions Anton Johansson via
2021-12-21 18:51   ` Taylor Simpson
2021-12-22 19:29   ` Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 06/13] target/hexagon: expose next PC in DisasContext Anton Johansson via
2021-12-17  9:01 ` [PATCH v7 07/13] target/hexagon: prepare input for the idef-parser Anton Johansson via
2021-12-22 21:53   ` Taylor Simpson
2021-12-23  1:53   ` Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 08/13] target/hexagon: import flex/bison to docker files Anton Johansson via
2021-12-17  9:01 ` [PATCH v7 09/13] target/hexagon: import lexer for idef-parser Anton Johansson via
2021-12-21 18:48   ` Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 10/13] target/hexagon: import parser " Anton Johansson via
2021-12-17  9:01 ` [PATCH v7 11/13] target/hexagon: call idef-parser functions Anton Johansson via
2021-12-22 23:21   ` Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 12/13] target/hexagon: import additional tests Anton Johansson via
2021-12-23 18:16   ` Taylor Simpson
2021-12-17  9:01 ` [PATCH v7 13/13] gitlab-ci: do not use qemu-project Docker registry Anton Johansson via

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211217090129.23242-3-anjo@rev.ng \
    --to=qemu-devel@nongnu.org \
    --cc=ale@rev.ng \
    --cc=anjo@rev.ng \
    --cc=babush@rev.ng \
    --cc=bcain@quicinc.com \
    --cc=nizzo@rev.ng \
    --cc=richard.henderson@linaro.org \
    --cc=tsimpson@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.