qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] decodetree: Allow 'dot' in opcode names
@ 2021-01-12 18:41 Philippe Mathieu-Daudé
  2021-01-12 20:44 ` Richard Henderson
  2021-01-12 21:05 ` Eduardo Habkost
  0 siblings, 2 replies; 8+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-12 18:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	Richard Henderson, Eduardo Habkost, Cleber Rosa

Some ISA use a dot in their opcodes. Allow the decodetree
script to process them. The dot is replaced by an underscore
in the generated code.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 scripts/decodetree.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d1..b7572589e64 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -49,7 +49,7 @@
 re_arg_ident = '&[a-zA-Z0-9_]*'
 re_fld_ident = '%[a-zA-Z0-9_]*'
 re_fmt_ident = '@[a-zA-Z0-9_]*'
-re_pat_ident = '[a-zA-Z0-9_]*'
+re_pat_ident = '[a-zA-Z0-9_.]*'
 
 def error_with_file(file, lineno, *args):
     """Print an error message from file:line and args and exit."""
@@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
         elif re.fullmatch(re_fmt_ident, name):
             parse_generic(start_lineno, None, name[1:], toks)
         elif re.fullmatch(re_pat_ident, name):
+            name = name.replace('.', '_')
             parse_generic(start_lineno, parent_pat, name, toks)
         else:
             error(lineno, 'invalid token "{0}"'.format(name))
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 18:41 [PATCH] decodetree: Allow 'dot' in opcode names Philippe Mathieu-Daudé
@ 2021-01-12 20:44 ` Richard Henderson
  2021-01-12 22:15   ` Philippe Mathieu-Daudé
  2021-01-12 21:05 ` Eduardo Habkost
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Henderson @ 2021-01-12 20:44 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Paolo Bonzini, Eduardo Habkost, Cleber Rosa

On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
> Some ISA use a dot in their opcodes. Allow the decodetree
> script to process them. The dot is replaced by an underscore
> in the generated code.

Given that you then have to remember to use '_' on the C side, what advantage
does this give?


r~


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 18:41 [PATCH] decodetree: Allow 'dot' in opcode names Philippe Mathieu-Daudé
  2021-01-12 20:44 ` Richard Henderson
@ 2021-01-12 21:05 ` Eduardo Habkost
  2021-01-12 22:19   ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 8+ messages in thread
From: Eduardo Habkost @ 2021-01-12 21:05 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Cleber Rosa

On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote:
> Some ISA use a dot in their opcodes. Allow the decodetree
> script to process them. The dot is replaced by an underscore
> in the generated code.

Will something break if we just use underscores instead of dots
in the input file?

> 
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  scripts/decodetree.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..b7572589e64 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -49,7 +49,7 @@
>  re_arg_ident = '&[a-zA-Z0-9_]*'
>  re_fld_ident = '%[a-zA-Z0-9_]*'
>  re_fmt_ident = '@[a-zA-Z0-9_]*'
> -re_pat_ident = '[a-zA-Z0-9_]*'
> +re_pat_ident = '[a-zA-Z0-9_.]*'

If pattern identifiers are going to follow different rules,
doesn't this need to be documented at docs/devel/decodetree.rst?

>  
>  def error_with_file(file, lineno, *args):
>      """Print an error message from file:line and args and exit."""
> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
>          elif re.fullmatch(re_fmt_ident, name):
>              parse_generic(start_lineno, None, name[1:], toks)
>          elif re.fullmatch(re_pat_ident, name):
> +            name = name.replace('.', '_')
>              parse_generic(start_lineno, parent_pat, name, toks)

Do we want error messages generated by the script to use the
modified identifier with underscores, or the original identifier
with dots?  (This patch does the former)

>          else:
>              error(lineno, 'invalid token "{0}"'.format(name))
> -- 
> 2.26.2
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 20:44 ` Richard Henderson
@ 2021-01-12 22:15   ` Philippe Mathieu-Daudé
  2021-01-12 22:28     ` Eduardo Habkost
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-12 22:15 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Paolo Bonzini, Eduardo Habkost, Cleber Rosa

On 1/12/21 9:44 PM, Richard Henderson wrote:
> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
>> Some ISA use a dot in their opcodes. Allow the decodetree
>> script to process them. The dot is replaced by an underscore
>> in the generated code.
> 
> Given that you then have to remember to use '_' on the C side, what advantage
> does this give?

The direct advantage is you can copy/paste the opcode in a PDF viewer
without having to edit it :)

See i.e. some Loongson opcodes [*]:

MULT.G          011100 ..... ..... ..... 00000 010000   @rs_rt_rd
DMULT.G         011100 ..... ..... ..... 00000 010001   @rs_rt_rd
MULTU.G         011100 ..... ..... ..... 00000 010010   @rs_rt_rd
DMULTU.G        011100 ..... ..... ..... 00000 010011   @rs_rt_rd

DIV.G           011100 ..... ..... ..... 00000 010100   @rs_rt_rd
DDIV.G          011100 ..... ..... ..... 00000 010101   @rs_rt_rd
DIVU.G          011100 ..... ..... ..... 00000 010110   @rs_rt_rd
DDIVU.G         011100 ..... ..... ..... 00000 010111   @rs_rt_rd

MOD.G           011100 ..... ..... ..... 00000 011100   @rs_rt_rd
DMOD.G          011100 ..... ..... ..... 00000 011101   @rs_rt_rd
MODU.G          011100 ..... ..... ..... 00000 011110   @rs_rt_rd
DMODU.G         011100 ..... ..... ..... 00000 011111   @rs_rt_rd

The other - remote - advantage I see is when using a disassembler
based on decodetree (as AVR does), the opcode displayed also matches
the specs. We are not yet there with MIPS, but I have something in
progress...

[*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 21:05 ` Eduardo Habkost
@ 2021-01-12 22:19   ` Philippe Mathieu-Daudé
  2021-01-12 22:30     ` Eduardo Habkost
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-12 22:19 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Cleber Rosa

On 1/12/21 10:05 PM, Eduardo Habkost wrote:
> On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote:
>> Some ISA use a dot in their opcodes. Allow the decodetree
>> script to process them. The dot is replaced by an underscore
>> in the generated code.
> 
> Will something break if we just use underscores instead of dots
> in the input file?

No, but then the opcode doesn't really match the spec.

> 
>>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>>  scripts/decodetree.py | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
>> index 47aa9caf6d1..b7572589e64 100644
>> --- a/scripts/decodetree.py
>> +++ b/scripts/decodetree.py
>> @@ -49,7 +49,7 @@
>>  re_arg_ident = '&[a-zA-Z0-9_]*'
>>  re_fld_ident = '%[a-zA-Z0-9_]*'
>>  re_fmt_ident = '@[a-zA-Z0-9_]*'
>> -re_pat_ident = '[a-zA-Z0-9_]*'
>> +re_pat_ident = '[a-zA-Z0-9_.]*'
> 
> If pattern identifiers are going to follow different rules,
> doesn't this need to be documented at docs/devel/decodetree.rst?

I checked and luckily for me the opcode pattern identifiers is
not documented <:)

> 
>>  
>>  def error_with_file(file, lineno, *args):
>>      """Print an error message from file:line and args and exit."""
>> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
>>          elif re.fullmatch(re_fmt_ident, name):
>>              parse_generic(start_lineno, None, name[1:], toks)
>>          elif re.fullmatch(re_pat_ident, name):
>> +            name = name.replace('.', '_')
>>              parse_generic(start_lineno, parent_pat, name, toks)
> 
> Do we want error messages generated by the script to use the
> modified identifier with underscores, or the original identifier
> with dots?  (This patch does the former)

You are right, we want the former in the error message (the input
format).

Thanks,

Phil.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 22:15   ` Philippe Mathieu-Daudé
@ 2021-01-12 22:28     ` Eduardo Habkost
  2021-01-12 22:46       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 8+ messages in thread
From: Eduardo Habkost @ 2021-01-12 22:28 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Cleber Rosa

On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote:
> On 1/12/21 9:44 PM, Richard Henderson wrote:
> > On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
> >> Some ISA use a dot in their opcodes. Allow the decodetree
> >> script to process them. The dot is replaced by an underscore
> >> in the generated code.
> > 
> > Given that you then have to remember to use '_' on the C side, what advantage
> > does this give?
> 
> The direct advantage is you can copy/paste the opcode in a PDF viewer
> without having to edit it :)
> 
> See i.e. some Loongson opcodes [*]:
> 
> MULT.G          011100 ..... ..... ..... 00000 010000   @rs_rt_rd
> DMULT.G         011100 ..... ..... ..... 00000 010001   @rs_rt_rd
> MULTU.G         011100 ..... ..... ..... 00000 010010   @rs_rt_rd
> DMULTU.G        011100 ..... ..... ..... 00000 010011   @rs_rt_rd
> 
> DIV.G           011100 ..... ..... ..... 00000 010100   @rs_rt_rd
> DDIV.G          011100 ..... ..... ..... 00000 010101   @rs_rt_rd
> DIVU.G          011100 ..... ..... ..... 00000 010110   @rs_rt_rd
> DDIVU.G         011100 ..... ..... ..... 00000 010111   @rs_rt_rd
> 
> MOD.G           011100 ..... ..... ..... 00000 011100   @rs_rt_rd
> DMOD.G          011100 ..... ..... ..... 00000 011101   @rs_rt_rd
> MODU.G          011100 ..... ..... ..... 00000 011110   @rs_rt_rd
> DMODU.G         011100 ..... ..... ..... 00000 011111   @rs_rt_rd
> 
> The other - remote - advantage I see is when using a disassembler
> based on decodetree (as AVR does), the opcode displayed also matches
> the specs. We are not yet there with MIPS, but I have something in
> progress...

Interesting.  So, the decodetree format is not used exclusively
inside the QEMU source tree, but also by other projects?  Is
there a specification somewhere else?

> 
> [*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 22:19   ` Philippe Mathieu-Daudé
@ 2021-01-12 22:30     ` Eduardo Habkost
  0 siblings, 0 replies; 8+ messages in thread
From: Eduardo Habkost @ 2021-01-12 22:30 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Cleber Rosa

On Tue, Jan 12, 2021 at 11:19:49PM +0100, Philippe Mathieu-Daudé wrote:
> On 1/12/21 10:05 PM, Eduardo Habkost wrote:
[...]
> >> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> >> index 47aa9caf6d1..b7572589e64 100644
> >> --- a/scripts/decodetree.py
> >> +++ b/scripts/decodetree.py
> >> @@ -49,7 +49,7 @@
> >>  re_arg_ident = '&[a-zA-Z0-9_]*'
> >>  re_fld_ident = '%[a-zA-Z0-9_]*'
> >>  re_fmt_ident = '@[a-zA-Z0-9_]*'
> >> -re_pat_ident = '[a-zA-Z0-9_]*'
> >> +re_pat_ident = '[a-zA-Z0-9_.]*'
> > 
> > If pattern identifiers are going to follow different rules,
> > doesn't this need to be documented at docs/devel/decodetree.rst?
> 
> I checked and luckily for me the opcode pattern identifiers is
> not documented <:)

The format is not documented, but the specification grammar
implies the same rules apply to all identifiers.

-- 
Eduardo



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] decodetree: Allow 'dot' in opcode names
  2021-01-12 22:28     ` Eduardo Habkost
@ 2021-01-12 22:46       ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 8+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-12 22:46 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Cleber Rosa

On 1/12/21 11:28 PM, Eduardo Habkost wrote:
> On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote:
>> On 1/12/21 9:44 PM, Richard Henderson wrote:
>>> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
>>>> Some ISA use a dot in their opcodes. Allow the decodetree
>>>> script to process them. The dot is replaced by an underscore
>>>> in the generated code.
>>>
>>> Given that you then have to remember to use '_' on the C side, what advantage
>>> does this give?
>>
>> The direct advantage is you can copy/paste the opcode in a PDF viewer
>> without having to edit it :)
>>
>> See i.e. some Loongson opcodes [*]:
>>
>> MULT.G          011100 ..... ..... ..... 00000 010000   @rs_rt_rd
>> DMULT.G         011100 ..... ..... ..... 00000 010001   @rs_rt_rd
>> MULTU.G         011100 ..... ..... ..... 00000 010010   @rs_rt_rd
>> DMULTU.G        011100 ..... ..... ..... 00000 010011   @rs_rt_rd
>>
>> DIV.G           011100 ..... ..... ..... 00000 010100   @rs_rt_rd
>> DDIV.G          011100 ..... ..... ..... 00000 010101   @rs_rt_rd
>> DIVU.G          011100 ..... ..... ..... 00000 010110   @rs_rt_rd
>> DDIVU.G         011100 ..... ..... ..... 00000 010111   @rs_rt_rd
>>
>> MOD.G           011100 ..... ..... ..... 00000 011100   @rs_rt_rd
>> DMOD.G          011100 ..... ..... ..... 00000 011101   @rs_rt_rd
>> MODU.G          011100 ..... ..... ..... 00000 011110   @rs_rt_rd
>> DMODU.G         011100 ..... ..... ..... 00000 011111   @rs_rt_rd
>>
>> The other - remote - advantage I see is when using a disassembler
>> based on decodetree (as AVR does), the opcode displayed also matches
>> the specs. We are not yet there with MIPS, but I have something in
>> progress...
> 
> Interesting.  So, the decodetree format is not used exclusively
> inside the QEMU source tree, but also by other projects?  Is
> there a specification somewhere else?

"as AVR does in QEMU", see commit 9d8caa67a24
("target/avr: Add support for disassembling via option '-d in_asm'").

What seduces me with decodetree is we don't need to match QEMU
instruction class with each CPU capabilities. IOW we can use the
same decoder for TCG and disassembly, and the disassembly matches
the instruction set of the CPU (with all the specific instructions).

Currently some specific opcodes are displayed as generic ones (or
as unknown via hexadecimal value). Unfortunately not something we
can show with QEMU AVR target because the ISA is very simple.

Regards,

Phil.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-01-12 22:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-12 18:41 [PATCH] decodetree: Allow 'dot' in opcode names Philippe Mathieu-Daudé
2021-01-12 20:44 ` Richard Henderson
2021-01-12 22:15   ` Philippe Mathieu-Daudé
2021-01-12 22:28     ` Eduardo Habkost
2021-01-12 22:46       ` Philippe Mathieu-Daudé
2021-01-12 21:05 ` Eduardo Habkost
2021-01-12 22:19   ` Philippe Mathieu-Daudé
2021-01-12 22:30     ` Eduardo Habkost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).