* [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches
@ 2017-12-18 17:45 Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py Richard Henderson
                   ` (23 more replies)
  0 siblings, 24 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

The most important part here, for review, is the first patch.

I add a code generator, written in Python, which takes an input file
that describes the opcode bits and field bits of the instructions,
and outputs a function that does all of the decoding.

The subsequent patches begin to add SVE support and also demonstrate
how I envision both the decoder and the TCG host vector support
being used.  Thus, review of the direction would be appreciated
before there are another 100 patches in the same style.
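
As a taste of the input format, here is the toy example from the
comments within decodetree.py itself:

  @opr     ...... ra:5 rb:5 ... 0 ....... rc:5
  addl_r   010000 ..... ..... .... 0000000 ..... @opr

for which the generated decoder extracts ra, rb and rc into an
arg_opr structure and invokes trans_addl_r(ctx, &arg_opr, insn).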


r~


Richard Henderson (23):
  scripts: Add decodetree.py
  target/arm: Add SVE decode skeleton
  target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  target/arm: Implement PTRUE, PFALSE, SETFFR
  target/arm: Implement SVE predicate logical operations
  target/arm: Implement SVE load vector/predicate
  target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  target/arm: Handle SVE registers in write_fp_dreg
  target/arm: Handle SVE registers when using clear_vec_high
  target/arm: Implement SVE Integer Reduction Group
  target/arm: Implement SVE bitwise shift by immediate (predicated)
  target/arm: Implement SVE bitwise shift by vector (predicated)
  target/arm: Implement SVE bitwise shift by wide elements (predicated)
  target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  target/arm: Implement SVE Integer Multiply-Add Group
  target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  target/arm: Implement SVE Index Generation Group
  target/arm: Implement SVE Stack Allocation Group
  target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  target/arm: Implement SVE Compute Vector Address Group
  target/arm: Implement SVE floating-point exponential accelerator
  target/arm: Implement SVE floating-point trig select coefficient
  target/arm: Implement SVE Element Count Group, register destinations

 target/arm/helper-sve.h    |  409 ++++++++++++++
 target/arm/helper.h        |    1 +
 target/arm/translate-a64.h |  112 ++++
 target/arm/sve_helper.c    | 1177 +++++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c |  272 +++------
 target/arm/translate-sve.c | 1313 ++++++++++++++++++++++++++++++++++++++++++++
 .gitignore                 |    1 +
 scripts/decodetree.py      |  984 +++++++++++++++++++++++++++++++++
 target/arm/Makefile.objs   |   11 +
 target/arm/sve.def         |  328 +++++++++++
 10 files changed, 4418 insertions(+), 190 deletions(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/translate-a64.h
 create mode 100644 target/arm/sve_helper.c
 create mode 100644 target/arm/translate-sve.c
 create mode 100755 scripts/decodetree.py
 create mode 100644 target/arm/sve.def

-- 
2.14.3


* [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2018-01-11 18:06   ` Peter Maydell
  2018-01-12 11:57   ` Peter Maydell
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton Richard Henderson
                   ` (22 subsequent siblings)
  23 siblings, 2 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

To be used to decode ARM SVE, but could be used for any 32-bit RISC.
It would need additional work to extend to insn sizes other than 32-bit.
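
Sketch of the intended invocation (option names as handled by the
getopt loop in main below; the file names here are only illustrative):

  ./scripts/decodetree.py -o decode-foo.inc.c --decode decode_foo foo.def

This writes the generated decoder into decode-foo.inc.c.  The
-h/--header option sends the typedef and prototype declarations to a
separate header instead, and --translate overrides the default "trans"
prefix of the per-pattern callback functions.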

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 scripts/decodetree.py | 984 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 984 insertions(+)
 create mode 100755 scripts/decodetree.py

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
new file mode 100755
index 0000000000..acb0243915
--- /dev/null
+++ b/scripts/decodetree.py
@@ -0,0 +1,984 @@
+#!/usr/bin/env python
+#
+# Generate a decoding tree from a specification file.
+#
+# The tree is built from instruction "patterns".  A pattern may represent
+# a single architectural instruction or a group of same, depending on what
+# is convenient for further processing.
+#
+# Each pattern has "fixedbits" & "fixedmask", the combination of which
+# describes the condition under which the pattern is matched:
+#
+#   (insn & fixedmask) == fixedbits
+#
+# Each pattern may have "fields", which are extracted from the insn and
+# passed along to the translator.  Examples of such are registers,
+# immediates, and sub-opcodes.
+#
+# In support of patterns, one may declare fields, argument sets, and
+# formats, each of which may be re-used to simplify further definitions.
+#
+## Field syntax:
+#
+# field_def	:= '%' identifier ( unnamed_field )+ ( !function=identifier )?
+# unnamed_field := number ':' ( 's' )? number
+#
+# For unnamed_field, the first number is the least-significant bit position of
+# the field and the second number is the length of the field.  If the 's' is
+# present, the field is considered signed.  If multiple unnamed_fields are
+# present, they are concatenated.  In this way one can define disjoint fields.
+#
+# If !function is specified, the concatenated result is passed through the
+# named function, taking and returning an integral value.
+#
+# FIXME: the fields of the structure into which this result will be stored
+# are restricted to "int", which means that we cannot expand 64-bit items.
+#
+# Field examples:
+#
+#   %disp   0:s16          -- sextract(i, 0, 16)
+#   %imm9   16:6 10:3      -- extract(i, 16, 6) << 3 | extract(i, 10, 3)
+#   %disp12 0:s1 1:1 2:10  -- sextract(i, 0, 1) << 11
+#                             | extract(i, 1, 1) << 10
+#                             | extract(i, 2, 10)
+#   %shimm8 5:s8 13:1 !function=expand_shimm8
+#                          -- expand_shimm8(sextract(i, 5, 8) << 1
+#                                           | extract(i, 13, 1))
+#
+## Argument set syntax:
+#
+# args_def    := '&' identifier ( args_elt )+
+# args_elt    := identifier
+#
+# Each args_elt defines an argument within the argument set.
+# Each argument set will be rendered as a C structure "arg_$name"
+# with each of the fields being one of the member arguments.
+#
+# Argument set examples:
+#
+#   &reg3       ra rb rc
+#   &loadstore  reg base offset
+#
+## Format syntax:
+#
+# fmt_def      := '@' identifier ( fmt_elt )+
+# fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
+# fixedbit_elt := [01.]+
+# field_elt    := identifier ':' 's'? number
+# field_ref    := '%' identifier | identifier '=' '%' identifier
+# args_ref     := '&' identifier
+#
+# Defining a format is a handy way to avoid replicating groups of fields
+# across many instruction patterns.
+#
+# A fixedbit_elt describes a contiguous sequence of bits that must
+# be 1, 0, or "." for don't care.
+#
+# A field_elt describes a simple field given only a width; the position of
+# the field is implied by its position with respect to the other fixedbit_elt
+# and field_elt entries.
+#
+# If any fixedbit_elt or field_elt appears, then all 32 bits must be defined.
+# Padding with a fixedbit_elt of all '.' is an easy way to accomplish that.
+#
+# A field_ref incorporates a field by reference.  This is the only way to
+# add a complex field to a format.  A field may be renamed in the process
+# via assignment to another identifier.  This is intended to allow the
+# same argument set to be used with disjoint named fields.
+#
+# A single args_ref may specify an argument set to use for the format.
+# The set of fields in the format must be a subset of the arguments in
+# the argument set.  If an argument set is not specified, one will be
+# inferred from the set of fields.
+#
+# It is recommended, but not required, that all field_ref and args_ref
+# appear at the end of the line, not interleaved with fixedbit_elt or
+# field_elt.
+#
+# Format examples:
+#
+#   @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
+#   @opi    ...... ra:5 lit:8    1 ....... rc:5
+#
+## Pattern syntax:
+#
+# pat_def      := identifier ( pat_elt )+
+# pat_elt      := fixedbit_elt | field_elt | field_ref
+#               | args_ref | fmt_ref | const_elt
+# fmt_ref      := '@' identifier
+# const_elt    := identifier '=' number
+#
+# The fixedbit_elt and field_elt specifiers are unchanged from formats.
+# A pattern that does not specify a named format will have one inferred
+# from a referenced argument set (if present) and the set of fields.
+#
+# A const_elt allows an argument to be set to a constant value.  This may
+# come in handy when fields overlap between patterns and one has to
+# include the values in the fixedbit_elt instead.
+#
+# The decoder will call a translator function for each pattern matched.
+#
+# Pattern examples:
+#
+#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
+#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
+#
+# which will, in part, invoke
+#
+#   trans_addl_r(ctx, &arg_opr, insn)
+# and
+#   trans_addl_i(ctx, &arg_opi, insn)
+#
+
+import io
+import re
+import sys
+import getopt
+import pdb
+
+# ??? Parameterize insn_width from 32.
+fields = {}
+arguments = {}
+formats = {}
+patterns = []
+
+translate_prefix = 'trans'
+output_file = sys.stdout
+
+re_ident = '[a-zA-Z][a-zA-Z0-9_]*'
+
+def error(lineno, *args):
+    if lineno:
+        r = 'error:{0}:'.format(lineno)
+    else:
+        r = 'error:'
+    for a in args:
+        r += ' ' + str(a)
+    r += '\n'
+    sys.stderr.write(r)
+    exit(1)
+
+def output(*args):
+    global output_file
+    for a in args:
+        output_file.write(a)
+
+if sys.version_info >= (3, 0):
+    re_fullmatch = re.fullmatch
+else:
+    def re_fullmatch(pat, str):
+        return re.match('^' + pat + '$', str)
+
+def output_autogen():
+    output('/* This file is autogenerated.  */\n\n')
+
+def str_indent(c):
+    """Return a string with C spaces"""
+    r = ''
+    for i in range(0, c):
+        r += ' '
+    return r
+
+def str_fields(fields):
+    """Return a string uniquely identifing FIELDS"""
+
+    r = ''
+    for n in sorted(fields.keys()):
+        r += '_' + n
+    return r[1:]
+
+def str_match_bits(bits, mask):
+    """Return a string pretty-printing BITS/MASK"""
+    i = 0x80000000
+    space = 0x01010100
+    r = ''
+    while i != 0:
+        if i & mask:
+            if i & bits:
+                r += '1'
+            else:
+                r += '0'
+        else:
+            r += '.'
+        if i & space:
+            r += ' '
+        i >>= 1
+    return r
+
+def is_pow2(bits):
+    return (bits & (bits - 1)) == 0
+
+def popcount(b):
+    b = (b & 0x55555555) + ((b >> 1) & 0x55555555)
+    b = (b & 0x33333333) + ((b >> 2) & 0x33333333)
+    b = (b & 0x0f0f0f0f) + ((b >> 4) & 0x0f0f0f0f)
+    b = (b + (b >> 8)) & 0x00ff00ff
+    b = (b + (b >> 16)) & 0xffff
+    return b
+
+def ctz(b):
+    r = 0
+    while ((b >> r) & 1) == 0:
+        r += 1
+    return r
+
+def is_contiguous(bits):
+    shift = ctz(bits)
+    if is_pow2((bits >> shift) + 1):
+        return shift
+    else:
+        return -1
+
+def bit_iterate(bits):
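+    """Yield all 2**popcount(BITS) subsets of the set bits of BITS,
+       counting upward from 0."""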
+    iter = 0
+    yield iter
+    if bits == 0:
+        return
+    while True:
+        this = bits
+        while True:
+            lsb = this & -this
+            if iter & lsb:
+                iter ^= lsb
+                this ^= lsb
+            else:
+                iter ^= lsb
+                break
+            if this == 0:
+                return
+        yield iter
+
+def eq_fields_for_args(flds_a, flds_b):
+    if len(flds_a) != len(flds_b):
+        return False
+    for k, a in flds_a.items():
+        if k not in flds_b:
+            return False
+    return True
+
+def eq_fields_for_fmts(flds_a, flds_b):
+    if len(flds_a) != len(flds_b):
+        return False
+    for k, a in flds_a.items():
+        if k not in flds_b:
+            return False
+        b = flds_b[k]
+        if a.__class__ != b.__class__ or a != b:
+            return False
+    return True
+
+class Field:
+    """Class representing a simple instruction field"""
+    def __init__(self, sign, pos, len):
+        self.sign = sign
+        self.pos = pos
+        self.len = len
+        self.mask = ((1 << len) - 1) << pos
+
+    def __str__(self):
+        if self.sign:
+            s = 's'
+        else:
+            s = ''
+        return str(self.pos) + ':' + s + str(self.len)
+
+    def str_extract(self):
+        if self.sign:
+            extr = 'sextract32'
+        else:
+            extr = 'extract32'
+        return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)
+
+    def __eq__(self, other):
+        return self.sign == other.sign and self.mask == other.mask
+
+    def __ne__(self, other):
+        return not self.__eq__(other)
+# end Field
+
+class MultiField:
+    """Class representing a compound instruction field"""
+    def __init__(self, subs):
+        self.subs = subs
+        self.sign = subs[0].sign
+        mask = 0
+        for s in subs:
+            mask |= s.mask
+        self.mask = mask
+
+    def __str__(self):
+        return str(self.subs)
+
+    def str_extract(self):
+        ret = '0'
+        pos = 0
+        for f in reversed(self.subs):
+            if pos == 0:
+                ret = f.str_extract()
+            else:
+                ret = 'deposit32({0}, {1}, {2}, {3})'.format(ret, pos, 32 - pos, f.str_extract())
+            pos += f.len
+        return ret
+
+    def __ne__(self, other):
+        if len(self.subs) != len(other.subs):
+            return True
+        for a, b in zip(self.subs, other.subs):
+            if a.__class__ != b.__class__ or a != b:
+                return True
+        return False
+
+    def __eq__(self, other):
+        return not self.__ne__(other)
+# end MultiField
+
+class ConstField:
+    """Class representing an argument field with constant value"""
+    def __init__(self, value):
+        self.value = value
+        self.mask = 0
+        self.sign = value < 0
+
+    def __str__(self):
+        return str(self.value)
+
+    def str_extract(self):
+        return str(self.value)
+
+    def __eq__(self, other):
+        return self.value == other.value
+
+    def __ne__(self, other):
+        return not self.__eq__(other)
+# end ConstField
+
+class FunctionField:
+    """Class representing a field passed through an expander"""
+    def __init__(self, func, base):
+        self.mask = base.mask
+        self.sign = base.sign
+        self.base = base
+        self.func = func
+
+    def __str__(self):
+        return self.func + '(' + str(self.base) + ')'
+
+    def str_extract(self):
+        return self.func + '(' + self.base.str_extract() + ')'
+
+    def __eq__(self, other):
+        return self.func == other.func and self.base == other.base
+    def __ne__(self, other):
+        return not self.__eq__(other)
+# end FunctionField
+
+class Arguments:
+    """Class representing the extracted fields of a format"""
+    def __init__(self, nm, flds):
+        self.name = nm
+        self.fields = sorted(flds)
+
+    def __str__(self):
+        return self.name + ' ' + str(self.fields)
+
+    def struct_name(self):
+        return 'arg_' + self.name
+
+    def output_def(self):
+        output('typedef struct {\n')
+        for n in self.fields:
+            output('    int ', n, ';\n')
+        output('} ', self.struct_name(), ';\n\n')
+# end Arguments
+
+class General:
+    """Common code between instruction formats and instruction patterns"""
+    def __init__(self, name, base, fixb, fixm, fldm, flds):
+        self.name = name
+        self.base = base
+        self.fixedbits = fixb
+        self.fixedmask = fixm
+        self.fieldmask = fldm
+        self.fields = flds
+
+    def __str__(self):
+        r = self.name
+        if self.base:
+            r = r + ' ' + self.base.name
+        else:
+            r = r + ' ' + str(self.fields)
+        r = r + ' ' + str_match_bits(self.fixedbits, self.fixedmask)
+        return r
+
+    def str1(self, i):
+        return str_indent(i) + self.__str__()
+# end General
+
+class Format(General):
+    """Class representing an instruction format"""
+
+    def extract_name(self):
+        return 'extract_' + self.name
+
+    def output_extract(self):
+        output('static void ', self.extract_name(), '(',
+               self.base.struct_name(), ' *a, uint32_t insn)\n{\n')
+        for n, f in self.fields.items():
+            output('    a->', n, ' = ', f.str_extract(), ';\n')
+        output('}\n\n')
+# end Format
+
+class Pattern(General):
+    """Class representing an instruction pattern"""
+
+    def output_decl(self):
+        global translate_prefix
+        output('typedef ', self.base.base.struct_name(),
+               ' arg_', self.name, ';\n')
+        output('void ', translate_prefix, '_', self.name,
+               '(DisasContext *ctx, arg_', self.name,
+               ' *a, uint32_t insn);\n')
+
+    def output_code(self, i, extracted, outerbits, outermask):
+        global translate_prefix
+        ind = str_indent(i)
+        arg = self.base.base.name
+        if not extracted:
+            output(ind, self.base.extract_name(), '(&u.f_', arg, ', insn);\n')
+        for n, f in self.fields.items():
+            output(ind, 'u.f_', arg, '.', n, ' = ', f.str_extract(), ';\n')
+        output(ind, translate_prefix, '_', self.name,
+               '(ctx, &u.f_', arg, ', insn);\n')
+        output(ind, 'return true;\n')
+# end Pattern
+
+def parse_field(lineno, name, toks):
+    """Parse one instruction field from TOKS at LINENO"""
+    global fields
+    global re_ident
+
+    # A "simple" field will have only one entry; a "multifield" will have several.
+    subs = []
+    width = 0
+    func = None
+    for t in toks:
+        if re_fullmatch('!function=' + re_ident, t):
+            if func:
+                error(lineno, 'duplicate function')
+            func = t.split('=')
+            func = func[1]
+            continue
+
+        if re_fullmatch('[0-9]+:s[0-9]+', t):
+            # Signed field extract
+            subtoks = t.split(':s')
+            sign = True
+        elif re_fullmatch('[0-9]+:[0-9]+', t):
+            # Unsigned field extract
+            subtoks = t.split(':')
+            sign = False
+        else:
+            error(lineno, 'invalid field token "{0}"'.format(t))
+        p = int(subtoks[0])
+        l = int(subtoks[1])
+        if p + l > 32:
+            error(lineno, 'field {0} too large'.format(t))
+        f = Field(sign, p, l)
+        subs.append(f)
+        width += l
+
+    if width > 32:
+        error(lineno, 'field too large')
+    if len(subs) == 1:
+        f = subs[0]
+    else:
+        f = MultiField(subs)
+    if func:
+        f = FunctionField(func, f)
+
+    if name in fields:
+        error(lineno, 'duplicate field', name)
+    fields[name] = f
+# end parse_field
+
+def parse_arguments(lineno, name, toks):
+    """Parse one argument set from TOKS at LINENO"""
+    global arguments
+    global re_ident
+
+    flds = []
+    for t in toks:
+        if not re_fullmatch(re_ident, t):
+            error(lineno, 'invalid argument set token "{0}"'.format(t))
+        flds.append(t)
+
+    if name in arguments:
+        error(lineno, 'duplicate argument set', name)
+    arguments[name] = Arguments(name, flds)
+# end parse_arguments
+
+def lookup_field(lineno, name):
+    global fields
+    if name in fields:
+        return fields[name]
+    error(lineno, 'undefined field', name)
+
+def add_field(lineno, flds, new_name, f):
+    if new_name in flds:
+        error(lineno, 'duplicate field', new_name)
+    flds[new_name] = f
+    return flds
+
+def add_field_byname(lineno, flds, new_name, old_name):
+    return add_field(lineno, flds, new_name, lookup_field(lineno, old_name))
+
+def infer_argument_set(flds):
+    global arguments
+
+    for arg in arguments.values():
+        if eq_fields_for_args(flds, arg.fields):
+            return arg
+
+    name = str(len(arguments))
+    arg = Arguments(name, flds.keys())
+    arguments[name] = arg
+    return arg
+
+def infer_format(arg, fieldmask, flds):
+    global arguments
+    global formats
+
+    const_flds = {}
+    var_flds = {}
+    for n, c in flds.items():
+        if isinstance(c, ConstField):
+            const_flds[n] = c
+        else:
+            var_flds[n] = c
+
+    # Look for an existing format with the same argument set and fields
+    for fmt in formats.values():
+        if arg and fmt.base != arg:
+            continue
+        if fieldmask != fmt.fieldmask:
+            continue
+        if not eq_fields_for_fmts(flds, fmt.fields):
+            continue
+        return (fmt, const_flds)
+
+    name = 'Fmt_' + str(len(formats))
+    if not arg:
+        arg = infer_argument_set(flds)
+
+    fmt = Format(name, arg, 0, 0, fieldmask, var_flds)
+    formats[name] = fmt
+
+    return (fmt, const_flds)
+# end infer_format
+
+def parse_generic(lineno, is_format, name, toks):
+    """Parse one instruction format from TOKS at LINENO"""
+    global fields
+    global arguments
+    global formats
+    global patterns
+    global re_ident
+
+    fixedmask = 0
+    fixedbits = 0
+    width = 0
+    flds = {}
+    arg = None
+    fmt = None
+    for t in toks:
+        # '&Foo' gives a format an explicit argument set.
+        if t[0] == '&':
+            tt = t[1:]
+            if arg:
+                error(lineno, 'multiple argument sets')
+            if tt in arguments:
+                arg = arguments[tt]
+            else:
+                error(lineno, 'undefined argument set', t)
+            continue
+
+        # '@Foo' gives a pattern an explicit format.
+        if t[0] == '@':
+            tt = t[1:]
+            if fmt:
+                error(lineno, 'multiple formats')
+            if tt in formats:
+                fmt = formats[tt]
+            else:
+                error(lineno, 'undefined format', t)
+            continue
+
+        # '%Foo' imports a field.
+        if t[0] == '%':
+            tt = t[1:]
+            flds = add_field_byname(lineno, flds, tt, tt)
+            continue
+
+        # 'Foo=%Bar' imports a field with a different name.
+        if re_fullmatch(re_ident + '=%' + re_ident, t):
+            (fname, iname) = t.split('=%')
+            flds = add_field_byname(lineno, flds, fname, iname)
+            continue
+
+        # 'Foo=number' sets an argument field to a constant value
+        if re_fullmatch(re_ident + '=[0-9]+', t):
+            (fname, value) = t.split('=')
+            value = int(value)
+            flds = add_field(lineno, flds, fname, ConstField(value))
+            continue
+
+        # A pattern of 0s, 1s and dots indicates required zeros,
+        # required ones, or don't-cares.
+        if re_fullmatch('[01.]+', t):
+            shift = len(t)
+            fms = t.replace('0','1')
+            fms = fms.replace('.','0')
+            fbs = t.replace('.','0')
+            fms = int(fms, 2)
+            fbs = int(fbs, 2)
+            fixedbits = (fixedbits << shift) | fbs
+            fixedmask = (fixedmask << shift) | fms
+        # Otherwise, fieldname:fieldwidth
+        elif re_fullmatch(re_ident + ':s?[0-9]+', t):
+            (fname, flen) = t.split(':')
+            sign = False
+            if flen[0] == 's':
+                sign = True
+                flen = flen[1:]
+            shift = int(flen, 10)
+            f = Field(sign, 32 - width - shift, shift)
+            flds = add_field(lineno, flds, fname, f)
+            fixedbits <<= shift
+            fixedmask <<= shift
+        else:
+            error(lineno, 'invalid token "{0}"'.format(t))
+        width += shift
+
+    # We should have filled in all of the bits of the instruction.
+    if width != 32:
+        error(lineno, 'definition has {0} bits'.format(width))
+
+    # The fields that we add, or import, cannot overlap bits that we specify
+    fieldmask = 0
+    for f in flds.values():
+        fieldmask |= f.mask
+
+    # Fix up what we've parsed to match either a format or a pattern.
+    if is_format:
+        # Formats cannot reference formats.
+        if fmt:
+            error(lineno, 'format referencing format')
+        # If an argument set is given, then there should be no fields
+        # without a place to store them.
+        if arg:
+            for f in flds.keys():
+                if f not in arg.fields:
+                    error(lineno, 'field {0} not in argument set {1}'.format(f, arg.name))
+        else:
+            arg = infer_argument_set(flds)
+        if name in formats:
+            error(lineno, 'duplicate format name', name)
+        fmt = Format(name, arg, fixedbits, fixedmask, fieldmask, flds)
+        formats[name] = fmt
+    else:
+        # Patterns can reference a format ...
+        if fmt:
+            # ... but not an argument simultaneously
+            if arg:
+                error(lineno, 'pattern specifies both format and argument set')
+            fieldmask |= fmt.fieldmask
+            fixedbits |= fmt.fixedbits
+            fixedmask |= fmt.fixedmask
+        else:
+            (fmt, flds) = infer_format(arg, fieldmask, flds)
+        arg = fmt.base
+        for f in flds.keys():
+            if f not in arg.fields:
+                error(lineno, 'field {0} not in argument set {1}'.format(f, arg.name))
+        pat = Pattern(name, fmt, fixedbits, fixedmask, fieldmask, flds)
+        patterns.append(pat)
+
+    if fieldmask & fixedmask:
+        error(lineno, 'fieldmask overlaps fixedmask (0x{0:08x} & 0x{1:08x})'.format(fieldmask, fixedmask))
+# end parse_general
+
+def parse_file(f):
+    """Parse all of the patterns within a file"""
+
+    # Read all of the lines of the file.  Concatenate lines
+    # ending in backslash; discard empty lines and comments.
+    toks = []
+    lineno = 0
+    for line in f:
+        lineno += 1
+
+        # Discard comments
+        end = line.find('#')
+        if end >= 0:
+            line = line[:end]
+
+        t = line.split()
+        if len(toks) != 0:
+            # Next line after continuation
+            toks.extend(t)
+        elif len(t) == 0:
+            # Empty line
+            continue
+        else:
+            toks = t
+
+        # Continuation?
+        if toks[-1] == '\\':
+            toks.pop()
+            continue
+
+        if len(toks) < 2:
+            error(lineno, 'short line')
+
+        name = toks[0]
+        del toks[0]
+
+        # Determine the type of object needing to be parsed.
+        if name[0] == '%':
+            parse_field(lineno, name[1:], toks)
+        elif name[0] == '&':
+            parse_arguments(lineno, name[1:], toks)
+        elif name[0] == '@':
+            parse_generic(lineno, True, name[1:], toks)
+        else:
+            parse_generic(lineno, False, name, toks)
+        toks = []
+# end parse_file
+
+class Tree:
+    """Class representing a node in a decode tree"""
+
+    def __init__(self, fm, tm):
+        self.fixedmask = fm
+        self.thismask = tm
+        self.subs = []
+        self.base = None
+
+    def str1(self, i):
+        ind = str_indent(i)
+        r = '{0}{1:08x}'.format(ind, self.fixedmask)
+        if self.base:
+            r += ' ' + self.base.name
+        r += ' [\n'
+        for (b, s) in self.subs:
+            r += '{0}  {1:08x}:\n'.format(ind, b)
+            r += s.str1(i + 4) + '\n'
+        r += ind + ']'
+        return r
+
+    def __str__(self):
+        return self.str1(0)
+
+    def output_code(self, i, extracted, outerbits, outermask):
+        ind = str_indent(i)
+
+        # If we identified that all nodes below have the same format,
+        # extract the fields now.
+        if not extracted and self.base:
+            output(ind, self.base.extract_name(),
+                   '(&u.f_', self.base.base.name, ', insn);\n')
+            extracted = True
+
+        # Attempt to aid the compiler in producing compact switch statements.
+        # If the bits in the mask are contiguous, extract them.
+        sh = is_contiguous(self.thismask)
+        if sh > 0:
+            str_switch = lambda b: \
+                '(insn >> {0}) & 0x{1:x}'.format(sh, b >> sh)
+            str_case = lambda b: '0x{0:x}'.format(b >> sh)
+        else:
+            str_switch = lambda b: 'insn & 0x{0:08x}'.format(b)
+            str_case = lambda b: '0x{0:08x}'.format(b)
+
+        output(ind, 'switch (', str_switch(self.thismask), ') {\n')
+        for b, s in sorted(self.subs):
+            rept = self.thismask & ~s.fixedmask
+            innermask = outermask | (self.thismask & ~rept)
+            innerbits = outerbits | b
+            for bb in bit_iterate(rept):
+                output(ind, 'case ', str_case(b | bb), ':\n')
+            output(ind, '    /* ',
+                   str_match_bits(innerbits, innermask), ' */\n')
+            s.output_code(i + 4, extracted, innerbits, innermask)
+        output(ind, '}\n')
+        output(ind, 'return false;\n')
+# end Tree
+
+def build_tree(pats, outerbits, outermask):
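+    """Return a Tree covering the patterns PATS, all of which already
+       match OUTERBITS within the bits of OUTERMASK."""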
+    # Find the intersection of all remaining fixedmask.
+    innermask = ~outermask
+    for i in pats:
+        innermask &= i.fixedmask
+
+    if innermask == 0:
+        pnames = []
+        for p in pats:
+            pnames.append(p.name)
+        #pdb.set_trace()
+        error(0, 'overlapping patterns:', pnames)
+
+    fullmask = outermask | innermask
+    extramask = 0
+
+    # If there are few enough items, see how many undecoded bits remain.
+    # Otherwise, attempt to avoid a subsequent Tree level testing one bit.
+    if len(pats) < 8:
+        for i in pats:
+            extramask |= i.fixedmask & ~fullmask
+    else:
+        for i in pats:
+            e = i.fixedmask & ~fullmask
+            if e != 0 and popcount(e) <= 2:
+                extramask |= e
+
+    if popcount(extramask) < 4:
+        innermask |= extramask
+        fullmask |= extramask
+
+    # Sort each element of pats into the bin selected by the mask.
+    bins = {}
+    for i in pats:
+        fb = i.fixedbits & innermask
+        if fb in bins:
+            bins[fb].append(i)
+        else:
+            bins[fb] = [i]
+
+    # We must recurse if any bin has more than one element or if
+    # the single element in the bin has not been fully matched.
+    t = Tree(fullmask, innermask)
+
+    for b, l in bins.items():
+        s = l[0]
+        if len(l) > 1 or s.fixedmask & ~fullmask != 0:
+            s = build_tree(l, b | outerbits, fullmask)
+        t.subs.append((b, s))
+
+    return t
+# end build_tree
+
+def prop_format(tree):
+    """Propagate Format objects into the decode tree"""
+
+    # Depth first search.
+    for (b, s) in tree.subs:
+        if isinstance(s, Tree):
+            prop_format(s)
+
+    # If all entries in SUBS have the same format, then
+    # propagate that into the tree.
+    f = None
+    for (b, s) in tree.subs:
+        if f is None:
+            f = s.base
+            if f is None:
+                return
+        if f is not s.base:
+            return
+    tree.base = f
+# end prop_format
+
+
+def main():
+    global arguments
+    global formats
+    global patterns
+    global translate_prefix
+    global output_file
+
+    h_file = None
+    c_file = None
+    decode_function = 'decode'
+
+    long_opts = [ 'decode=', 'translate=', 'header=', 'output=' ]
+    try:
+        (opts, args) = getopt.getopt(sys.argv[1:], 'h:o:', long_opts)
+    except getopt.GetoptError as err:
+        error(0, err)
+    for o, a in opts:
+        if o in ('-h', '--header'):
+            h_file = a
+        elif o in ('-o', '--output'):
+            c_file = a
+        elif o == '--decode':
+            decode_function = a
+        elif o == '--translate':
+            translate_prefix = a
+        else:
+            assert False, 'unhandled option'
+
+    if len(args) < 1:
+        error(0, 'missing input file')
+    f = open(args[0], 'r')
+    parse_file(f)
+    f.close()
+
+    t = build_tree(patterns, 0, 0)
+    prop_format(t)
+
+    if h_file:
+        output_file = open(h_file, 'w')
+    elif c_file:
+        output_file = open(c_file, 'w')
+    else:
+        output_file = sys.stdout
+
+    output_autogen()
+    for n in sorted(arguments.keys()):
+        f = arguments[n]
+        f.output_def()
+
+    if h_file:
+        output('bool ', decode_function,
+               '(DisasContext *ctx, uint32_t insn);\n\n')
+
+    # A single translate function can be invoked for different patterns.
+    # Make sure that the argument sets are the same, and declare the
+    # function only once.
+    out_pats = {}
+    for i in patterns:
+        if i.name in out_pats:
+            p = out_pats[i.name]
+            if i.base.base != p.base.base:
+                error(0, i.name, ' has conflicting argument sets')
+        else:
+            i.output_decl()
+            out_pats[i.name] = i
+
+    if h_file:
+        output_file.close()
+        if c_file:
+            output_file = open(c_file, 'w')
+            output_autogen()
+
+    for n in sorted(formats.keys()):
+        f = formats[n]
+        f.output_extract()
+
+    output('bool ', decode_function,
+           '(DisasContext *ctx, uint32_t insn)\n{\n')
+
+    i4 = str_indent(4)
+    output(i4, 'union {\n')
+    for n in sorted(arguments.keys()):
+        f = arguments[n]
+        output(i4, i4, f.struct_name(), ' f_', f.name, ';\n')
+    output(i4, '} u;\n\n')
+
+    t.output_code(4, False, 0, 0)
+
+    output('}\n')
+
+    if c_file:
+        output_file.close()
+#end main
+
+if __name__ == '__main__':
+    main()
-- 
2.14.3


* [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2018-01-11 18:20   ` Peter Maydell
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 03/23] target/arm: Implement SVE Bitwise Logical - Unpredicated Group Richard Henderson
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Include only four as-yet-unimplemented instruction patterns,
so that the whole thing compiles.
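
For reference, the decoder generated from sve.def will look roughly
like this (a hand-written sketch, abbreviated from the actual output
of decodetree.py):

  typedef struct {
      int esz;
      int rd;
      int rm;
      int rn;
  } arg_rrr_esz;

  typedef arg_rrr_esz arg_AND_zzz;
  void trans_AND_zzz(DisasContext *ctx, arg_AND_zzz *a, uint32_t insn);
  ...

  bool disas_sve(DisasContext *ctx, uint32_t insn)
  {
      union {
          arg_rrr_esz f_rrr_esz;
      } u;

      switch (...) {
          /* extract the fields, call the trans_* hook, return true */
      }
      return false;
  }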

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.h | 111 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c |  91 +++++++------------------------------
 target/arm/translate-sve.c |  48 ++++++++++++++++++++
 .gitignore                 |   1 +
 target/arm/Makefile.objs   |  11 +++++
 target/arm/sve.def         |  45 ++++++++++++++++++
 6 files changed, 233 insertions(+), 74 deletions(-)
 create mode 100644 target/arm/translate-a64.h
 create mode 100644 target/arm/translate-sve.c
 create mode 100644 target/arm/sve.def

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
new file mode 100644
index 0000000000..9014b5bf8b
--- /dev/null
+++ b/target/arm/translate-a64.h
@@ -0,0 +1,111 @@
+/*
+ *  AArch64 translation, common definitions.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_TRANSLATE_A64_H
+#define TARGET_ARM_TRANSLATE_A64_H
+
+void unallocated_encoding(DisasContext *s);
+
+#define unsupported_encoding(s, insn)                                    \
+    do {                                                                 \
+        qemu_log_mask(LOG_UNIMP,                                         \
+                      "%s:%d: unsupported instruction encoding 0x%08x "  \
+                      "at pc=%016" PRIx64 "\n",                          \
+                      __FILE__, __LINE__, insn, s->pc - 4);              \
+        unallocated_encoding(s);                                         \
+    } while (0)
+
+TCGv_i64 new_tmp_a64(DisasContext *s);
+TCGv_i64 new_tmp_a64_zero(DisasContext *s);
+TCGv_i64 cpu_reg(DisasContext *s, int reg);
+TCGv_i64 cpu_reg_sp(DisasContext *s, int reg);
+TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf);
+TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf);
+
+/* We should have at some point before trying to access an FP register
+ * done the necessary access check, so assert that
+ * (a) we did the check and
+ * (b) we didn't then just plough ahead anyway if it failed.
+ * Print the instruction pattern in the abort message so we can figure
+ * out what we need to fix if a user encounters this problem in the wild.
+ */
+static inline void assert_fp_access_checked(DisasContext *s)
+{
+#ifdef CONFIG_DEBUG_TCG
+    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
+        fprintf(stderr, "target-arm: FP access check missing for "
+                "instruction 0x%08x\n", s->insn);
+        abort();
+    }
+#endif
+}
+
+/* Return the offset into CPUARMState of an element of specified
+ * size, 'element' places in from the least significant end of
+ * the FP/vector register Qn.
+ */
+static inline int vec_reg_offset(DisasContext *s, int regno,
+                                 int element, TCGMemOp size)
+{
+    int offs = 0;
+#ifdef HOST_WORDS_BIGENDIAN
+    /* This is complicated slightly because vfp.zregs[n].d[0] is
+     * still the low half and vfp.zregs[n].d[1] the high half
+     * of the 128 bit vector, even on big endian systems.
+     * Calculate the offset assuming a fully bigendian 128 bits,
+     * then XOR to account for the order of the two 64 bit halves.
+     */
+    offs += (16 - ((element + 1) * (1 << size)));
+    offs ^= 8;
+#else
+    offs += element * (1 << size);
+#endif
+    offs += offsetof(CPUARMState, vfp.zregs[regno]);
+    assert_fp_access_checked(s);
+    return offs;
+}
+
+/* Return the offset into CPUARMState of the "whole" vector register Qn.  */
+static inline int vec_full_reg_offset(DisasContext *s, int regno)
+{
+    assert_fp_access_checked(s);
+    return offsetof(CPUARMState, vfp.zregs[regno]);
+}
+
+/* Return the offset into CPUARMState of the predicate vector register Pn.
+ * Note that for this purpose, FFR is P16.  */
+static inline int pred_full_reg_offset(DisasContext *s, int regno)
+{
+    assert_fp_access_checked(s);
+    return offsetof(CPUARMState, vfp.pregs[regno]);
+}
+
+/* Return the byte size of the "whole" vector register, VL / 8.  */
+static inline int vec_full_reg_size(DisasContext *s)
+{
+    return s->sve_len;
+}
+
+/* Return the byte size of the whole predicate register, VL / 64.  */
+static inline int pred_full_reg_size(DisasContext *s)
+{
+    return s->sve_len >> 3;
+}
+
+bool disas_sve(DisasContext *, uint32_t);
+
+#endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index ecb72e4d9c..8be1660661 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -36,13 +36,13 @@
 #include "exec/log.h"
 
 #include "trace-tcg.h"
+#include "translate-a64.h"
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
 
 /* Load/store exclusive handling */
 static TCGv_i64 cpu_exclusive_high;
-static TCGv_i64 cpu_reg(DisasContext *s, int reg);
 
 static const char *regnames[] = {
     "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
@@ -390,22 +390,13 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
     }
 }
 
-static void unallocated_encoding(DisasContext *s)
+void unallocated_encoding(DisasContext *s)
 {
     /* Unallocated and reserved encodings are uncategorized */
     gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
                        default_exception_el(s));
 }
 
-#define unsupported_encoding(s, insn)                                    \
-    do {                                                                 \
-        qemu_log_mask(LOG_UNIMP,                                         \
-                      "%s:%d: unsupported instruction encoding 0x%08x "  \
-                      "at pc=%016" PRIx64 "\n",                          \
-                      __FILE__, __LINE__, insn, s->pc - 4);              \
-        unallocated_encoding(s);                                         \
-    } while (0);
-
 static void init_tmp_a64_array(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
@@ -423,13 +414,13 @@ static void free_tmp_a64(DisasContext *s)
     init_tmp_a64_array(s);
 }
 
-static TCGv_i64 new_tmp_a64(DisasContext *s)
+TCGv_i64 new_tmp_a64(DisasContext *s)
 {
     assert(s->tmp_a64_count < TMP_A64_MAX);
     return s->tmp_a64[s->tmp_a64_count++] = tcg_temp_new_i64();
 }
 
-static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
+TCGv_i64 new_tmp_a64_zero(DisasContext *s)
 {
     TCGv_i64 t = new_tmp_a64(s);
     tcg_gen_movi_i64(t, 0);
@@ -451,7 +442,7 @@ static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
  * to cpu_X[31] and ZR accesses to a temporary which can be discarded.
  * This is the point of the _sp forms.
  */
-static TCGv_i64 cpu_reg(DisasContext *s, int reg)
+TCGv_i64 cpu_reg(DisasContext *s, int reg)
 {
     if (reg == 31) {
         return new_tmp_a64_zero(s);
@@ -461,7 +452,7 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg)
 }
 
 /* register access for when 31 == SP */
-static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
+TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
 {
     return cpu_X[reg];
 }
@@ -470,7 +461,7 @@ static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
  * representing the register contents. This TCGv is an auto-freed
  * temporary so it need not be explicitly freed, and may be modified.
  */
-static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
+TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
 {
     TCGv_i64 v = new_tmp_a64(s);
     if (reg != 31) {
@@ -485,7 +476,7 @@ static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
     return v;
 }
 
-static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
+TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
 {
     TCGv_i64 v = new_tmp_a64(s);
     if (sf) {
@@ -496,62 +487,6 @@ static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
     return v;
 }
 
-/* We should have at some point before trying to access an FP register
- * done the necessary access check, so assert that
- * (a) we did the check and
- * (b) we didn't then just plough ahead anyway if it failed.
- * Print the instruction pattern in the abort message so we can figure
- * out what we need to fix if a user encounters this problem in the wild.
- */
-static inline void assert_fp_access_checked(DisasContext *s)
-{
-#ifdef CONFIG_DEBUG_TCG
-    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
-        fprintf(stderr, "target-arm: FP access check missing for "
-                "instruction 0x%08x\n", s->insn);
-        abort();
-    }
-#endif
-}
-
-/* Return the offset into CPUARMState of an element of specified
- * size, 'element' places in from the least significant end of
- * the FP/vector register Qn.
- */
-static inline int vec_reg_offset(DisasContext *s, int regno,
-                                 int element, TCGMemOp size)
-{
-    int offs = 0;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* This is complicated slightly because vfp.zregs[n].d[0] is
-     * still the low half and vfp.zregs[n].d[1] the high half
-     * of the 128 bit vector, even on big endian systems.
-     * Calculate the offset assuming a fully bigendian 128 bits,
-     * then XOR to account for the order of the two 64 bit halves.
-     */
-    offs += (16 - ((element + 1) * (1 << size)));
-    offs ^= 8;
-#else
-    offs += element * (1 << size);
-#endif
-    offs += offsetof(CPUARMState, vfp.zregs[regno]);
-    assert_fp_access_checked(s);
-    return offs;
-}
-
-/* Return the offset info CPUARMState of the "whole" vector register Qn.  */
-static inline int vec_full_reg_offset(DisasContext *s, int regno)
-{
-    assert_fp_access_checked(s);
-    return offsetof(CPUARMState, vfp.zregs[regno]);
-}
-
-/* Return the byte size of the "whole" vector register, VL / 8.  */
-static inline int vec_full_reg_size(DisasContext *s)
-{
-    return s->sve_len;
-}
-
 /* Return a newly allocated pointer to the vector register.  */
 static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
 {
@@ -12705,7 +12640,15 @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
     s->fp_access_checked = false;
 
     switch (extract32(insn, 25, 4)) {
-    case 0x0: case 0x1: case 0x2: case 0x3: /* UNALLOCATED */
+    case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
+        unallocated_encoding(s);
+        break;
+    case 0x2:
+        if (arm_dc_feature(s, ARM_FEATURE_SVE)) {
+            if (!fp_access_check(s) || disas_sve(s, insn)) {
+                break;
+            }
+        }
         unallocated_encoding(s);
         break;
     case 0x8: case 0x9: /* Data processing - immediate */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
new file mode 100644
index 0000000000..67ad94e310
--- /dev/null
+++ b/target/arm/translate-sve.c
@@ -0,0 +1,48 @@
+/*
+ *  AArch64 SVE translation
+ *
+ *  Copyright (c) 2017 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "tcg-op.h"
+#include "tcg-op-gvec.h"
+#include "qemu/log.h"
+#include "arm_ldst.h"
+#include "translate.h"
+#include "internals.h"
+#include "exec/helper-proto.h"
+#include "exec/helper-gen.h"
+#include "exec/log.h"
+#include "trace-tcg.h"
+#include "translate-a64.h"
+
+/*
+ * Include the generated decoder.
+ */
+
+#include "decode-sve.inc.c"
+
+/*
+ * Implement all of the translator functions referenced by the decoder.
+ */
+
+void trans_AND_zzz(DisasContext *s, arg_AND_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
+void trans_ORR_zzz(DisasContext *s, arg_ORR_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
+void trans_EOR_zzz(DisasContext *s, arg_EOR_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
+void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
diff --git a/.gitignore b/.gitignore
index 588769b250..e5fc04de07 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,3 +140,4 @@ trace-dtrace-root.h
 trace-dtrace-root.dtrace
 trace-ust-all.h
 trace-ust-all.c
+/target/arm/decode-sve.inc.c
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index c2d32988f9..d1ca1f799b 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -10,3 +10,14 @@ obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-y += crypto_helper.o
 obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
+
+DECODETREE = $(SRC_PATH)/scripts/decodetree.py
+
+target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.def $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) -o $@ --decode disas_sve \
+		$(SRC_PATH)/target/arm/sve.def || rm -f $@, \
+		"GEN", $@)
+
+target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+obj-$(TARGET_AARCH64) += translate-sve.o
diff --git a/target/arm/sve.def b/target/arm/sve.def
new file mode 100644
index 0000000000..0f47a21ef0
--- /dev/null
+++ b/target/arm/sve.def
@@ -0,0 +1,45 @@
+# AArch64 SVE instruction descriptions
+#
+#  Copyright (c) 2017 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+###########################################################################
+# Named attribute sets.  These are used to make nice(er) names
+# when creating helpers common to those for the individual
+# instruction patterns.
+
+&rrr_esz		rd rn rm esz
+
+###########################################################################
+# Named instruction formats.  These are generally used to
+# reduce the amount of duplication between instruction patterns.
+
+# Three operand with unused vector element size
+@rd_rn_rm		........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
+
+###########################################################################
+# Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
+
+### SVE Logical - Unpredicated Group
+
+# SVE bitwise logical operations (unpredicated)
+AND_zzz			00000100 00 1 ..... 001 100 ..... .....		@rd_rn_rm
+ORR_zzz			00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm
+EOR_zzz			00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm
+BIC_zzz			00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm
-- 
2.14.3


* [Qemu-devel] [PATCH 03/23] target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 04/23] target/arm: Implement PTRUE, PFALSE, SETFFR Richard Henderson
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

These were the instructions that were stubbed out when
introducing the decode skeleton.
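
Note that ORR with Zn == Zm is the canonical MOV (vector) alias;
that case is routed to tcg_gen_gvec_mov rather than tcg_gen_gvec_or.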

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 61 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 57 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 67ad94e310..43420fa124 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -32,6 +32,10 @@
 #include "trace-tcg.h"
 #include "translate-a64.h"
 
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
 /*
  * Include the generated decoder.
  */
@@ -42,7 +46,56 @@
  * Implement all of the translator functions referenced by the decoder.
  */
 
-void trans_AND_zzz(DisasContext *s, arg_AND_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
-void trans_ORR_zzz(DisasContext *s, arg_ORR_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
-void trans_EOR_zzz(DisasContext *s, arg_EOR_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
-void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn) { unsupported_encoding(s, insn); }
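+/* Round up the SVE vector length to a size the generic vector
+ * infrastructure accepts: a minimum of 8 bytes, and a multiple of
+ * 16 bytes beyond that.
+ */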
+static unsigned size_for_gvec(unsigned s)
+{
+    if (s <= 8) {
+        return 8;
+    } else {
+        return QEMU_ALIGN_UP(s, 16);
+    }
+}
+
+static void do_genfn2(DisasContext *s, GVecGen2Fn *gvec_fn,
+                      int esz, int rd, int rn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    gvec_fn(esz, vec_full_reg_offset(s, rd),
+            vec_full_reg_offset(s, rn), vsz, vsz);
+}
+
+static void do_genfn3(DisasContext *s, GVecGen3Fn *gvec_fn,
+                      int esz, int rd, int rn, int rm)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    gvec_fn(esz, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
+            vec_full_reg_offset(s, rm), vsz, vsz);
+}
+
+static void do_zzz_genfn(DisasContext *s, arg_rrr_esz *a, GVecGen3Fn *gvec_fn)
+{
+    do_genfn3(s, gvec_fn, a->esz, a->rd, a->rn, a->rm);
+}
+
+void trans_AND_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_and);
+}
+
+void trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    if (a->rn == a->rm) { /* MOV */
+        do_genfn2(s, tcg_gen_gvec_mov, 0, a->rd, a->rn);
+    } else {
+        do_genfn3(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
+    }
+}
+
+void trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_xor);
+}
+
+void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_andc);
+}
-- 
2.14.3


* [Qemu-devel] [PATCH 04/23] target/arm: Implement PTRUE, PFALSE, SETFFR
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (2 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 03/23] target/arm: Implement SVE Bitwise Logical - Unpredicated Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 05/23] target/arm: Implement SVE predicate logical operations Richard Henderson
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 117 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.def         |  12 +++++
 2 files changed, 129 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 43420fa124..fabf6f0a67 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -99,3 +99,120 @@ void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
 {
     do_zzz_genfn(s, a, tcg_gen_gvec_andc);
 }
+
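+/* A predicate register has one bit for each byte of a vector register.
+ * For elements wider than one byte, the canonical "true" value sets
+ * only the least significant bit of each element's predicate slice;
+ * these masks, indexed by element size, give that layout.
+ */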
+static const uint64_t pred_esz_mask[4] = {
+    0xffffffffffffffffull, 0x5555555555555555ull,
+    0x1111111111111111ull, 0x0101010101010101ull
+};
+
+/* See the ARM pseudocode DecodePredCount.  */
+static unsigned decode_pred_count(unsigned fullsz, int pattern, int esz)
+{
+    unsigned elements = fullsz >> esz;
+
+    switch (pattern) {
+    case 0x0: /* POW2 */
+        return pow2floor(elements);
+    case 0x1: /* VL1 */
+    case 0x2: /* VL2 */
+    case 0x3: /* VL3 */
+    case 0x4: /* VL4 */
+    case 0x5: /* VL5 */
+    case 0x6: /* VL6 */
+    case 0x7: /* VL7 */
+    case 0x8: /* VL8 */
+        return MIN(pattern, elements);
+    case 0x9: /* VL16 */
+    case 0xa: /* VL32 */
+    case 0xb: /* VL64 */
+    case 0xc: /* VL128 */
+    case 0xd: /* VL256 */
+        return MIN(16 << (pattern - 9), elements);
+    case 0x1d: /* MUL4 */
+        return elements - elements % 4;
+    case 0x1e: /* MUL3 */
+        return elements - elements % 3;
+    case 0x1f: /* ALL */
+        return elements;
+    default:   /* #uimm5 */
+        return 0;
+    }
+}
+
+/* For PTRUE, PTRUES, PFALSE, SETFFR.  */
+void trans_pred_set(DisasContext *s, arg_pred_set *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem, setsz, setalign, allalign, ofs;
+    uint64_t word, lastword;
+    TCGv_i64 t;
+
+    numelem = decode_pred_count(fullsz, a->pat, a->esz);
+
+    /* Determine what we must store into each word, and the size in
+       bits of the region that is set.  */
+    if (numelem == 0 || a->i == 0) {
+        lastword = word = 0;
+        setsz = fullsz;
+    } else {
+        setsz = numelem << a->esz;
+        lastword = word = pred_esz_mask[a->esz];
+        if (setsz % 64) {
+            lastword &= ~(-1ull << (setsz % 64));
+        }
+    }
+
+    /* Rescale from bits to bytes.  */
+    fullsz /= 8;
+    setsz /= 8;
+
+    ofs = pred_full_reg_offset(s, a->rd);
+    setalign = QEMU_ALIGN_DOWN(setsz, 8);
+    allalign = QEMU_ALIGN_UP(fullsz, 16);
+
+    /* Perform the stores.  Use the vector infrastructure if the sizes
+       are large enough.  */
+    if (fullsz > 8) {
+        if (setsz >= 16 && setsz % 16 == 0) {
+            /* The gvec expansion zeros from setsz up to allalign,
+               so the entire register is correct; skip the final store.  */
+            tcg_gen_gvec_dup64i(ofs, setsz, allalign, word);
+            goto done;
+        } else if (setsz <= 8 && fullsz > 16) {
+            tcg_gen_gvec_dup64i(ofs, allalign, allalign, 0);
+        } else if (setsz % 8 && fullsz <= setalign + 8 && fullsz > 16) {
+            /* Only the final, partial word differs from the fill.  */
+            tcg_gen_gvec_dup64i(ofs, allalign, allalign, word);
+        } else {
+            unsigned i = 0;
+
+            t = tcg_temp_new_i64();
+            if (setalign > 0) {
+                tcg_gen_movi_i64(t, word);
+                for (; i < setalign; i += 8) {
+                    tcg_gen_st_i64(t, cpu_env, ofs + i);
+                }
+            }
+            if (lastword != word) {
+                tcg_gen_movi_i64(t, lastword);
+                tcg_gen_st_i64(t, cpu_env, ofs + i);
+                i += 8;
+            }
+            if (i < fullsz) {
+                tcg_gen_movi_i64(t, 0);
+                for (; i < fullsz; i += 8) {
+                    tcg_gen_st_i64(t, cpu_env, ofs + i);
+                }
+            }
+            tcg_temp_free_i64(t);
+            goto done;
+        }
+    }
+    /* Store the last (possibly partial, possibly only) word.  */
+    t = tcg_const_i64(lastword);
+    tcg_gen_st_i64(t, cpu_env, ofs + (setsz % 8 ? setalign : 0));
+    tcg_temp_free_i64(t);
+
+ done:
+    /* PTRUES: N = FirstActive, Z = NoneActive, C = !LastActive of the
+       result; note that cpu_ZF holds a value tested against zero.  */
+    if (a->s) {
+        tcg_gen_movi_i32(cpu_NF, -(lastword != 0));
+        tcg_gen_movi_i32(cpu_ZF, lastword != 0);
+        tcg_gen_movi_i32(cpu_CF, lastword == 0);
+        tcg_gen_movi_i32(cpu_VF, 0);
+    }
+}
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 0f47a21ef0..f802031f51 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -25,6 +25,7 @@
 # instruction patterns.
 
 &rrr_esz		rd rn rm esz
+&pred_set		rd pat esz i s
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -43,3 +44,14 @@ AND_zzz			00000100 00 1 ..... 001 100 ..... .....		@rd_rn_rm
 ORR_zzz			00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm
 EOR_zzz			00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm
 BIC_zzz			00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm
+
+### SVE Predicate Generation Group
+
+# SVE initialize predicate (PTRUE, PTRUES)
+pred_set		00100101 esz:2 011 00 s:1 111000 pat:5 0 rd:4	&pred_set i=1
+
+# SVE zero predicate register (PFALSE)
+pred_set		00100101 00 011 000 1110 0100 0000 rd:4		&pred_set pat=31 esz=0 i=0 s=0
+
+# SVE initialize FFR (SETFFR)
+pred_set		00100101 0010 1100 1001 0000 0000 0000		&pred_set pat=31 esz=0 rd=16 i=1 s=0
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
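
A detail worth keeping in mind while reading the predicate patches: an
SVE predicate register holds one bit per vector byte, so an element of
size 1 << esz bytes is governed by every (1 << esz)-th predicate bit,
and the remaining bits of an active element's slot must be zero.  That
is exactly what the pred_esz_mask[] table above encodes; a sketch:

    #include <stdint.h>

    /* The word PTRUE stores, per element size.  */
    static const uint64_t ptrue_word[4] = {
        0xffffffffffffffffull,  /* esz = 0: bytes,       every bit     */
        0x5555555555555555ull,  /* esz = 1: halfwords,   every 2nd bit */
        0x1111111111111111ull,  /* esz = 2: words,       every 4th bit */
        0x0101010101010101ull,  /* esz = 3: doublewords, every 8th bit */
    };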

* [Qemu-devel] [PATCH 05/23] target/arm: Implement SVE predicate logical operations
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (3 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 04/23] target/arm: Implement PTRUE, PFALSE, SETFFR Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 06/23] target/arm: Implement SVE load vector/predicate Richard Henderson
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
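A note for review: every predicate logical operation below zeroes the
result bits whose governing-predicate bit is clear, except SEL, which
takes the inactive bits from the second operand.  A reference sketch of
two representative cases on 64-bit predicate words (illustration only):

    #include <stdint.h>
    #include <stddef.h>

    static void and_pppp_ref(uint64_t *d, const uint64_t *n,
                             const uint64_t *m, const uint64_t *g,
                             size_t words)
    {
        for (size_t i = 0; i < words; ++i) {
            d[i] = (n[i] & m[i]) & g[i];            /* cf. DO_AND */
        }
    }

    static void sel_pppp_ref(uint64_t *d, const uint64_t *n,
                             const uint64_t *m, const uint64_t *g,
                             size_t words)
    {
        for (size_t i = 0; i < words; ++i) {
            d[i] = (n[i] & g[i]) | (m[i] & ~g[i]);  /* cf. DO_SEL */
        }
    }
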
 target/arm/helper-sve.h    |  37 ++++++
 target/arm/helper.h        |   1 +
 target/arm/sve_helper.c    | 126 ++++++++++++++++++
 target/arm/translate-sve.c | 314 ++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/Makefile.objs   |   2 +-
 target/arm/sve.def         |  21 +++
 6 files changed, 498 insertions(+), 3 deletions(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/sve_helper.c

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
new file mode 100644
index 0000000000..4a923a33b8
--- /dev/null
+++ b/target/arm/helper-sve.h
@@ -0,0 +1,37 @@
+/*
+ *  AArch64 SVE specific helper definitions
+ *
+ *  Copyright (c) 2017 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sel_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orn_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_nor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_nand_pred, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_ands_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bics_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eors_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orrs_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orns_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_nors_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_nands_pred, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 206e39a207..3c4fca220e 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -587,4 +587,5 @@ DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
+#include "helper-sve.h"
 #endif
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
new file mode 100644
index 0000000000..5d2a6b2239
--- /dev/null
+++ b/target/arm/sve_helper.c
@@ -0,0 +1,126 @@
+/*
+ *  ARM SVE Operations
+ *
+ *  Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
+
+
+/* Note that vector data is stored in host-endian 64-bit chunks,
+   so addressing units smaller than that needs a host-endian fixup.  */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)  ((x) ^ 7)
+#define H2(x)  ((x) ^ 3)
+#define H4(x)  ((x) ^ 1)
+#else
+#define H1(x)  (x)
+#define H2(x)  (x)
+#define H4(x)  (x)
+#endif
+
+
+/* Given the first and last words of the result and of the governing
+   mask, the inclusive-OR of all result words, and a mask limiting the
+   valid predicate bits, return a value from which NZC can be set:
+   N in bit 31, a Z indicator in bit 1 (nonzero means Z is clear),
+   and C in bit 0.  */
+static uint32_t predtest(uint64_t first_d, uint64_t first_g, uint64_t last_d,
+                         uint64_t last_g, uint64_t sum_d, uint64_t size_mask)
+{
+    first_g &= size_mask;
+    first_d &= first_g & -first_g;
+    last_g &= size_mask;
+    last_d &= pow2floor(last_g);
+
+    return ((first_d != 0) << 31) | ((sum_d != 0) << 1) | (last_d == 0);
+}
+
+#define LOGICAL_PRED(NAME, FUNC) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
+{                                                                         \
+    uintptr_t opr_sz = simd_oprsz(desc);                                  \
+    uint64_t *d = vd, *n = vn, *m = vm, *g = vg;                          \
+    uintptr_t i;                                                          \
+    for (i = 0; i < opr_sz / 8; ++i) {                                    \
+        d[i] = FUNC(n[i], m[i], g[i]);                                    \
+    }                                                                     \
+}
+
+#define LOGICAL_PRED_FLAGS(NAME, FUNC) \
+uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t bits) \
+{                                                                            \
+    uint64_t *d = vd, *n = vn, *m = vm, *g = vg;                             \
+    uint64_t first_d = 0, first_g = 0, last_d = 0, last_g = 0, sum_d = 0;    \
+    uintptr_t i = 0;                                                         \
+    for (; i < bits / 64; ++i) {                                             \
+        last_g = g[i];                                                       \
+        d[i] = last_d = FUNC(n[i], m[i], last_g);                            \
+        sum_d |= last_d;                                                     \
+        if (i == 0) {                                                        \
+            first_g = last_g, first_d = last_d;                              \
+        }                                                                    \
+    }                                                                        \
+    if (bits % 64) {                                                         \
+        last_g = g[i] & ~(-1ull << bits % 64);                               \
+        d[i] = last_d = FUNC(n[i], m[i], last_g);                            \
+        sum_d |= last_d;                                                     \
+        if (i == 0) {                                                        \
+            first_g = last_g, first_d = last_d;                              \
+        }                                                                    \
+    }                                                                        \
+    return predtest(first_d, first_g, last_d, last_g, sum_d, -1);            \
+}
+
+#define DO_AND(N, M, G)  (((N) & (M)) & (G))
+#define DO_BIC(N, M, G)  (((N) & ~(M)) & (G))
+#define DO_EOR(N, M, G)  (((N) ^ (M)) & (G))
+#define DO_ORR(N, M, G)  (((N) | (M)) & (G))
+#define DO_ORN(N, M, G)  (((N) | ~(M)) & (G))
+#define DO_NOR(N, M, G)  (~((N) | (M)) & (G))
+#define DO_NAND(N, M, G) (~((N) & (M)) & (G))
+#define DO_SEL(N, M, G)  (((N) & (G)) | ((M) & ~(G)))
+
+LOGICAL_PRED(sve_and_pred, DO_AND)
+LOGICAL_PRED(sve_bic_pred, DO_BIC)
+LOGICAL_PRED(sve_eor_pred, DO_EOR)
+LOGICAL_PRED(sve_sel_pred, DO_SEL)
+LOGICAL_PRED(sve_orr_pred, DO_ORR)
+LOGICAL_PRED(sve_orn_pred, DO_ORN)
+LOGICAL_PRED(sve_nor_pred, DO_NOR)
+LOGICAL_PRED(sve_nand_pred, DO_NAND)
+
+LOGICAL_PRED_FLAGS(sve_ands_pred, DO_AND)
+LOGICAL_PRED_FLAGS(sve_bics_pred, DO_BIC)
+LOGICAL_PRED_FLAGS(sve_eors_pred, DO_EOR)
+LOGICAL_PRED_FLAGS(sve_orrs_pred, DO_ORR)
+LOGICAL_PRED_FLAGS(sve_orns_pred, DO_ORN)
+LOGICAL_PRED_FLAGS(sve_nors_pred, DO_NOR)
+LOGICAL_PRED_FLAGS(sve_nands_pred, DO_NAND)
+
+#undef LOGICAL_PRED
+#undef LOGICAL_PRED_FLAGS
+#undef DO_ADD
+#undef DO_BIC
+#undef DO_EOR
+#undef DO_ORR
+#undef DO_ORN
+#undef DO_NOR
+#undef DO_NAND
+#undef DO_SEL
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fabf6f0a67..ab03ead000 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -63,6 +63,14 @@ static void do_genfn2(DisasContext *s, GVecGen2Fn *gvec_fn,
             vec_full_reg_offset(s, rn), vsz, vsz);
 }
 
+static void do_genfn2_p(DisasContext *s, GVecGen2Fn *gvec_fn,
+                        int esz, int rd, int rn)
+{
+    unsigned vsz = size_for_gvec(pred_full_reg_size(s));
+    gvec_fn(esz, pred_full_reg_offset(s, rd),
+            pred_full_reg_offset(s, rn), vsz, vsz);
+}
+
 static void do_genfn3(DisasContext *s, GVecGen3Fn *gvec_fn,
                       int esz, int rd, int rn, int rm)
 {
@@ -71,9 +79,27 @@ static void do_genfn3(DisasContext *s, GVecGen3Fn *gvec_fn,
             vec_full_reg_offset(s, rm), vsz, vsz);
 }
 
-static void do_zzz_genfn(DisasContext *s, arg_rrr_esz *a, GVecGen3Fn *gvec_fn)
+static void do_genfn3_p(DisasContext *s, GVecGen3Fn *gvec_fn,
+                        int esz, int rd, int rn, int rm)
+{
+    unsigned vsz = size_for_gvec(pred_full_reg_size(s));
+    gvec_fn(esz, pred_full_reg_offset(s, rd), pred_full_reg_offset(s, rn),
+            pred_full_reg_offset(s, rm), vsz, vsz);
+}
+
+static void do_genop4_p(DisasContext *s, const GVecGen4 *gvec_op,
+                        int rd, int rn, int rm, int pg)
+{
+    unsigned vsz = size_for_gvec(pred_full_reg_size(s));
+    tcg_gen_gvec_4(pred_full_reg_offset(s, rd), pred_full_reg_offset(s, rn),
+                   pred_full_reg_offset(s, rm), pred_full_reg_offset(s, pg),
+                   vsz, vsz, gvec_op);
+}
+
+
+static void do_zzz_genfn(DisasContext *s, arg_rrr_esz *a, GVecGen3Fn *fn)
 {
-    do_genfn3(s, gvec_fn, a->esz, a->rd, a->rn, a->rm);
+    do_genfn3(s, fn, a->esz, a->rd, a->rn, a->rm);
 }
 
 void trans_AND_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
@@ -216,3 +242,287 @@ void trans_pred_set(DisasContext *s, arg_pred_set *a, uint32_t insn)
         tcg_gen_movi_i32(cpu_VF, 0);
     }
 }
+
+static void do_mov_p(DisasContext *s, int rd, int rn)
+{
+    do_genfn2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
+}
+
+static void gen_and_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_and_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_and_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_and_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+void trans_AND_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 and_pg = {
+        .fni8 = gen_and_pg_i64,
+        .fniv = gen_and_pg_vec,
+        .fno = gen_helper_sve_and_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    if (a->pg == a->rn && a->rn == a->rm) {
+        do_mov_p(s, a->rd, a->rn);
+    } else if (a->pg == a->rn || a->pg == a->rm) {
+        do_genfn3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
+    } else {
+        do_genop4_p(s, &and_pg, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_bic_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_andc_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_bic_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_andc_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+void trans_BIC_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 bic_pg = {
+        .fni8 = gen_bic_pg_i64,
+        .fniv = gen_bic_pg_vec,
+        .fno = gen_helper_sve_bic_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    if (a->pg == a->rn) {
+        do_genfn3_p(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
+    } else {
+        do_genop4_p(s, &bic_pg, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_eor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_xor_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_eor_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_xor_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+void trans_EOR_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 eor_pg = {
+        .fni8 = gen_eor_pg_i64,
+        .fniv = gen_eor_pg_vec,
+        .fno = gen_helper_sve_eor_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    do_genop4_p(s, &eor_pg, a->rd, a->rn, a->rm, a->pg);
+}
+
+static void gen_sel_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_and_i64(pn, pn, pg);
+    tcg_gen_andc_i64(pm, pm, pg);
+    tcg_gen_or_i64(pd, pn, pm);
+}
+
+static void gen_sel_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_and_vec(vece, pn, pn, pg);
+    tcg_gen_andc_vec(vece, pm, pm, pg);
+    tcg_gen_or_vec(vece, pd, pn, pm);
+}
+
+void trans_SEL_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 sel_pg = {
+        .fni8 = gen_sel_pg_i64,
+        .fniv = gen_sel_pg_vec,
+        .fno = gen_helper_sve_sel_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    do_genop4_p(s, &sel_pg, a->rd, a->rn, a->rm, a->pg);
+}
+
+static void gen_orr_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_or_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_orr_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_or_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+void trans_ORR_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 orr_pg = {
+        .fni8 = gen_orr_pg_i64,
+        .fniv = gen_orr_pg_vec,
+        .fno = gen_helper_sve_orr_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    if (a->pg == a->rn && a->rn == a->rm) {
+        do_mov_p(s, a->rd, a->rn);
+    } else {
+        do_genop4_p(s, &orr_pg, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_orn_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_orc_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_orn_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_orc_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+void trans_ORN_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 orn_pg = {
+        .fni8 = gen_orn_pg_i64,
+        .fniv = gen_orn_pg_vec,
+        .fno = gen_helper_sve_orn_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    do_genop4_p(s, &orn_pg, a->rd, a->rn, a->rm, a->pg);
+}
+
+static void gen_nor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_or_i64(pd, pn, pm);
+    tcg_gen_andc_i64(pd, pg, pd);
+}
+
+static void gen_nor_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_or_vec(vece, pd, pn, pm);
+    tcg_gen_andc_vec(vece, pd, pg, pd);
+}
+
+void trans_NOR_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 nor_pg = {
+        .fni8 = gen_nor_pg_i64,
+        .fniv = gen_nor_pg_vec,
+        .fno = gen_helper_sve_nor_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    do_genop4_p(s, &nor_pg, a->rd, a->rn, a->rm, a->pg);
+}
+
+static void gen_nand_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_and_i64(pd, pn, pm);
+    tcg_gen_andc_i64(pd, pg, pd);
+}
+
+static void gen_nand_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_and_vec(vece, pd, pn, pm);
+    tcg_gen_andc_vec(vece, pd, pg, pd);
+}
+
+void trans_NAND_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static const GVecGen4 nand_pg = {
+        .fni8 = gen_nand_pg_i64,
+        .fniv = gen_nand_pg_vec,
+        .fno = gen_helper_sve_nand_pred,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    do_genop4_p(s, &nand_pg, a->rd, a->rn, a->rm, a->pg);
+}
+
+/* A predicate logical operation that sets the flags is always implemented
+   out of line.  The helper returns a 3-bit mask to set N,Z,C --
+   N in bit 31, a Z indicator in bit 1, and C in bit 0.  */
+static void do_logical_pppp_flags(DisasContext *s, arg_rprr_esz *a,
+                                  void (*gen_fn)(TCGv_i32, TCGv_ptr, TCGv_ptr,
+                                                 TCGv_ptr, TCGv_ptr, TCGv_i32))
+{
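+    /* The final i32 argument is the number of predicate bits, which
+       equals the vector size in bytes; T is reused for the returned
+       NZC mask.  */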
+    TCGv_i32 t = tcg_const_i32(vec_full_reg_size(s));
+    TCGv_ptr pd = tcg_temp_new_ptr();
+    TCGv_ptr pn = tcg_temp_new_ptr();
+    TCGv_ptr pm = tcg_temp_new_ptr();
+    TCGv_ptr pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(pd, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(pn, cpu_env, pred_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(pm, cpu_env, pred_full_reg_offset(s, a->rm));
+    tcg_gen_addi_ptr(pg, cpu_env, pred_full_reg_offset(s, a->pg));
+
+    gen_fn(t, pd, pn, pm, pg, t);
+
+    tcg_temp_free_ptr(pd);
+    tcg_temp_free_ptr(pn);
+    tcg_temp_free_ptr(pm);
+    tcg_temp_free_ptr(pg);
+
+    tcg_gen_sari_i32(cpu_NF, t, 31);
+    tcg_gen_andi_i32(cpu_ZF, t, 2);
+    tcg_gen_andi_i32(cpu_CF, t, 1);
+    tcg_gen_movi_i32(cpu_VF, 0);
+
+    tcg_temp_free_i32(t);
+}
+
+void trans_ANDS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_ands_pred);
+}
+
+void trans_BICS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_bics_pred);
+}
+
+void trans_EORS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_eors_pred);
+}
+
+void trans_ORRS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_orrs_pred);
+}
+
+void trans_ORNS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_orns_pred);
+}
+
+void trans_NORS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_nors_pred);
+}
+
+void trans_NANDS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_logical_pppp_flags(s, a, gen_helper_sve_nands_pred);
+}
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index d1ca1f799b..edcd32db88 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -20,4 +20,4 @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.def $(DECODETREE)
 		"GEN", $@)
 
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
-obj-$(TARGET_AARCH64) += translate-sve.o
+obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
diff --git a/target/arm/sve.def b/target/arm/sve.def
index f802031f51..77f96510d8 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -25,6 +25,7 @@
 # instruction patterns.
 
 &rrr_esz		rd rn rm esz
+&rprr_esz		rd pg rn rm esz
 &pred_set		rd pat esz i s
 
 ###########################################################################
@@ -34,6 +35,9 @@
 # Three operand with unused vector element size
 @rd_rn_rm		........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
 
+# Three predicate operand, with governing predicate, unused vector element size
+@pd_pg_pn_pm		........ .... rm:4 .. pg:4 . rn:4 . rd:4	&rprr_esz esz=0
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -55,3 +59,20 @@ pred_set		00100101 00 011 000 1110 0100 0000 rd:4		&pred_set pat=31 esz=0 i=0 s=
 
 # SVE initialize FFR (SETFFR)
 pred_set		00100101 0010 1100 1001 0000 0000 0000		&pred_set pat=31 esz=0 rd=16 i=1 s=0
+
+# SVE predicate logical operations
+AND_pppp		00100101 00 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm
+BIC_pppp		00100101 00 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm
+EOR_pppp		00100101 00 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm
+SEL_pppp		00100101 00 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm
+ANDS_pppp		00100101 01 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm
+BICS_pppp		00100101 01 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm
+EORS_pppp		00100101 01 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm
+ORR_pppp		00100101 10 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm
+ORN_pppp		00100101 10 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm
+NOR_pppp		00100101 10 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm
+NAND_pppp		00100101 10 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm
+ORRS_pppp		00100101 11 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm
+ORNS_pppp		00100101 11 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm
+NORS_pppp		00100101 11 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm
+NANDS_pppp		00100101 11 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
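
For reviewers tracing the flag plumbing in do_logical_pppp_flags above:
the helper's return value packs N into bit 31, a Z indicator into bit 1
(nonzero meaning Z is clear, matching QEMU's cpu_ZF convention of a
value tested against zero), and C into bit 0.  A host-side sketch of
the unpacking:

    #include <stdint.h>
    #include <stdbool.h>

    static void unpack_predtest(uint32_t t, bool *n, bool *z, bool *c)
    {
        *n = (int32_t)t < 0;  /* bit 31; cf. tcg_gen_sari_i32(cpu_NF, t, 31) */
        *z = (t & 2) == 0;    /* Z is set when cpu_ZF == 0 */
        *c = t & 1;           /* C sits directly in bit 0 */
    }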

* [Qemu-devel] [PATCH 06/23] target/arm: Implement SVE load vector/predicate
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (4 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 05/23] target/arm: Implement SVE predicate logical operations Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 07/23] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group Richard Henderson
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
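A note for review: sve_ldr copies whole 64-bit chunks and then finishes
a tail that, for predicate registers, can be any multiple of 2 bytes.
The len % 8 == 6 case builds the final chunk from a 4-byte and a 2-byte
load; a sketch of that combination (deposit64(t0, 32, 32, t1) places t1
in bits [63:32]):

    #include <stdint.h>

    static uint64_t combine_tail6(uint32_t t0, uint16_t t1)
    {
        return (uint64_t)t0 | ((uint64_t)t1 << 32);
    }

Note also that the 9-bit immediate in LDR_zri/LDR_pri is multiplied by
the register size in the translator, so the addressing is vector-length
agnostic by construction.
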
 target/arm/helper-sve.h    |  2 ++
 target/arm/sve_helper.c    | 31 +++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 32 ++++++++++++++++++++++++++++++++
 target/arm/sve.def         | 16 ++++++++++++++++
 4 files changed, 81 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 4a923a33b8..8b382a962d 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -35,3 +35,5 @@ DEF_HELPER_FLAGS_5(sve_orns_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_nors_pred, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_nands_pred, TCG_CALL_NO_RWG,
                    i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldr, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 5d2a6b2239..a605e623f7 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
 
@@ -124,3 +125,33 @@ LOGICAL_PRED_FLAGS(sve_nands_pred, DO_NAND)
 #undef DO_NOR
 #undef DO_NAND
 #undef DO_SEL
+
+void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
+{
+    intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
+
+    for (i = 0; i < len_align; i += 8) {
+        *(uint64_t *)(d + i) = cpu_ldq_data(env, addr + i);
+    }
+
+    /* For LDR of predicate registers, we can have any multiple of 2.  */
+    switch (len % 8) {
+    case 0:
+        break;
+    case 2:
+        *(uint64_t *)(d + i) = cpu_lduw_data(env, addr + i);
+        break;
+    case 4:
+        *(uint64_t *)(d + i) = (uint32_t)cpu_ldl_data(env, addr + i);
+        break;
+    case 6:
+        {
+            uint32_t t0 = cpu_ldl_data(env, addr + i);
+            uint32_t t1 = cpu_lduw_data(env, addr + i + 2);
+            *(uint64_t *)(d + i) = deposit64(t0, 32, 32, t1);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index ab03ead000..0e988c03aa 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -526,3 +526,35 @@ void trans_NANDS_pppp(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 {
     do_logical_pppp_flags(s, a, gen_helper_sve_nands_pred);
 }
+
+static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
+                   int rn, int imm)
+{
+    TCGv_ptr vptr;
+    TCGv_i32 tlen;
+    TCGv_i64 addr = tcg_temp_new_i64();
+
+    tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm);
+
+    vptr = tcg_temp_new_ptr();
+    tlen = tcg_const_i32(len);
+    tcg_gen_addi_ptr(vptr, cpu_env, vofs);
+
+    gen_helper_sve_ldr(cpu_env, vptr, addr, tlen);
+
+    tcg_temp_free_ptr(vptr);
+    tcg_temp_free_i32(tlen);
+    tcg_temp_free_i64(addr);
+}
+
+void trans_LDR_zri(DisasContext *s, arg_rri *a, uint32_t insn)
+{
+    int size = vec_full_reg_size(s);
+    do_ldr(s, vec_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
+}
+
+void trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
+{
+    int size = pred_full_reg_size(s);
+    do_ldr(s, pred_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
+}
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 77f96510d8..d1172296e0 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -19,11 +19,17 @@
 # This file is processed by scripts/decodetree.py
 #
 
+###########################################################################
+# Named fields.  These are primarily for disjoint fields.
+
+%imm9_16_10		16:s6 10:3
+
 ###########################################################################
 # Named attribute sets.  These are used to make nice(er) names
 # when creating helpers common to those for the individual
 # instruction patterns.
 
+&rri			rd rn imm
 &rrr_esz		rd rn rm esz
 &rprr_esz		rd pg rn rm esz
 &pred_set		rd pat esz i s
@@ -38,6 +44,10 @@
 # Three predicate operand, with governing predicate, unused vector element size
 @pd_pg_pn_pm		........ .... rm:4 .. pg:4 . rn:4 . rd:4	&rprr_esz esz=0
 
+# Basic Load/Store with 9-bit immediate offset
+@pd_rn_i9		........ ........ ...... rn:5 . rd:4		&rri imm=%imm9_16_10
+@rd_rn_i9		........ ........ ...... rn:5 rd:5		&rri imm=%imm9_16_10
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -76,3 +86,9 @@ ORRS_pppp		00100101 11 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm
 ORNS_pppp		00100101 11 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm
 NORS_pppp		00100101 11 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm
 NANDS_pppp		00100101 11 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm
+
+# SVE load predicate register
+LDR_pri			10000101 10 ...... 000 ... ..... 0 ....		@pd_rn_i9
+
+# SVE load vector register
+LDR_zri			10000101 10 ...... 010 ... ..... .....		@rd_rn_i9
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
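
On the %imm9_16_10 field introduced in the sve.def hunk above: the
decode generator concatenates a signed 6-bit field at [21:16] with a
3-bit field at [12:10] into one signed 9-bit immediate.  A sketch of
the equivalent extraction, roughly what decodetree emits:

    #include "qemu/bitops.h"

    static int extract_imm9(uint32_t insn)
    {
        return (sextract32(insn, 16, 6) << 3) | extract32(insn, 10, 3);
    }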

* [Qemu-devel] [PATCH 07/23] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (5 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 06/23] target/arm: Implement SVE load vector/predicate Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 08/23] target/arm: Handle SVE registers in write_fp_dreg Richard Henderson
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
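A note for review: the DO_ZPZZ expander walks the vector sixteen bytes
at a time, fetching the sixteen predicate bits that govern that
segment; since predicates carry one bit per vector byte, the mask is
shifted by sizeof(TYPE) per element.  A standalone sketch of that walk
for 16-bit elements within one segment (illustration only):

    #include <stdint.h>

    /* 'pg' holds the 16 predicate bits of one 16-byte segment;
       element k is governed by bit 2 * k.  */
    static void add16_segment(uint16_t *d, const uint16_t *n,
                              const uint16_t *m, uint16_t pg)
    {
        int i = 0;
        do {
            if (pg & 1) {
                d[i] = n[i] + m[i];
            }
            i += 1;
            pg >>= 2;    /* advance sizeof(uint16_t) predicate bits */
        } while (pg);
    }
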
 target/arm/helper-sve.h    | 145 ++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 203 +++++++++++++++++++++++++++++++++++++++++++--
 target/arm/translate-sve.c |  75 +++++++++++++++++
 target/arm/sve.def         |  39 +++++++++
 4 files changed, 455 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 8b382a962d..964b15b104 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -17,6 +17,151 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
+DEF_HELPER_FLAGS_5(sve_and_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_add_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index a605e623f7..b617ea2c04 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -28,13 +28,17 @@
 /* Note that vector data is stored in host-endian 64-bit chunks,
    so addressing units smaller than that needs a host-endian fixup.  */
 #ifdef HOST_WORDS_BIGENDIAN
-#define H1(x)  ((x) ^ 7)
-#define H2(x)  ((x) ^ 3)
-#define H4(x)  ((x) ^ 1)
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
 #else
-#define H1(x)  (x)
-#define H2(x)  (x)
-#define H4(x)  (x)
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
 #endif
 
 
@@ -117,7 +121,7 @@ LOGICAL_PRED_FLAGS(sve_nands_pred, DO_NAND)
 
 #undef LOGICAL_PRED
 #undef LOGICAL_PRED_FLAGS
-#undef DO_ADD
+#undef DO_AND
 #undef DO_BIC
 #undef DO_EOR
 #undef DO_ORR
@@ -126,6 +130,191 @@ LOGICAL_PRED_FLAGS(sve_nands_pred, DO_NAND)
 #undef DO_NAND
 #undef DO_SEL
 
+/* Fully general three-operand expander, controlled by a predicate.
+ * This is complicated by the host-endian storage of the register file.
+ */
+/* ??? I don't expect the compiler could ever vectorize this itself.
+ * With some tables we can convert bit masks to byte masks, and with
+ * extra care wrt byte/word ordering we could use gcc generic vectors
+ * and do 16 bytes at a time.
+ */
+#define DO_ZPZZ(NAME, TYPE, H, OP)                                       \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                       \
+    intptr_t iv = 0, ib = 0, opr_sz = simd_oprsz(desc);                 \
+    for (iv = ib = 0; iv < opr_sz; iv += 16, ib += 2) {                 \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(ib));                     \
+        intptr_t i = 0;                                                 \
+        do {                                                            \
+            if (pg & 1) {                                               \
+                TYPE nn = *(TYPE *)(vn + iv + H(i));                    \
+                TYPE mm = *(TYPE *)(vm + iv + H(i));                    \
+                *(TYPE *)(vd + iv + H(i)) = OP(nn, mm);                 \
+            }                                                           \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                     \
+        } while (pg);                                                   \
+    }                                                                   \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZZ_D(NAME, TYPE, OP)                                \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn, *m = vm;                             \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            TYPE nn = n[i], mm = m[i];                          \
+            d[i] = OP(nn, mm);                                  \
+        }                                                       \
+    }                                                           \
+}
+
+#define DO_AND(N, M)  (N & M)
+#define DO_EOR(N, M)  (N ^ M)
+#define DO_ORR(N, M)  (N | M)
+#define DO_BIC(N, M)  (N &~ M)
+
+DO_ZPZZ(sve_and_zpzz_b, uint8_t, H1, DO_AND)
+DO_ZPZZ(sve_orr_zpzz_b, uint8_t, H1, DO_ORR)
+DO_ZPZZ(sve_eor_zpzz_b, uint8_t, H1, DO_EOR)
+DO_ZPZZ(sve_bic_zpzz_b, uint8_t, H1, DO_BIC)
+
+DO_ZPZZ(sve_and_zpzz_h, uint16_t, H1_2, DO_AND)
+DO_ZPZZ(sve_orr_zpzz_h, uint16_t, H1_2, DO_ORR)
+DO_ZPZZ(sve_eor_zpzz_h, uint16_t, H1_2, DO_EOR)
+DO_ZPZZ(sve_bic_zpzz_h, uint16_t, H1_2, DO_BIC)
+
+DO_ZPZZ(sve_and_zpzz_s, uint32_t, H1_4, DO_AND)
+DO_ZPZZ(sve_orr_zpzz_s, uint32_t, H1_4, DO_ORR)
+DO_ZPZZ(sve_eor_zpzz_s, uint32_t, H1_4, DO_EOR)
+DO_ZPZZ(sve_bic_zpzz_s, uint32_t, H1_4, DO_BIC)
+
+DO_ZPZZ_D(sve_and_zpzz_d, uint64_t, DO_AND)
+DO_ZPZZ_D(sve_orr_zpzz_d, uint64_t, DO_ORR)
+DO_ZPZZ_D(sve_eor_zpzz_d, uint64_t, DO_EOR)
+DO_ZPZZ_D(sve_bic_zpzz_d, uint64_t, DO_BIC)
+
+#undef DO_AND
+#undef DO_ORR
+#undef DO_EOR
+#undef DO_BIC
+
+#define DO_ADD(N, M)  (N + M)
+#define DO_SUB(N, M)  (N - M)
+
+DO_ZPZZ(sve_add_zpzz_b, uint8_t, H1, DO_ADD)
+DO_ZPZZ(sve_sub_zpzz_b, uint8_t, H1, DO_SUB)
+
+DO_ZPZZ(sve_add_zpzz_h, uint16_t, H1_2, DO_ADD)
+DO_ZPZZ(sve_sub_zpzz_h, uint16_t, H1_2, DO_SUB)
+
+DO_ZPZZ(sve_add_zpzz_s, uint32_t, H1_4, DO_ADD)
+DO_ZPZZ(sve_sub_zpzz_s, uint32_t, H1_4, DO_SUB)
+
+DO_ZPZZ_D(sve_add_zpzz_d, uint64_t, DO_ADD)
+DO_ZPZZ_D(sve_sub_zpzz_d, uint64_t, DO_SUB)
+
+#undef DO_ADD
+#undef DO_SUB
+
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+#define DO_ABD(N, M)  ((N) >= (M) ? (N) - (M) : (M) - (N))
+
+DO_ZPZZ(sve_smax_zpzz_b, int8_t,  H1, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_b, uint8_t, H1, DO_MAX)
+DO_ZPZZ(sve_smin_zpzz_b, int8_t,  H1, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_b, uint8_t, H1, DO_MIN)
+DO_ZPZZ(sve_sabd_zpzz_b, int8_t,  H1, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_b, uint8_t, H1, DO_ABD)
+
+DO_ZPZZ(sve_smax_zpzz_h, int16_t,  H1_2, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_h, uint16_t, H1_2, DO_MAX)
+DO_ZPZZ(sve_smin_zpzz_h, int16_t,  H1_2, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_h, uint16_t, H1_2, DO_MIN)
+DO_ZPZZ(sve_sabd_zpzz_h, int16_t,  H1_2, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_h, uint16_t, H1_2, DO_ABD)
+
+DO_ZPZZ(sve_smax_zpzz_s, int32_t,  H1_4, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_s, uint32_t, H1_4, DO_MAX)
+DO_ZPZZ(sve_smin_zpzz_s, int32_t,  H1_4, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_s, uint32_t, H1_4, DO_MIN)
+DO_ZPZZ(sve_sabd_zpzz_s, int32_t,  H1_4, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_s, uint32_t, H1_4, DO_ABD)
+
+DO_ZPZZ_D(sve_smax_zpzz_d, int64_t,  DO_MAX)
+DO_ZPZZ_D(sve_umax_zpzz_d, uint64_t, DO_MAX)
+DO_ZPZZ_D(sve_smin_zpzz_d, int64_t,  DO_MIN)
+DO_ZPZZ_D(sve_umin_zpzz_d, uint64_t, DO_MIN)
+DO_ZPZZ_D(sve_sabd_zpzz_d, int64_t,  DO_ABD)
+DO_ZPZZ_D(sve_uabd_zpzz_d, uint64_t, DO_ABD)
+
+#undef DO_MAX
+#undef DO_MIN
+#undef DO_ABD
+
+#define DO_MUL(N, M)  (N * M)
+#define DO_DIV(N, M)  (M ? N / M : 0)
+
+/* Because the computation type is at least twice as large as required,
+   these work for both signed and unsigned source types.  */
+static inline uint8_t do_mulh_b(int32_t n, int32_t m)
+{
+    return (n * m) >> 8;
+}
+
+static inline uint16_t do_mulh_h(int32_t n, int32_t m)
+{
+    return (n * m) >> 16;
+}
+
+static inline uint32_t do_mulh_s(int64_t n, int64_t m)
+{
+    return (n * m) >> 32;
+}
+
+static inline uint64_t do_smulh_d(uint64_t n, uint64_t m)
+{
+    uint64_t lo, hi;
+    muls64(&lo, &hi, n, m);
+    return hi;
+}
+
+static inline uint64_t do_umulh_d(uint64_t n, uint64_t m)
+{
+    uint64_t lo, hi;
+    mulu64(&lo, &hi, n, m);
+    return hi;
+}
+
+DO_ZPZZ(sve_mul_zpzz_b, uint8_t, H1, DO_MUL)
+DO_ZPZZ(sve_smulh_zpzz_b, int8_t, H1, do_mulh_b)
+DO_ZPZZ(sve_umulh_zpzz_b, uint8_t, H1, do_mulh_b)
+
+DO_ZPZZ(sve_mul_zpzz_h, uint16_t, H1_2, DO_MUL)
+DO_ZPZZ(sve_smulh_zpzz_h, int16_t, H1_2, do_mulh_h)
+DO_ZPZZ(sve_umulh_zpzz_h, uint16_t, H1_2, do_mulh_h)
+
+DO_ZPZZ(sve_mul_zpzz_s, uint32_t, H1_4, DO_MUL)
+DO_ZPZZ(sve_smulh_zpzz_s, int32_t, H1_4, do_mulh_s)
+DO_ZPZZ(sve_umulh_zpzz_s, uint32_t, H1_4, do_mulh_s)
+DO_ZPZZ(sve_sdiv_zpzz_s, int32_t, H1_4, DO_DIV)
+DO_ZPZZ(sve_udiv_zpzz_s, uint32_t, H1_4, DO_DIV)
+
+DO_ZPZZ_D(sve_mul_zpzz_d, uint64_t, DO_MUL)
+DO_ZPZZ_D(sve_smulh_zpzz_d, uint64_t, do_smulh_d)
+DO_ZPZZ_D(sve_umulh_zpzz_d, uint64_t, do_umulh_d)
+DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
+DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
+
+#undef DO_MUL
+#undef DO_DIV
+
+#undef DO_ZPZZ
+#undef DO_ZPZZ_D
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 0e988c03aa..d8b34020bb 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -126,6 +126,81 @@ void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
     do_zzz_genfn(s, a, tcg_gen_gvec_andc);
 }
 
+static void do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZPZZ(NAME, name) \
+void trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
+{                                                                         \
+    static gen_helper_gvec_4 * const fns[4] = {                           \
+        gen_helper_sve_##name##_zpzz_b, gen_helper_sve_##name##_zpzz_h,   \
+        gen_helper_sve_##name##_zpzz_s, gen_helper_sve_##name##_zpzz_d,   \
+    };                                                                    \
+    do_zpzz_ool(s, a, fns[a->esz]);                                       \
+}
+
+DO_ZPZZ(AND, and)
+DO_ZPZZ(EOR, eor)
+DO_ZPZZ(ORR, orr)
+DO_ZPZZ(BIC, bic)
+
+DO_ZPZZ(ADD, add)
+DO_ZPZZ(SUB, sub)
+
+DO_ZPZZ(SMAX, smax)
+DO_ZPZZ(UMAX, umax)
+DO_ZPZZ(SMIN, smin)
+DO_ZPZZ(UMIN, umin)
+DO_ZPZZ(SABD, sabd)
+DO_ZPZZ(UABD, uabd)
+
+DO_ZPZZ(MUL, mul)
+DO_ZPZZ(SMULH, smulh)
+DO_ZPZZ(UMULH, umulh)
+
+void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    gen_helper_gvec_4 *fn;
+    switch (a->esz) {
+    default:
+        unallocated_encoding(s);
+        return;
+    case 2:
+        fn = gen_helper_sve_sdiv_zpzz_s;
+        break;
+    case 3:
+        fn = gen_helper_sve_sdiv_zpzz_d;
+        break;
+    }
+    do_zpzz_ool(s, a, fn);
+}
+
+void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    gen_helper_gvec_4 *fn;
+    switch (a->esz) {
+    default:
+        unallocated_encoding(s);
+        return;
+    case 2:
+        fn = gen_helper_sve_udiv_zpzz_s;
+        break;
+    case 3:
+        fn = gen_helper_sve_udiv_zpzz_d;
+        break;
+    }
+    do_zpzz_ool(s, a, fn);
+}
+
+#undef DO_ZPZZ
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index d1172296e0..3bb4faaf89 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -24,6 +24,10 @@
 
 %imm9_16_10		16:s6 10:3
 
+# Either a copy of rd (at bit 0), or a different source
+# as propagated via the MOVPRFX instruction.
+%reg_movprfx		0:5
+
 ###########################################################################
 # Named attribute sets.  These are used to make nice(er) names
 # when creating helpers common to those for the individual
@@ -44,6 +48,10 @@
 # Three predicate operand, with governing predicate, unused vector element size
 @pd_pg_pn_pm		........ .... rm:4 .. pg:4 . rn:4 . rd:4	&rprr_esz esz=0
 
+# Two register operand, with governing predicate, vector element size
+@rdn_pg_rm_esz		........ esz:2 ... ... ... pg:3 rm:5 rd:5	&rprr_esz rn=%reg_movprfx
+@rdm_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rprr_esz rm=%reg_movprfx
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9		........ ........ ...... rn:5 . rd:4		&rri imm=%imm9_16_10
 @rd_rn_i9		........ ........ ...... rn:5 rd:5		&rri imm=%imm9_16_10
@@ -51,6 +59,37 @@
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
+### SVE Integer Arithmetic - Binary Predicated Group
+
+# SVE bitwise logical vector operations (predicated)
+ORR_zpzz		00000100 .. 011 000 000 ... ..... .....		@rdn_pg_rm_esz
+EOR_zpzz		00000100 .. 011 001 000 ... ..... .....		@rdn_pg_rm_esz
+AND_zpzz		00000100 .. 011 010 000 ... ..... .....		@rdn_pg_rm_esz
+BIC_zpzz		00000100 .. 011 011 000 ... ..... .....		@rdn_pg_rm_esz
+
+# SVE integer add/subtract vectors (predicated)
+ADD_zpzz		00000100 .. 000 000 000 ... ..... .....		@rdn_pg_rm_esz
+SUB_zpzz		00000100 .. 000 001 000 ... ..... .....		@rdn_pg_rm_esz
+SUB_zpzz		00000100 .. 000 011 000 ... ..... .....		@rdm_pg_rn_esz # SUBR
+
+# SVE integer min/max/difference (predicated)
+SMAX_zpzz		00000100 .. 001 000 000 ... ..... .....		@rdn_pg_rm_esz
+UMAX_zpzz		00000100 .. 001 001 000 ... ..... .....		@rdn_pg_rm_esz
+SMIN_zpzz		00000100 .. 001 010 000 ... ..... .....		@rdn_pg_rm_esz
+UMIN_zpzz		00000100 .. 001 011 000 ... ..... .....		@rdn_pg_rm_esz
+SABD_zpzz		00000100 .. 001 100 000 ... ..... .....		@rdn_pg_rm_esz
+UABD_zpzz		00000100 .. 001 101 000 ... ..... .....		@rdn_pg_rm_esz
+
+# SVE integer multiply/divide (predicated)
+MUL_zpzz		00000100 .. 010 000 000 ... ..... .....		@rdn_pg_rm_esz
+SMULH_zpzz		00000100 .. 010 010 000 ... ..... .....		@rdn_pg_rm_esz
+UMULH_zpzz		00000100 .. 010 011 000 ... ..... .....		@rdn_pg_rm_esz
+# Note that divide requires size >= 2; below 2 is unallocated.
+SDIV_zpzz		00000100 .. 010 100 000 ... ..... .....		@rdn_pg_rm_esz
+UDIV_zpzz		00000100 .. 010 101 000 ... ..... .....		@rdn_pg_rm_esz
+SDIV_zpzz		00000100 .. 010 110 000 ... ..... .....		@rdm_pg_rn_esz # SDIVR
+UDIV_zpzz		00000100 .. 010 111 000 ... ..... .....		@rdm_pg_rn_esz # UDIVR
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
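
On the do_mulh_* helpers in the sve_helper.c hunk above: promoting both
operands to a type at least twice as wide lets one helper serve the
signed and unsigned variants, because the high half of the product of
correctly extended values is already correct.  A quick host-side check
(sketch; assumes arithmetic right shift of negative values, as the
helpers themselves do):

    #include <stdint.h>
    #include <assert.h>

    static uint8_t mulh_b(int32_t n, int32_t m)
    {
        return (n * m) >> 8;
    }

    static void mulh_examples(void)
    {
        assert(mulh_b((int8_t)0x80, (int8_t)0x80) == 0x40);   /* -128 * -128 */
        assert(mulh_b((uint8_t)0x80, (uint8_t)0x80) == 0x40); /*  128 *  128 */
        assert(mulh_b((int8_t)0xff, (int8_t)0x02) == 0xff);   /*   -1 *    2 */
        assert(mulh_b((uint8_t)0xff, (uint8_t)0x02) == 0x01); /*  255 *    2 */
    }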

* [Qemu-devel] [PATCH 08/23] target/arm: Handle SVE registers in write_fp_dreg
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (6 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 07/23] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 09/23] target/arm: Handle SVE registers when using clear_vec_high Richard Henderson
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

When storing to an AdvSIMD FP register, all of the high
bits of the SVE register are zeroed.  At the same time,
export the function for use in translate-sve.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
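For review: the observable effect on a byte view of a Z register of
vsz bytes is simply "store 8 bytes, zero the rest".  A reference
sketch (host-endian illustration only):

    #include <stdint.h>
    #include <string.h>

    static void write_fp_dreg_ref(uint8_t *zreg, uint64_t d, unsigned vsz)
    {
        memcpy(zreg, &d, 8);              /* the 64-bit FP result */
        memset(zreg + 8, 0, vsz - 8);     /* high bits are zeroed */
    }
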
 target/arm/translate-a64.h |  1 +
 target/arm/translate-a64.c | 32 ++++++++++++++++++--------------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 9014b5bf8b..07861fa9c6 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -35,6 +35,7 @@ TCGv_i64 cpu_reg(DisasContext *s, int reg);
 TCGv_i64 cpu_reg_sp(DisasContext *s, int reg);
 TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf);
 TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf);
+void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
 
 /* We should have at some point before trying to access an FP register
  * done the necessary access check, so assert that
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 8be1660661..b951045820 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -533,13 +533,28 @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
     return v;
 }
 
-static void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
+/* Clear the bits above the low 64 bits of a vector register.
+ * If SVE is not enabled, then there are only 128 bits in the vector.
+ */
+static void clear_vec_high(DisasContext *s, int rd)
 {
+    unsigned ofs = fp_reg_offset(s, rd, MO_64);
+    unsigned vsz = vec_full_reg_size(s);
     TCGv_i64 tcg_zero = tcg_const_i64(0);
 
-    tcg_gen_st_i64(v, cpu_env, fp_reg_offset(s, reg, MO_64));
-    tcg_gen_st_i64(tcg_zero, cpu_env, fp_reg_hi_offset(s, reg));
+    tcg_gen_st_i64(tcg_zero, cpu_env, ofs + 8);
     tcg_temp_free_i64(tcg_zero);
+    if (vsz > 16) {
+        tcg_gen_gvec_dup8i(ofs + 16, vsz - 16, vsz - 16, 0);
+    }
+}
+
+void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
+{
+    unsigned ofs = fp_reg_offset(s, reg, MO_64);
+
+    tcg_gen_st_i64(v, cpu_env, ofs);
+    clear_vec_high(s, reg);
 }
 
 static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
@@ -1015,17 +1030,6 @@ static void write_vec_element_i32(DisasContext *s, TCGv_i32 tcg_src,
     }
 }
 
-/* Clear the high 64 bits of a 128 bit vector (in general non-quad
- * vector ops all need to do this).
- */
-static void clear_vec_high(DisasContext *s, int rd)
-{
-    TCGv_i64 tcg_zero = tcg_const_i64(0);
-
-    write_vec_element(s, tcg_zero, rd, 1, MO_64);
-    tcg_temp_free_i64(tcg_zero);
-}
-
 /* Store from vector register to memory */
 static void do_vec_st(DisasContext *s, int srcidx, int element,
                       TCGv_i64 tcg_addr, int size)
-- 
2.14.3


* [Qemu-devel] [PATCH 09/23] target/arm: Handle SVE registers when using clear_vec_high
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (7 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 08/23] target/arm: Handle SVE registers in write_fp_dreg Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 10/23] target/arm: Implement SVE Integer Reduction Group Richard Henderson
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

When storing to an AdvSIMD FP register, all of the high
bits of the SVE register are zeroed.  Therefore, call
clear_vec_high more often, passing is_q as a parameter,
so that quad operations also clear the SVE high bits.
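
A byte-level sketch of the new contract (a standalone illustration,
not part of the patch; names are invented for the sketch):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch: keep the low N = (is_q ? 128 : 64) bits of the register
       and zero the rest, out to the full SVE register width
       (assumes reg_bytes >= 16).  */
    static void sketch_clear_vec_high(uint8_t *zreg, unsigned reg_bytes,
                                      bool is_q)
    {
        unsigned keep = is_q ? 16 : 8;
        memset(zreg + keep, 0, reg_bytes - keep);
    }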

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 157 +++++++++++++++------------------------------
 1 file changed, 51 insertions(+), 106 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b951045820..9e15a4b1ae 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -533,17 +533,19 @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
     return v;
 }
 
-/* Clear the bits above the low 64 bits of the vector register.
+/* Clear the bits above the low N bits of the vector register, for N = (is_q ? 128 : 64).
  * If SVE is not enabled, then there are only 128 bits in the vector.
  */
-static void clear_vec_high(DisasContext *s, int rd)
+static void clear_vec_high(DisasContext *s, bool is_q, int rd)
 {
     unsigned ofs = fp_reg_offset(s, rd, MO_64);
     unsigned vsz = vec_full_reg_size(s);
-    TCGv_i64 tcg_zero = tcg_const_i64(0);
 
-    tcg_gen_st_i64(tcg_zero, cpu_env, ofs + 8);
-    tcg_temp_free_i64(tcg_zero);
+    if (is_q) {
+        TCGv_i64 tcg_zero = tcg_const_i64(0);
+        tcg_gen_st_i64(tcg_zero, cpu_env, ofs + 8);
+        tcg_temp_free_i64(tcg_zero);
+    }
     if (vsz > 16) {
         tcg_gen_gvec_dup8i(ofs + 16, vsz - 16, vsz - 16, 0);
     }
@@ -554,7 +556,7 @@ void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
     unsigned ofs = fp_reg_offset(s, reg, MO_64);
 
     tcg_gen_st_i64(v, cpu_env, ofs);
-    clear_vec_high(s, reg);
+    clear_vec_high(s, false, reg);
 }
 
 static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
@@ -915,6 +917,8 @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size)
 
     tcg_temp_free_i64(tmplo);
     tcg_temp_free_i64(tmphi);
+
+    clear_vec_high(s, true, destidx);
 }
 
 /*
@@ -2670,12 +2674,13 @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
                     /* For non-quad operations, setting a slice of the low
                      * 64 bits of the register clears the high 64 bits (in
                      * the ARM ARM pseudocode this is implicit in the fact
-                     * that 'rval' is a 64 bit wide variable). We optimize
-                     * by noticing that we only need to do this the first
-                     * time we touch a register.
+                     * that 'rval' is a 64 bit wide variable).
+                     * For quad operations, we might still need to zero the
+                     * high bits of SVE.  We optimize by noticing that we only
+                     * need to do this the first time we touch a register.
                      */
-                    if (!is_q && e == 0 && (r == 0 || xs == selem - 1)) {
-                        clear_vec_high(s, tt);
+                    if (e == 0 && (r == 0 || xs == selem - 1)) {
+                        clear_vec_high(s, is_q, tt);
                     }
                 }
                 tcg_gen_addi_i64(tcg_addr, tcg_addr, ebytes);
@@ -2818,10 +2823,9 @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
             write_vec_element(s, tcg_tmp, rt, 0, MO_64);
             if (is_q) {
                 write_vec_element(s, tcg_tmp, rt, 1, MO_64);
-            } else {
-                clear_vec_high(s, rt);
             }
             tcg_temp_free_i64(tcg_tmp);
+            clear_vec_high(s, is_q, rt);
         } else {
             /* Load/store one element per register */
             if (is_load) {
@@ -6659,7 +6663,6 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
     }
 
     if (!is_q) {
-        clear_vec_high(s, rd);
         write_vec_element(s, tcg_final, rd, 0, MO_64);
     } else {
         write_vec_element(s, tcg_final, rd, 1, MO_64);
@@ -6672,7 +6675,8 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
     tcg_temp_free_i64(tcg_rd);
     tcg_temp_free_i32(tcg_rd_narrowed);
     tcg_temp_free_i64(tcg_final);
-    return;
+
+    clear_vec_high(s, is_q, rd);
 }
 
 /* SQSHLU, UQSHL, SQSHL: saturating left shifts */
@@ -6736,10 +6740,7 @@ static void handle_simd_qshl(DisasContext *s, bool scalar, bool is_q,
             tcg_temp_free_i64(tcg_op);
         }
         tcg_temp_free_i64(tcg_shift);
-
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
+        clear_vec_high(s, is_q, rd);
     } else {
         TCGv_i32 tcg_shift = tcg_const_i32(shift);
         static NeonGenTwoOpEnvFn * const fns[2][2][3] = {
@@ -6788,8 +6789,8 @@ static void handle_simd_qshl(DisasContext *s, bool scalar, bool is_q,
         }
         tcg_temp_free_i32(tcg_shift);
 
-        if (!is_q && !scalar) {
-            clear_vec_high(s, rd);
+        if (!scalar) {
+            clear_vec_high(s, is_q, rd);
         }
     }
 }
@@ -6831,10 +6832,8 @@ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn,
                 write_vec_element(s, tcg_double, rd, pass, MO_64);
             }
         }
-
         tcg_temp_free_i64(tcg_int64);
         tcg_temp_free_i64(tcg_double);
-
     } else {
         TCGv_i32 tcg_int32 = tcg_temp_new_i32();
         TCGv_i32 tcg_float = tcg_temp_new_i32();
@@ -6887,20 +6886,17 @@ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn,
                 write_vec_element_i32(s, tcg_float, rd, pass, size);
             }
         }
-
         tcg_temp_free_i32(tcg_int32);
         tcg_temp_free_i32(tcg_float);
-
-        if ((size == MO_32 && elements == 2) ||
-            (size == MO_16 && elements == 4)) {
-            clear_vec_high(s, rd);
-        }
     }
 
     tcg_temp_free_ptr(tcg_fpst);
     if (fracbits || size == MO_64) {
         tcg_temp_free_i32(tcg_shift);
     }
+    if (elements > 1) {
+        clear_vec_high(s, (elements << size) > 8, rd);
+    }
 }
 
 /* UCVTF/SCVTF - Integer to FP conversion */
@@ -6988,9 +6984,7 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
             write_vec_element(s, tcg_op, rd, pass, MO_64);
             tcg_temp_free_i64(tcg_op);
         }
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
+        clear_vec_high(s, is_q, rd);
     } else {
         int maxpass = is_scalar ? 1 : is_q ? 4 : 2;
         for (pass = 0; pass < maxpass; pass++) {
@@ -7009,8 +7003,8 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
             }
             tcg_temp_free_i32(tcg_op);
         }
-        if (!is_q && !is_scalar) {
-            clear_vec_high(s, rd);
+        if (!is_scalar) {
+            clear_vec_high(s, is_q, rd);
         }
     }
 
@@ -7491,13 +7485,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
             tcg_temp_free_i32(tcg_op2);
         }
     }
-
     tcg_temp_free_ptr(fpst);
 
-    if ((elements << size) < 4) {
-        /* scalar, or non-quad vector op */
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, elements * (size ? 8 : 4) > 8, rd);
 }
 
 /* AdvSIMD scalar three same
@@ -8005,13 +7995,10 @@ static void handle_2misc_fcmp_zero(DisasContext *s, int opcode,
             }
             write_vec_element(s, tcg_res, rd, pass, MO_64);
         }
-        if (is_scalar) {
-            clear_vec_high(s, rd);
-        }
-
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_zero);
         tcg_temp_free_i64(tcg_op);
+        clear_vec_high(s, !is_scalar, rd);
     } else {
         TCGv_i32 tcg_op = tcg_temp_new_i32();
         TCGv_i32 tcg_zero = tcg_const_i32(0);
@@ -8063,8 +8050,8 @@ static void handle_2misc_fcmp_zero(DisasContext *s, int opcode,
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_zero);
         tcg_temp_free_i32(tcg_op);
-        if (!is_q && !is_scalar) {
-            clear_vec_high(s, rd);
+        if (!is_scalar) {
+            clear_vec_high(s, is_q, rd);
         }
     }
 
@@ -8100,12 +8087,9 @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode,
             }
             write_vec_element(s, tcg_res, rd, pass, MO_64);
         }
-        if (is_scalar) {
-            clear_vec_high(s, rd);
-        }
-
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_op);
+        clear_vec_high(s, !is_scalar, rd);
     } else {
         TCGv_i32 tcg_op = tcg_temp_new_i32();
         TCGv_i32 tcg_res = tcg_temp_new_i32();
@@ -8145,8 +8129,8 @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode,
         }
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_op);
-        if (!is_q && !is_scalar) {
-            clear_vec_high(s, rd);
+        if (!is_scalar) {
+            clear_vec_high(s, is_q, rd);
         }
     }
     tcg_temp_free_ptr(fpst);
@@ -8259,9 +8243,7 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
         write_vec_element_i32(s, tcg_res[pass], rd, destelt + pass, MO_32);
         tcg_temp_free_i32(tcg_res[pass]);
     }
-    if (!is_q) {
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, is_q, rd);
 }
 
 /* Remaining saturating accumulating ops */
@@ -8286,12 +8268,9 @@ static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
             }
             write_vec_element(s, tcg_rd, rd, pass, MO_64);
         }
-        if (is_scalar) {
-            clear_vec_high(s, rd);
-        }
-
         tcg_temp_free_i64(tcg_rd);
         tcg_temp_free_i64(tcg_rn);
+        clear_vec_high(s, !is_scalar, rd);
     } else {
         TCGv_i32 tcg_rn = tcg_temp_new_i32();
         TCGv_i32 tcg_rd = tcg_temp_new_i32();
@@ -8349,13 +8328,9 @@ static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
             }
             write_vec_element_i32(s, tcg_rd, rd, pass, MO_32);
         }
-
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
-
         tcg_temp_free_i32(tcg_rd);
         tcg_temp_free_i32(tcg_rn);
+        clear_vec_high(s, is_q, rd);
     }
 }
 
@@ -8855,9 +8830,7 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
     tcg_temp_free_i64(tcg_round);
 
  done:
-    if (!is_q) {
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, is_q, rd);
 }
 
 static void gen_shl8_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift)
@@ -9045,19 +9018,18 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
     }
 
     if (!is_q) {
-        clear_vec_high(s, rd);
         write_vec_element(s, tcg_final, rd, 0, MO_64);
     } else {
         write_vec_element(s, tcg_final, rd, 1, MO_64);
     }
-
     if (round) {
         tcg_temp_free_i64(tcg_round);
     }
     tcg_temp_free_i64(tcg_rn);
     tcg_temp_free_i64(tcg_rd);
     tcg_temp_free_i64(tcg_final);
-    return;
+
+    clear_vec_high(s, is_q, rd);
 }
 
 
@@ -9451,9 +9423,7 @@ static void handle_3rd_narrowing(DisasContext *s, int is_q, int is_u, int size,
         write_vec_element_i32(s, tcg_res[pass], rd, pass + part, MO_32);
         tcg_temp_free_i32(tcg_res[pass]);
     }
-    if (!is_q) {
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, is_q, rd);
 }
 
 static void handle_pmull_64(DisasContext *s, int is_q, int rd, int rn, int rm)
@@ -9877,9 +9847,7 @@ static void handle_simd_3same_pair(DisasContext *s, int is_q, int u, int opcode,
             write_vec_element_i32(s, tcg_res[pass], rd, pass, MO_32);
             tcg_temp_free_i32(tcg_res[pass]);
         }
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
+        clear_vec_high(s, is_q, rd);
     }
 
     if (fpst) {
@@ -10372,10 +10340,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             tcg_temp_free_i32(tcg_op2);
         }
     }
-
-    if (!is_q) {
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, is_q, rd);
 }
 
 /* AdvSIMD three same
@@ -10611,10 +10576,7 @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
 
     tcg_temp_free_ptr(fpst);
 
-    if (!is_q) {
-        /* non-quad vector op */
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, is_q, rd);
 }
 
 /* AdvSIMD three same extra
@@ -10846,9 +10808,7 @@ static void handle_rev(DisasContext *s, int opcode, bool u,
             write_vec_element(s, tcg_tmp, rd, i, grp_size);
             tcg_temp_free_i64(tcg_tmp);
         }
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
+        clear_vec_high(s, is_q, rd);
     } else {
         int revmask = (1 << grp_size) - 1;
         int esize = 8 << size;
@@ -11499,9 +11459,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             tcg_temp_free_i32(tcg_op);
         }
     }
-    if (!is_q) {
-        clear_vec_high(s, rd);
-    }
+    clear_vec_high(s, is_q, rd);
 
     if (need_rmode) {
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, cpu_env);
@@ -11778,9 +11736,7 @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
             tcg_temp_free_i32(tcg_op);
         }
 
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
+        clear_vec_high(s, is_q, rd);
     }
 
     if (need_rmode) {
@@ -12029,12 +11985,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             tcg_temp_free_i64(tcg_op);
             tcg_temp_free_i64(tcg_res);
         }
-
-        if (is_scalar) {
-            clear_vec_high(s, rd);
-        }
-
         tcg_temp_free_i64(tcg_idx);
+        clear_vec_high(s, !is_scalar, rd);
     } else if (!is_long) {
         /* 32 bit floating point, or 16 or 32 bit integer.
          * For the 16 bit scalar case we use the usual Neon helpers and
@@ -12198,12 +12150,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             tcg_temp_free_i32(tcg_op);
             tcg_temp_free_i32(tcg_res);
         }
-
         tcg_temp_free_i32(tcg_idx);
-
-        if (!is_q) {
-            clear_vec_high(s, rd);
-        }
+        clear_vec_high(s, is_q, rd);
     } else {
         /* long ops: 16x16->32 or 32x32->64 */
         TCGv_i64 tcg_res[2];
@@ -12279,10 +12227,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                 tcg_temp_free_i64(tcg_passres);
             }
             tcg_temp_free_i64(tcg_idx);
-
-            if (is_scalar) {
-                clear_vec_high(s, rd);
-            }
+            clear_vec_high(s, !is_scalar, rd);
         } else {
             TCGv_i32 tcg_idx = tcg_temp_new_i32();
 
-- 
2.14.3


* [Qemu-devel] [PATCH 10/23] target/arm: Implement SVE Integer Reduction Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (8 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 09/23] target/arm: Handle SVE registers when using clear_vec_high Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 11/23] target/arm: Implement SVE bitwise shift by immediate (predicated) Richard Henderson
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

This covers the whole group except MOVPRFX, which isn't a
reduction; presumably it is placed within the group only
because of its encoding.
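
One subtlety in the DO_VPZ expander below is the pair of result
types, TYPERED and TYPERET.  As a standalone illustration (not part
of the patch) of why the reduction result must be narrowed through
the unsigned TYPERET before widening to the uint64_t ABI return type:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t res = -1;  /* e.g. SMAXV over all-ones 32-bit elements */
        uint64_t wrong = (uint64_t)res;            /* sign-extended */
        uint64_t right = (uint64_t)(uint32_t)res;  /* zero-extended */
        printf("wrong=%016" PRIx64 " right=%016" PRIx64 "\n",
               wrong, right);
        return 0;
    }

This prints wrong=ffffffffffffffff right=00000000ffffffff; only the
latter is a 32-bit result correctly placed in the 64-bit return slot.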

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  44 +++++++++++++++++
 target/arm/sve_helper.c    | 116 +++++++++++++++++++++++++++++++++++++++------
 target/arm/translate-sve.c |  64 +++++++++++++++++++++++++
 target/arm/sve.def         |  22 +++++++++
 4 files changed, 231 insertions(+), 15 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 964b15b104..937598d6f8 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -162,6 +162,50 @@ DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_eorv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_andv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_saddv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_saddv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_saddv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uaddv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_smaxv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_umaxv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_sminv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uminv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b617ea2c04..fca17440e7 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -196,11 +196,6 @@ DO_ZPZZ_D(sve_orr_zpzz_d, uint64_t, DO_ORR)
 DO_ZPZZ_D(sve_eor_zpzz_d, uint64_t, DO_EOR)
 DO_ZPZZ_D(sve_bic_zpzz_d, uint64_t, DO_BIC)
 
-#undef DO_AND
-#undef DO_ORR
-#undef DO_EOR
-#undef DO_BIC
-
 #define DO_ADD(N, M)  (N + M)
 #define DO_SUB(N, M)  (N - M)
 
@@ -216,9 +211,6 @@ DO_ZPZZ(sve_sub_zpzz_s, uint32_t, H1_4, DO_SUB)
 DO_ZPZZ_D(sve_add_zpzz_d, uint64_t, DO_ADD)
 DO_ZPZZ_D(sve_sub_zpzz_d, uint64_t, DO_SUB)
 
-#undef DO_ADD
-#undef DO_SUB
-
 #define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
 #define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
 #define DO_ABD(N, M)  ((N) >= (M) ? (N) - (M) : (M) - (N))
@@ -251,10 +243,6 @@ DO_ZPZZ_D(sve_umin_zpzz_d, uint64_t, DO_MIN)
 DO_ZPZZ_D(sve_sabd_zpzz_d, int64_t,  DO_ABD)
 DO_ZPZZ_D(sve_uabd_zpzz_d, uint64_t, DO_ABD)
 
-#undef DO_MAX
-#undef DO_MIN
-#undef DO_ABD
-
 #define DO_MUL(N, M)  (N * M)
 #define DO_DIV(N, M)  (M ? N / M : 0)
 
@@ -309,12 +297,110 @@ DO_ZPZZ_D(sve_umulh_zpzz_d, uint64_t, do_umulh_d)
 DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
 DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 
-#undef DO_MUL
-#undef DO_DIV
-
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
+/* Two-operand reduction expander, controlled by a predicate.
+ * The difference between TYPERED and TYPERET has to do with
+ * sign-extension.  E.g. for SMAX, TYPERED must be signed,
+ * but TYPERET must be unsigned so that e.g. a 32-bit value
+ * is not sign-extended to the ABI uint64_t return type.
+ */
+/* ??? If we were to vectorize this by hand the reduction ordering
+ * would change.  For integer operands, this is perfectly fine.
+ */
+#define DO_VPZ(NAME, TYPEELT, TYPERED, TYPERET, H, INIT, OP) \
+uint64_t HELPER(NAME)(void *vn, void *vg, uint32_t desc)   \
+{                                                          \
+    intptr_t iv, ib, opr_sz = simd_oprsz(desc);            \
+    TYPERED ret = INIT;                                    \
+    for (iv = ib = 0; iv < opr_sz; iv += 16, ib += 2) {    \
+        uint16_t pg = *(uint16_t *)(vg + H2(ib));          \
+        intptr_t i = 0;                                    \
+        do {                                               \
+            TYPEELT nn = *(TYPEELT *)(vn + iv + H(i));     \
+            ret = OP(ret, nn);                             \
+            i += sizeof(TYPEELT), pg >>= sizeof(TYPEELT);  \
+        } while (pg);                                      \
+    }                                                      \
+    return (TYPERET)ret;                                   \
+}
+
+#define DO_VPZ_D(NAME, TYPEE, TYPER, INIT, OP)             \
+uint64_t HELPER(NAME)(void *vn, void *vg, uint32_t desc)   \
+{                                                          \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;             \
+    TYPEE *n = vn;                                         \
+    uint8_t *pg = vg;                                      \
+    TYPER ret = INIT;                                      \
+    for (i = 0; i < opr_sz; i += 1) {                      \
+        if (pg[H1(i)] & 1) {                               \
+            TYPEE nn = n[i];                               \
+            ret = OP(ret, nn);                             \
+        }                                                  \
+    }                                                      \
+    return ret;                                            \
+}
+
+DO_VPZ(sve_orv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_ORR)
+DO_VPZ(sve_orv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_ORR)
+DO_VPZ(sve_orv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_ORR)
+DO_VPZ_D(sve_orv_d, uint64_t, uint64_t, 0, DO_ORR)
+
+DO_VPZ(sve_eorv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_EOR)
+DO_VPZ(sve_eorv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_EOR)
+DO_VPZ(sve_eorv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_EOR)
+DO_VPZ_D(sve_eorv_d, uint64_t, uint64_t, 0, DO_EOR)
+
+DO_VPZ(sve_andv_b, uint8_t, uint8_t, uint8_t, H1, -1, DO_AND)
+DO_VPZ(sve_andv_h, uint16_t, uint16_t, uint16_t, H1_2, -1, DO_AND)
+DO_VPZ(sve_andv_s, uint32_t, uint32_t, uint32_t, H1_4, -1, DO_AND)
+DO_VPZ_D(sve_andv_d, uint64_t, uint64_t, -1, DO_AND)
+
+DO_VPZ(sve_saddv_b, int8_t, uint64_t, uint64_t, H1, 0, DO_ADD)
+DO_VPZ(sve_saddv_h, int16_t, uint64_t, uint64_t, H1_2, 0, DO_ADD)
+DO_VPZ(sve_saddv_s, int32_t, uint64_t, uint64_t, H1_4, 0, DO_ADD)
+
+DO_VPZ(sve_uaddv_b, uint8_t, uint64_t, uint64_t, H1, 0, DO_ADD)
+DO_VPZ(sve_uaddv_h, uint16_t, uint64_t, uint64_t, H1_2, 0, DO_ADD)
+DO_VPZ(sve_uaddv_s, uint32_t, uint64_t, uint64_t, H1_4, 0, DO_ADD)
+DO_VPZ_D(sve_uaddv_d, uint64_t, uint64_t, 0, DO_ADD)
+
+DO_VPZ(sve_smaxv_b, int8_t, int8_t, uint8_t, H1, INT8_MIN, DO_MAX)
+DO_VPZ(sve_smaxv_h, int16_t, int16_t, uint16_t, H1_2, INT16_MIN, DO_MAX)
+DO_VPZ(sve_smaxv_s, int32_t, int32_t, uint32_t, H1_4, INT32_MIN, DO_MAX)
+DO_VPZ_D(sve_smaxv_d, int64_t, int64_t, INT64_MIN, DO_MAX)
+
+DO_VPZ(sve_umaxv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_MAX)
+DO_VPZ(sve_umaxv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_MAX)
+DO_VPZ(sve_umaxv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_MAX)
+DO_VPZ_D(sve_umaxv_d, uint64_t, uint64_t, 0, DO_MAX)
+
+DO_VPZ(sve_sminv_b, int8_t, int8_t, uint8_t, H1, INT8_MAX, DO_MIN)
+DO_VPZ(sve_sminv_h, int16_t, int16_t, uint16_t, H1_2, INT16_MAX, DO_MIN)
+DO_VPZ(sve_sminv_s, int32_t, int32_t, uint32_t, H1_4, INT32_MAX, DO_MIN)
+DO_VPZ_D(sve_sminv_d, int64_t, int64_t, INT64_MAX, DO_MIN)
+
+DO_VPZ(sve_uminv_b, uint8_t, uint8_t, uint8_t, H1, -1, DO_MIN)
+DO_VPZ(sve_uminv_h, uint16_t, uint16_t, uint16_t, H1_2, -1, DO_MIN)
+DO_VPZ(sve_uminv_s, uint32_t, uint32_t, uint32_t, H1_4, -1, DO_MIN)
+DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
+
+#undef DO_VPZ
+#undef DO_VPZ_D
+
+#undef DO_AND
+#undef DO_ORR
+#undef DO_EOR
+#undef DO_BIC
+#undef DO_ADD
+#undef DO_SUB
+#undef DO_MAX
+#undef DO_MIN
+#undef DO_ABD
+#undef DO_MUL
+#undef DO_DIV
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d8b34020bb..4abc66ba5f 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -22,6 +22,7 @@
 #include "exec/exec-all.h"
 #include "tcg-op.h"
 #include "tcg-op-gvec.h"
+#include "tcg-gvec-desc.h"
 #include "qemu/log.h"
 #include "arm_ldst.h"
 #include "translate.h"
@@ -201,6 +202,69 @@ void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 
 #undef DO_ZPZZ
 
+typedef void gen_helper_gvec_reduc(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_i32);
+static void do_vpz_ool(DisasContext *s, arg_rpr_esz *a,
+                       gen_helper_gvec_reduc *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    TCGv_i64 temp = tcg_temp_new_i64();
+    TCGv_ptr t_zn = tcg_temp_new_ptr();
+    TCGv_ptr t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg));
+    fn(temp, t_zn, t_pg, desc);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+
+    write_fp_dreg(s, a->rd, temp);
+    tcg_temp_free_i64(temp);
+}
+
+#define DO_VPZ(NAME, name) \
+void trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                        \
+    static gen_helper_gvec_reduc * const fns[4] = {                      \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,            \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
+    };                                                                   \
+    do_vpz_ool(s, a, fns[a->esz]);                                       \
+}
+
+DO_VPZ(ORV, orv)
+DO_VPZ(ANDV, andv)
+DO_VPZ(EORV, eorv)
+
+DO_VPZ(UADDV, uaddv)
+DO_VPZ(SMAXV, smaxv)
+DO_VPZ(UMAXV, umaxv)
+DO_VPZ(SMINV, sminv)
+DO_VPZ(UMINV, uminv)
+
+void trans_SADDV(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    gen_helper_gvec_reduc *fn;
+    switch (a->esz) {
+    case 0:
+        fn = gen_helper_sve_saddv_b;
+        break;
+    case 1:
+        fn = gen_helper_sve_saddv_h;
+        break;
+    case 2:
+        fn = gen_helper_sve_saddv_s;
+        break;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+    do_vpz_ool(s, a, fn);
+}
+
+#undef DO_VPZ
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 3bb4faaf89..c26b1377e8 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -35,6 +35,7 @@
 
 &rri			rd rn imm
 &rrr_esz		rd rn rm esz
+&rpr_esz		rd pg rn esz
 &rprr_esz		rd pg rn rm esz
 &pred_set		rd pat esz i s
 
@@ -52,6 +53,9 @@
 @rdn_pg_rm_esz		........ esz:2 ... ... ... pg:3 rm:5 rd:5	&rprr_esz rn=%reg_movprfx
 @rdm_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rprr_esz rm=%reg_movprfx
 
+# One register operand, with governing predicate, vector element size
+@rd_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9		........ ........ ...... rn:5 . rd:4		&rri imm=%imm9_16_10
 @rd_rn_i9		........ ........ ...... rn:5 rd:5		&rri imm=%imm9_16_10
@@ -90,6 +94,24 @@ UDIV_zpzz		00000100 .. 010 101 000 ... ..... .....		@rdn_pg_rm_esz
 SDIV_zpzz		00000100 .. 010 110 000 ... ..... .....		@rdm_pg_rn_esz # SDIVR
 UDIV_zpzz		00000100 .. 010 111 000 ... ..... .....		@rdm_pg_rn_esz # UDIVR
 
+### SVE Integer Reduction Group
+
+# SVE bitwise logical reduction (predicated)
+ORV			00000100 .. 011 000 001 ... ..... .....		@rd_pg_rn_esz
+EORV			00000100 .. 011 001 001 ... ..... .....		@rd_pg_rn_esz
+ANDV			00000100 .. 011 010 001 ... ..... .....		@rd_pg_rn_esz
+
+# SVE integer add reduction (predicated)
+UADDV			00000100 .. 000 001 001 ... ..... .....		@rd_pg_rn_esz
+# Note that SADDV requires size != 3; the size 3 encoding is unallocated.
+SADDV			00000100 .. 000 000 001 ... ..... .....		@rd_pg_rn_esz
+
+# SVE integer min/max reduction (predicated)
+SMAXV			00000100 .. 001 000 001 ... ..... .....		@rd_pg_rn_esz
+UMAXV			00000100 .. 001 001 001 ... ..... .....		@rd_pg_rn_esz
+SMINV			00000100 .. 001 010 001 ... ..... .....		@rd_pg_rn_esz
+UMINV			00000100 .. 001 011 001 ... ..... .....		@rd_pg_rn_esz
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3


* [Qemu-devel] [PATCH 11/23] target/arm: Implement SVE bitwise shift by immediate (predicated)
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (9 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 10/23] target/arm: Implement SVE Integer Reduction Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 12/23] target/arm: Implement SVE bitwise shift by vector (predicated) Richard Henderson
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  25 +++++
 target/arm/sve_helper.c    | 265 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 124 +++++++++++++++++++++
 target/arm/sve.def         |  21 ++++
 4 files changed, 435 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 937598d6f8..2b265e9892 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -206,6 +206,31 @@ DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_clr_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_asrd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asrd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asrd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asrd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fca17440e7..9146e35e5b 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -42,6 +42,201 @@
 #endif
 
 
+/* Expand active predicate bits to bytes, for byte elements.
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      for (j = 0; j < 8; j++) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfful << (j << 3);
+ *          }
+ *      }
+ *      printf("0x%016lx,\n", m);
+ *  }
+ */
+static inline uint64_t expand_pred_b(uint8_t byte)
+{
+    static const uint64_t word[256] = {
+        0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
+        0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
+        0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
+        0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
+        0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
+        0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
+        0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
+        0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
+        0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
+        0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
+        0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
+        0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
+        0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
+        0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
+        0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
+        0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
+        0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
+        0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
+        0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
+        0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
+        0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
+        0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
+        0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
+        0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
+        0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
+        0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
+        0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
+        0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
+        0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
+        0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
+        0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
+        0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
+        0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
+        0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
+        0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
+        0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
+        0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
+        0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
+        0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
+        0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
+        0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
+        0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
+        0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
+        0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
+        0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
+        0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
+        0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
+        0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
+        0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
+        0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
+        0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
+        0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
+        0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
+        0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
+        0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
+        0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
+        0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
+        0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
+        0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
+        0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
+        0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
+        0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
+        0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
+        0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
+        0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
+        0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
+        0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
+        0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
+        0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
+        0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
+        0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
+        0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
+        0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
+        0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
+        0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
+        0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
+        0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
+        0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
+        0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
+        0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
+        0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
+        0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
+        0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
+        0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
+        0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
+        0xffffffffffffffff,
+    };
+    return word[byte];
+}
+
+/* Similarly for half-word elements.
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      if (i & 0xaa) {
+ *          continue;
+ *      }
+ *      for (j = 0; j < 8; j += 2) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfffful << (j << 3);
+ *          }
+ *      }
+ *      printf("[0x%x] = 0x%016lx,\n", i, m);
+ *  }
+ */
+static inline uint64_t expand_pred_h(uint8_t byte)
+{
+    static const uint64_t word[] = {
+        [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
+        [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
+        [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
+        [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
+        [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
+        [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
+        [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
+        [0x55] = 0xffffffffffffffff,
+    };
+    return word[byte & 0x55];
+}
+
+/* Similarly for single word elements.  */
+static inline uint64_t expand_pred_s(uint8_t byte)
+{
+    static const uint64_t word[] = {
+        [0x01] = 0x00000000ffffffffull,
+        [0x10] = 0xffffffff00000000ull,
+        [0x11] = 0xffffffffffffffffull,
+    };
+    return word[byte & 0x11];
+}
+
+/* Store zero into every active element of Zd.  We will use this for two
+ * and three-operand predicated instructions for which logic dictates a
+ * zero result.  In particular, logical shift by element size, which is
+ * otherwise undefined on the host.
+ *
+ * For element sizes smaller than uint64_t, we use tables to expand
+ * the N bits of the controlling predicate to a byte mask, and clear
+ * those bytes.
+ */
+void HELPER(sve_clr_b)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= ~expand_pred_b(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clr_h)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= ~expand_pred_h(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clr_s)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= ~expand_pred_s(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clr_d)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        if (pg[H1(i)] & 1) {
+            d[i] = 0;
+        }
+    }
+}
+
 /* Given the first and last word of the result, the first and last word
    of the governing mask, and the sum of the result, return a mask that
    can be used to quickly set NZCV.  */
@@ -401,6 +596,76 @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
 #undef DO_MUL
 #undef DO_DIV
 
+/* Three-operand expander, immediate operand, controlled by a predicate.
+ */
+#define DO_ZPZI(NAME, TYPE, H, OP)                                      \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)          \
+{                                                                       \
+    intptr_t iv, ib, opr_sz = simd_oprsz(desc);                        \
+    TYPE imm = simd_data(desc);                                         \
+    for (iv = ib = 0; iv < opr_sz; iv += 16, ib += 2) {                 \
+        uint16_t pg = *(uint16_t *)(vg + H2(ib));                       \
+        intptr_t i = 0;                                                 \
+        do {                                                            \
+            if (pg & 1) {                                               \
+                TYPE nn = *(TYPE *)(vn + iv + H(i));                    \
+                *(TYPE *)(vd + iv + H(i)) = OP(nn, imm);                \
+            }                                                           \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                     \
+        } while (pg);                                                   \
+    }                                                                   \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZI_D(NAME, TYPE, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn;                                      \
+    TYPE imm = simd_data(desc);                                 \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            TYPE nn = n[i];                                     \
+            d[i] = OP(nn, imm);                                 \
+        }                                                       \
+    }                                                           \
+}
+
+#define DO_SHR(N, M)  (N >> M)
+#define DO_SHL(N, M)  (N << M)
+
+/* Arithmetic shift right for division.  This rounds negative numbers
+   toward zero as per signed division.  Therefore before shifting,
+   when N is negative, add 2**M-1.  */
+#define DO_ASRD(N, M) ((N + (N < 0 ? ((__typeof(N))1 << M) - 1 : 0)) >> M)
+
+DO_ZPZI(sve_asr_zpzi_b, int8_t, H1, DO_SHR)
+DO_ZPZI(sve_asr_zpzi_h, int16_t, H1_2, DO_SHR)
+DO_ZPZI(sve_asr_zpzi_s, int32_t, H1_4, DO_SHR)
+DO_ZPZI_D(sve_asr_zpzi_d, int64_t, DO_SHR)
+
+DO_ZPZI(sve_lsr_zpzi_b, uint8_t, H1, DO_SHR)
+DO_ZPZI(sve_lsr_zpzi_h, uint16_t, H1_2, DO_SHR)
+DO_ZPZI(sve_lsr_zpzi_s, uint32_t, H1_4, DO_SHR)
+DO_ZPZI_D(sve_lsr_zpzi_d, uint64_t, DO_SHR)
+
+DO_ZPZI(sve_lsl_zpzi_b, uint8_t, H1, DO_SHL)
+DO_ZPZI(sve_lsl_zpzi_h, uint16_t, H1_2, DO_SHL)
+DO_ZPZI(sve_lsl_zpzi_s, uint32_t, H1_4, DO_SHL)
+DO_ZPZI_D(sve_lsl_zpzi_d, uint64_t, DO_SHL)
+
+DO_ZPZI(sve_asrd_b, int8_t, H1, DO_ASRD)
+DO_ZPZI(sve_asrd_h, int16_t, H1_2, DO_ASRD)
+DO_ZPZI(sve_asrd_s, int32_t, H1_4, DO_ASRD)
+DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
+
+#undef DO_ZPZI
+#undef DO_ZPZI_D
+#undef DO_SHR
+#undef DO_SHL
+#undef DO_ASRD
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
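
Aside: the DO_ASRD rounding rule above can be checked in isolation.
A standalone illustration (not part of the patch; it assumes the
arithmetic behaviour of >> on signed values that QEMU's supported
hosts provide):

    #include <stdio.h>

    int main(void)
    {
        int n = -7, m = 1;
        int plain = n >> m;                                /* -4: rounds down */
        int asrd  = (n + (n < 0 ? (1 << m) - 1 : 0)) >> m; /* -3: toward zero */
        printf("plain=%d asrd=%d n/2=%d\n", plain, asrd, n / 2);
        return 0;
    }
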
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4abc66ba5f..08388c0a07 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -37,6 +37,30 @@ typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+/*
+ * Helpers for extracting complex instruction fields.
+ */
+
+/* See e.g. ASR (immediate, predicated).
+ * Returns -1 for unallocated encoding; diagnose later.
+ */
+static int tszimm_esz(int x)
+{
+    x >>= 3;  /* discard imm3 */
+    return 31 - clz32(x);
+}
+
+static int tszimm_shr(int x)
+{
+    return (16 << tszimm_esz(x)) - x;
+}
+
+/* See e.g. LSL (immediate, predicated).  */
+static int tszimm_shl(int x)
+{
+    return x - (8 << tszimm_esz(x));
+}
+
 /*
  * Include the generated decoder.
  */
@@ -265,6 +289,106 @@ void trans_SADDV(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 
 #undef DO_VPZ
 
+/* Store zero into every active element of Zd.  We will use this for two
+ * and three-operand predicated instructions for which logic dictates a
+ * zero result.
+ */
+static void do_zp_clr(DisasContext *s, int rd, int pg, int esz)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        gen_helper_sve_clr_b, gen_helper_sve_clr_h,
+        gen_helper_sve_clr_s, gen_helper_sve_clr_d,
+    };
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
+                       pred_full_reg_offset(s, pg),
+                       vsz, vsz, 0, fns[esz]);
+}
+
+static void do_zpzi_ool(DisasContext *s, arg_rpri_esz *a,
+                        gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, a->imm, fn);
+}
+
+void trans_ASR_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_asr_zpzi_b, gen_helper_sve_asr_zpzi_h,
+        gen_helper_sve_asr_zpzi_s, gen_helper_sve_asr_zpzi_d,
+    };
+    if (a->esz < 0) {
+        /* Invalid tsz encoding -- see tszimm_esz. */
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.  For
+       arithmetic right-shift, it's the same as by one less. */
+    a->imm = MIN(a->imm, (8 << a->esz) - 1);
+    do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+void trans_LSR_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_lsr_zpzi_b, gen_helper_sve_lsr_zpzi_h,
+        gen_helper_sve_lsr_zpzi_s, gen_helper_sve_lsr_zpzi_d,
+    };
+    if (a->esz < 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.
+       For logical shifts, it is a zeroing operation.  */
+    if (a->imm >= (8 << a->esz)) {
+        do_zp_clr(s, a->rd, a->pg, a->esz);
+    } else {
+        do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
+void trans_LSL_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_lsl_zpzi_b, gen_helper_sve_lsl_zpzi_h,
+        gen_helper_sve_lsl_zpzi_s, gen_helper_sve_lsl_zpzi_d,
+    };
+    if (a->esz < 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.
+       For logical shifts, it is a zeroing operation.  */
+    if (a->imm >= (8 << a->esz)) {
+        do_zp_clr(s, a->rd, a->pg, a->esz);
+    } else {
+        do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
+void trans_ASRD(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_asrd_b, gen_helper_sve_asrd_h,
+        gen_helper_sve_asrd_s, gen_helper_sve_asrd_d,
+    };
+    if (a->esz < 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.  For arithmetic
+       right shift for division, it is a zeroing operation.  */
+    if (a->imm >= (8 << a->esz)) {
+        do_zp_clr(s, a->rd, a->pg, a->esz);
+    } else {
+        do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index c26b1377e8..f1d2801b94 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -23,6 +23,14 @@
 # Named fields.  These are primarily for disjoint fields.
 
 %imm9_16_10		16:s6 10:3
+%imm6_22_5		22:1 5:5
+
+# A combination of tsz:imm3 -- extract esize.
+%tszimm_esz		22:2 5:5 !function=tszimm_esz
+# A combination of tsz:imm3 -- extract (2 * esize) - (tsz:imm3)
+%tszimm_shr		22:2 5:5 !function=tszimm_shr
+# A combination of tsz:imm3 -- extract (tsz:imm3) - esize
+%tszimm_shl		22:2 5:5 !function=tszimm_shl
 
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
@@ -37,6 +45,7 @@
 &rrr_esz		rd rn rm esz
 &rpr_esz		rd pg rn esz
 &rprr_esz		rd pg rn rm esz
+&rpri_esz		rd pg rn imm esz
 &pred_set		rd pat esz i s
 
 ###########################################################################
@@ -56,6 +65,10 @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 
+# Two register operands, one immediate operand, with governing predicate;
+# element size encoded as TSZHL.  The user must fill in imm.
+@rdn_pg_tszimm		........ .. ... ... ... pg:3 ..... rd:5		&rpri_esz rn=%reg_movprfx esz=%tszimm_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9		........ ........ ...... rn:5 . rd:4		&rri imm=%imm9_16_10
 @rd_rn_i9		........ ........ ...... rn:5 rd:5		&rri imm=%imm9_16_10
@@ -112,6 +125,14 @@ UMAXV			00000100 .. 001 001 001 ... ..... .....		@rd_pg_rn_esz
 SMINV			00000100 .. 001 010 001 ... ..... .....		@rd_pg_rn_esz
 UMINV			00000100 .. 001 011 001 ... ..... .....		@rd_pg_rn_esz
 
+### SVE Shift by Immediate - Predicated Group
+
+# SVE bitwise shift by immediate (predicated)
+ASR_zpzi		00000100 .. 000 000 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_shr
+LSR_zpzi		00000100 .. 000 001 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_shr
+LSL_zpzi		00000100 .. 000 011 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_shl
+ASRD			00000100 .. 000 100 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_shr
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3
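
For reference, the tsz:imm3 decode implemented by tszimm_esz,
tszimm_shr and tszimm_shl above can be exercised standalone (an
illustrative sketch, not part of the patch; the unallocated case
x >> 3 == 0 is not handled):

    #include <stdio.h>

    static int esz(int x) { return 31 - __builtin_clz(x >> 3); }
    static int shr(int x) { return (16 << esz(x)) - x; } /* ASR/LSR amount */
    static int shl(int x) { return x - (8 << esz(x)); }  /* LSL amount */

    int main(void)
    {
        int x = 0x16;  /* tsz:imm3 = 0b0010110: halfword elements */
        printf("esz=%d shr=%d shl=%d\n", esz(x), shr(x), shl(x));
        /* prints esz=1 shr=10 shl=6: ASR/LSR #10 or LSL #6 on 16-bit */
        return 0;
    }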


* [Qemu-devel] [PATCH 12/23] target/arm: Implement SVE bitwise shift by vector (predicated)
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (10 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 11/23] target/arm: Implement SVE bitwise shift by immediate (predicated) Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 13/23] target/arm: Implement SVE bitwise shift by wide elements (predicated) Richard Henderson
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 27 +++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-sve.c |  4 ++++
 target/arm/sve.def         |  8 ++++++++
 4 files changed, 64 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 2b265e9892..61b1287269 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -162,6 +162,33 @@ DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9146e35e5b..20f1e60fda 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -492,6 +492,28 @@ DO_ZPZZ_D(sve_umulh_zpzz_d, uint64_t, do_umulh_d)
 DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
 DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 
+/* Note that all bits of the shift are significant
+   and not modulo the element size.  */
+#define DO_ASR(N, M)  (N >> MIN(M, sizeof(N) * 8 - 1))
+#define DO_LSR(N, M)  (M < sizeof(N) * 8 ? N >> M : 0)
+#define DO_LSL(N, M)  (M < sizeof(N) * 8 ? N << M : 0)
+
+DO_ZPZZ(sve_asr_zpzz_b, int8_t, H1, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_b, uint8_t, H1, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_b, uint8_t, H1, DO_LSL)
+
+DO_ZPZZ(sve_asr_zpzz_h, int16_t, H1_2, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_h, uint16_t, H1_2, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_h, uint16_t, H1_2, DO_LSL)
+
+DO_ZPZZ(sve_asr_zpzz_s, int32_t, H1_4, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_s, uint32_t, H1_4, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_s, uint32_t, H1_4, DO_LSL)
+
+DO_ZPZZ_D(sve_asr_zpzz_d, int64_t, DO_ASR)
+DO_ZPZZ_D(sve_lsr_zpzz_d, uint64_t, DO_LSR)
+DO_ZPZZ_D(sve_lsl_zpzz_d, uint64_t, DO_LSL)
+
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
@@ -595,6 +617,9 @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
 #undef DO_ABD
 #undef DO_MUL
 #undef DO_DIV
+#undef DO_ASR
+#undef DO_LSR
+#undef DO_LSL
 
 /* Three-operand expander, immediate operand, controlled by a predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 08388c0a07..685a3ba249 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -190,6 +190,10 @@ DO_ZPZZ(MUL, mul)
 DO_ZPZZ(SMULH, smulh)
 DO_ZPZZ(UMULH, umulh)
 
+DO_ZPZZ(ASR, asr)
+DO_ZPZZ(LSR, lsr)
+DO_ZPZZ(LSL, lsl)
+
 void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 {
     gen_helper_gvec_4 *fn;
diff --git a/target/arm/sve.def b/target/arm/sve.def
index f1d2801b94..9f9c0803a0 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -133,6 +133,14 @@ LSR_zpzi		00000100 .. 000 001 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_sh
 LSL_zpzi		00000100 .. 000 011 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_shl
 ASRD			00000100 .. 000 100 100 ... .. ... .....	@rdn_pg_tszimm imm=%tszimm_shr
 
+# SVE bitwise shift by vector (predicated)
+ASR_zpzz		00000100 .. 010 000 100 ... ..... .....		@rdn_pg_rm_esz
+LSR_zpzz		00000100 .. 010 001 100 ... ..... .....		@rdn_pg_rm_esz
+LSL_zpzz		00000100 .. 010 011 100 ... ..... .....		@rdn_pg_rm_esz
+ASR_zpzz		00000100 .. 010 100 100 ... ..... .....		@rdm_pg_rn_esz # ASRR
+LSR_zpzz		00000100 .. 010 101 100 ... ..... .....		@rdm_pg_rn_esz # LSRR
+LSL_zpzz		00000100 .. 010 111 100 ... ..... .....		@rdm_pg_rn_esz # LSLR
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
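
The DO_ASR/DO_LSR/DO_LSL guards in the patch above matter because SVE
makes all bits of the shift operand significant, while a bare C shift by
the element width or more is undefined behavior.  A standalone sketch of
the resulting semantics for 8-bit lanes (illustration only, not patch
code; it assumes arithmetic right shift of negative ints, as QEMU does):

    #include <stdint.h>
    #include <assert.h>

    #define MIN(a, b)     ((a) < (b) ? (a) : (b))
    #define DO_ASR(N, M)  (N >> MIN(M, sizeof(N) * 8 - 1))
    #define DO_LSR(N, M)  (M < sizeof(N) * 8 ? N >> M : 0)

    int main(void)
    {
        int8_t n = -0x40;
        uint8_t u = 0xc0;
        assert(DO_ASR(n, 200) == -1); /* clamped to 7, fills with sign bit */
        assert(DO_LSR(u, 9) == 0);    /* out-of-range logical shift is 0 */
        return 0;
    }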

* [Qemu-devel] [PATCH 13/23] target/arm: Implement SVE bitwise shift by wide elements (predicated)
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (11 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 12/23] target/arm: Implement SVE bitwise shift by vector (predicated) Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 14/23] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group Richard Henderson
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 21 +++++++++++++++++++++
 target/arm/sve_helper.c    | 36 ++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 20 ++++++++++++++++++++
 target/arm/sve.def         |  6 ++++++
 4 files changed, 83 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 61b1287269..a2db3e2fd9 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -189,6 +189,27 @@ DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 20f1e60fda..3be6d1ae05 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -517,6 +517,42 @@ DO_ZPZZ_D(sve_lsl_zpzz_d, uint64_t, DO_LSL)
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
+/* Three-operand expander, controlled by a predicate, in which the
+ * third operand is "wide".  That is, for D = N op M, the same 64-bit
+ * value of M is used with all of the narrower values of N it overlaps.
+ */
+#define DO_ZPZW(NAME, TYPE, TYPEW, H, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                       \
+    intptr_t i, opr_sz = simd_oprsz(desc);                              \
+    for (i = 0; i < opr_sz; ) {                                         \
+        /* One predicate byte and one wide element per 64-bit span.  */ \
+        uint8_t pg = *(uint8_t *)(vg + H1(i >> 3));                     \
+        TYPEW mm = *(TYPEW *)(vm + i);                                  \
+        do {                                                            \
+            if (pg & 1) {                                               \
+                TYPE nn = *(TYPE *)(vn + H(i));                         \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm);                      \
+            }                                                           \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                     \
+        } while (i & 7);                                                \
+    }                                                                   \
+}
+
+DO_ZPZW(sve_asr_zpzw_b, int8_t, uint64_t, H1, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_b, uint8_t, uint64_t, H1, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_b, uint8_t, uint64_t, H1, DO_LSL)
+
+DO_ZPZW(sve_asr_zpzw_h, int16_t, uint64_t, H1_2, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_h, uint16_t, uint64_t, H1_2, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_h, uint16_t, uint64_t, H1_2, DO_LSL)
+
+DO_ZPZW(sve_asr_zpzw_s, int32_t, uint64_t, H1_4, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
+
+#undef DO_ZPZW
+
 /* Two-operand reduction expander, controlled by a predicate.
  * The difference between TYPERED and TYPERET has to do with
  * sign-extension.  E.g. for SMAX, TYPERED must be signed,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 685a3ba249..91f07d57e3 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -230,6 +230,26 @@ void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 
 #undef DO_ZPZZ
 
+#define DO_ZPZW(NAME, name) \
+void trans_##NAME##_zpzw(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
+{                                                                         \
+    static gen_helper_gvec_4 * const fns[3] = {                           \
+        gen_helper_sve_##name##_zpzw_b, gen_helper_sve_##name##_zpzw_h,   \
+        gen_helper_sve_##name##_zpzw_s,                                   \
+    };                                                                    \
+    if ((unsigned)a->esz < 3) {                                           \
+        do_zpzz_ool(s, a, fns[a->esz]);                                   \
+    } else {                                                              \
+        unallocated_encoding(s);                                          \
+    }                                                                     \
+}
+
+DO_ZPZW(ASR, asr)
+DO_ZPZW(LSR, lsr)
+DO_ZPZW(LSL, lsl)
+
+#undef DO_ZPZW
+
 typedef void gen_helper_gvec_reduc(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_i32);
 static void do_vpz_ool(DisasContext *s, arg_rpr_esz *a,
                        gen_helper_gvec_reduc *fn)
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 9f9c0803a0..66be950ca5 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -141,6 +141,12 @@ ASR_zpzz		00000100 .. 010 100 100 ... ..... .....		@rdm_pg_rn_esz # ASRR
 LSR_zpzz		00000100 .. 010 101 100 ... ..... .....		@rdm_pg_rn_esz # LSRR
 LSL_zpzz		00000100 .. 010 111 100 ... ..... .....		@rdm_pg_rn_esz # LSLR
 
+# SVE bitwise shift by wide elements (predicated)
+# Note these require size != 3.
+ASR_zpzw		00000100 .. 011 000 100 ... ..... .....		@rdn_pg_rm_esz
+LSR_zpzw		00000100 .. 011 001 100 ... ..... .....		@rdn_pg_rm_esz
+LSL_zpzw		00000100 .. 011 011 100 ... ..... .....		@rdn_pg_rm_esz
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
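
For review purposes, the "wide" pairing can be stated as a scalar
reference model: every element of N is shifted by the 64-bit element of M
that overlaps it, so all lanes within one 64-bit span share a shift
count.  A hypothetical reference for the 8-bit LSL case, assuming
little-endian lane order (illustration only, not patch code):

    #include <stdint.h>
    #include <stddef.h>

    static void ref_lsl_zpzw_b(uint8_t *d, const uint8_t *n,
                               const uint64_t *m, const uint8_t *pg,
                               size_t oprsz)
    {
        for (size_t i = 0; i < oprsz; i++) {
            if (pg[i / 8] >> (i % 8) & 1) { /* one predicate bit per byte */
                uint64_t sh = m[i / 8];     /* shared 64-bit shift count */
                d[i] = sh < 8 ? (uint8_t)(n[i] << sh) : 0;
            }
        }
    }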

* [Qemu-devel] [PATCH 14/23] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (12 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 13/23] target/arm: Implement SVE bitwise shift by wide elements (predicated) Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 15/23] target/arm: Implement SVE Integer Multiply-Add Group Richard Henderson
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  60 +++++++++++++++++++++
 target/arm/sve_helper.c    | 128 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 107 +++++++++++++++++++++++++++++++++++++
 target/arm/sve.def         |  21 ++++++++
 4 files changed, 316 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index a2db3e2fd9..e9382de300 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -279,6 +279,66 @@ DEF_HELPER_FLAGS_4(sve_asrd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asrd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asrd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_cls_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_clz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cnot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxtb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxtb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxtb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uxtb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxth_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxth_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uxth_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxth_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxtw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_abs_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_neg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 3be6d1ae05..481b3bdefe 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -553,6 +553,134 @@ DO_ZPZW(sve_lsl_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
 
 #undef DO_ZPZW
 
+/* Fully general two-operand expander, controlled by a predicate.
+ */
+#define DO_ZPZ(NAME, TYPE, H, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t iv = 0, ib = 0, opr_sz = simd_oprsz(desc);         \
+    for (iv = ib = 0; iv < opr_sz; iv += 16, ib += 2) {         \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(ib));             \
+        intptr_t i = 0;                                         \
+        do {                                                    \
+            if (pg & 1) {                                       \
+                TYPE nn = *(TYPE *)(vn + iv + H(i));            \
+                *(TYPE *)(vd + iv + H(i)) = OP(nn);             \
+            }                                                   \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);             \
+        } while (pg);                                           \
+    }                                                           \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZ_D(NAME, TYPE, OP)                                \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn;                                      \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            TYPE nn = n[i];                                     \
+            d[i] = OP(nn);                                      \
+        }                                                       \
+    }                                                           \
+}
+
+#define DO_CLS_B(N)   (clrsb32(N) - 24)
+#define DO_CLS_H(N)   (clrsb32(N) - 16)
+
+DO_ZPZ(sve_cls_b, int8_t, H1, DO_CLS_B)
+DO_ZPZ(sve_cls_h, int16_t, H1_2, DO_CLS_H)
+DO_ZPZ(sve_cls_s, int32_t, H1_4, clrsb32)
+DO_ZPZ_D(sve_cls_d, int64_t, clrsb64)
+
+#define DO_CLZ_B(N)   (clz32(N) - 24)
+#define DO_CLZ_H(N)   (clz32(N) - 16)
+
+DO_ZPZ(sve_clz_b, uint8_t, H1, DO_CLZ_B)
+DO_ZPZ(sve_clz_h, uint16_t, H1_2, DO_CLZ_H)
+DO_ZPZ(sve_clz_s, uint32_t, H1_4, clz32)
+DO_ZPZ_D(sve_clz_d, uint64_t, clz64)
+
+DO_ZPZ(sve_cnt_zpz_b, uint8_t, H1, ctpop8)
+DO_ZPZ(sve_cnt_zpz_h, uint16_t, H1_2, ctpop16)
+DO_ZPZ(sve_cnt_zpz_s, uint32_t, H1_4, ctpop32)
+DO_ZPZ_D(sve_cnt_zpz_d, uint64_t, ctpop64)
+
+#define DO_CNOT(N)    (N == 0)
+
+DO_ZPZ(sve_cnot_b, uint8_t, H1, DO_CNOT)
+DO_ZPZ(sve_cnot_h, uint16_t, H1_2, DO_CNOT)
+DO_ZPZ(sve_cnot_s, uint32_t, H1_4, DO_CNOT)
+DO_ZPZ_D(sve_cnot_d, uint64_t, DO_CNOT)
+
+#define DO_FABS(N)    (N & ((__typeof(N))-1 >> 1))
+
+DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
+DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
+DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
+
+#define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
+
+DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
+DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
+
+#define DO_NOT(N)    (~N)
+
+DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
+DO_ZPZ(sve_not_zpz_h, uint16_t, H1_2, DO_NOT)
+DO_ZPZ(sve_not_zpz_s, uint32_t, H1_4, DO_NOT)
+DO_ZPZ_D(sve_not_zpz_d, uint64_t, DO_NOT)
+
+#define DO_SXTB(N)    ((int8_t)N)
+#define DO_SXTH(N)    ((int16_t)N)
+#define DO_SXTS(N)    ((int32_t)N)
+#define DO_UXTB(N)    ((uint8_t)N)
+#define DO_UXTH(N)    ((uint16_t)N)
+#define DO_UXTS(N)    ((uint32_t)N)
+
+DO_ZPZ(sve_sxtb_h, uint16_t, H1_2, DO_SXTB)
+DO_ZPZ(sve_sxtb_s, uint32_t, H1_4, DO_SXTB)
+DO_ZPZ(sve_sxth_s, uint32_t, H1_4, DO_SXTH)
+DO_ZPZ_D(sve_sxtb_d, uint64_t, DO_SXTB)
+DO_ZPZ_D(sve_sxth_d, uint64_t, DO_SXTH)
+DO_ZPZ_D(sve_sxtw_d, uint64_t, DO_SXTS)
+
+DO_ZPZ(sve_uxtb_h, uint16_t, H1_2, DO_UXTB)
+DO_ZPZ(sve_uxtb_s, uint32_t, H1_4, DO_UXTB)
+DO_ZPZ(sve_uxth_s, uint32_t, H1_4, DO_UXTH)
+DO_ZPZ_D(sve_uxtb_d, uint64_t, DO_UXTB)
+DO_ZPZ_D(sve_uxth_d, uint64_t, DO_UXTH)
+DO_ZPZ_D(sve_uxtw_d, uint64_t, DO_UXTS)
+
+#define DO_ABS(N)    (N < 0 ? -N : N)
+
+DO_ZPZ(sve_abs_b, int8_t, H1, DO_ABS)
+DO_ZPZ(sve_abs_h, int16_t, H1_2, DO_ABS)
+DO_ZPZ(sve_abs_s, int32_t, H1_4, DO_ABS)
+DO_ZPZ_D(sve_abs_d, int64_t, DO_ABS)
+
+#define DO_NEG(N)    (-N)
+
+DO_ZPZ(sve_neg_b, uint8_t, H1, DO_NEG)
+DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
+DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
+DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
+
+#undef DO_CLS_B
+#undef DO_CLS_H
+#undef DO_CLZ_B
+#undef DO_CLZ_H
+#undef DO_CNOT
+#undef DO_FABS
+#undef DO_FNEG
+#undef DO_ABS
+#undef DO_NEG
+#undef DO_ZPZ
+#undef DO_ZPZ_D
+
 /* Two-operand reduction expander, controlled by a predicate.
  * The difference between TYPERED and TYPERET has to do with
  * sign-extension.  E.g. for SMAX, TYPERED must be signed,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 91f07d57e3..7dbc43fb6e 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -250,6 +250,113 @@ DO_ZPZW(LSL, lsl)
 
 #undef DO_ZPZW
 
+static void do_zpz_ool(DisasContext *s, arg_rpr_esz *a, gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZPZ(NAME, name) \
+void trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn)   \
+{                                                                   \
+    static gen_helper_gvec_3 * const fns[4] = {                     \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,       \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,       \
+    };                                                              \
+    do_zpz_ool(s, a, fns[a->esz]);                                  \
+}
+
+DO_ZPZ(CLS, cls)
+DO_ZPZ(CLZ, clz)
+DO_ZPZ(CNT_zpz, cnt_zpz)
+DO_ZPZ(CNOT, cnot)
+DO_ZPZ(NOT_zpz, not_zpz)
+DO_ZPZ(ABS, abs)
+DO_ZPZ(NEG, neg)
+
+void trans_FABS(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fabs_h,
+        gen_helper_sve_fabs_s,
+        gen_helper_sve_fabs_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+void trans_FNEG(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fneg_h,
+        gen_helper_sve_fneg_s,
+        gen_helper_sve_fneg_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+void trans_SXTB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_sxtb_h,
+        gen_helper_sve_sxtb_s,
+        gen_helper_sve_sxtb_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+void trans_UXTB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_uxtb_h,
+        gen_helper_sve_uxtb_s,
+        gen_helper_sve_uxtb_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+void trans_SXTH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL,
+        gen_helper_sve_sxth_s,
+        gen_helper_sve_sxth_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+void trans_UXTH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL,
+        gen_helper_sve_uxth_s,
+        gen_helper_sve_uxth_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+void trans_SXTW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_sxtw_d : NULL);
+}
+
+void trans_UXTW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_uxtw_d : NULL);
+}
+
+#undef DO_ZPZ
+
 typedef void gen_helper_gvec_reduc(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_i32);
 static void do_vpz_ool(DisasContext *s, arg_rpr_esz *a,
                        gen_helper_gvec_reduc *fn)
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 66be950ca5..955a0275a1 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -147,6 +147,27 @@ ASR_zpzw		00000100 .. 011 000 100 ... ..... .....		@rdn_pg_rm_esz
 LSR_zpzw		00000100 .. 011 001 100 ... ..... .....		@rdn_pg_rm_esz
 LSL_zpzw		00000100 .. 011 011 100 ... ..... .....		@rdn_pg_rm_esz
 
+### SVE Integer Arithmetic - Unary Predicated Group
+
+# SVE unary bit operations (predicated)
+CLS			00000100 .. 011 000 101 ... ..... .....		@rd_pg_rn_esz
+CLZ			00000100 .. 011 001 101 ... ..... .....		@rd_pg_rn_esz
+CNT_zpz			00000100 .. 011 010 101 ... ..... .....		@rd_pg_rn_esz
+CNOT			00000100 .. 011 011 101 ... ..... .....		@rd_pg_rn_esz
+NOT_zpz			00000100 .. 011 110 101 ... ..... .....		@rd_pg_rn_esz
+FABS			00000100 .. 011 100 101 ... ..... .....		@rd_pg_rn_esz # Note size != 0
+FNEG			00000100 .. 011 101 101 ... ..... .....		@rd_pg_rn_esz # Note size != 0
+
+# SVE integer unary operations (predicated)
+ABS			00000100 .. 010 110 101 ... ..... .....		@rd_pg_rn_esz
+NEG			00000100 .. 010 111 101 ... ..... .....		@rd_pg_rn_esz
+SXTB			00000100 .. 010 000 101 ... ..... .....		@rd_pg_rn_esz # Note size != 0
+UXTB			00000100 .. 010 001 101 ... ..... .....		@rd_pg_rn_esz # Note size != 0
+SXTH			00000100 .. 010 010 101 ... ..... .....		@rd_pg_rn_esz # Note size > 1
+UXTH			00000100 .. 010 011 101 ... ..... .....		@rd_pg_rn_esz # Note size > 1
+SXTW			00000100 .. 010 100 101 ... ..... .....		@rd_pg_rn_esz # Note size == 3
+UXTW			00000100 .. 010 101 101 ... ..... .....		@rd_pg_rn_esz # Note size == 3
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
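
One detail worth calling out in the patch above: FABS and FNEG are pure
bit operations on the IEEE-754 representation, and ((__typeof(N))-1 >> 1)
is simply the all-ones value with the sign bit cleared.  A self-contained
check for 32-bit lanes (reviewer's sketch, not patch code):

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        uint32_t minus_two = 0xc0000000;   /* IEEE-754 single -2.0f */
        uint32_t mask = (uint32_t)-1 >> 1; /* 0x7fffffff */
        assert((minus_two & mask) == 0x40000000);  /* DO_FABS: +2.0f */
        assert((minus_two ^ ~mask) == 0x40000000); /* DO_FNEG: +2.0f */
        return 0;
    }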

* [Qemu-devel] [PATCH 15/23] target/arm: Implement SVE Integer Multiply-Add Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (13 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 14/23] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 16/23] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group Richard Henderson
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 18 ++++++++++++++
 target/arm/sve_helper.c    | 58 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 27 +++++++++++++++++++++
 target/arm/sve.def         | 15 ++++++++++++
 4 files changed, 118 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index e9382de300..abed625123 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -339,6 +339,24 @@ DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(sve_mla_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mla_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mla_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mla_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_mls_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mls_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 481b3bdefe..8235784a82 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -855,6 +855,64 @@ DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
 #undef DO_SHL
 #undef DO_ASRD
 
+/* Fully general four-operand expander, controlled by a predicate.
+ */
+#define DO_ZPZZZ(NAME, TYPE, H, OP)                           \
+void HELPER(NAME)(void *vd, void *va, void *vn, void *vm,     \
+                  void *vg, uint32_t desc)                    \
+{                                                             \
+    intptr_t iv = 0, ib = 0, opr_sz = simd_oprsz(desc);       \
+    for (iv = ib = 0; iv < opr_sz; iv += 16, ib += 2) {       \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(ib));           \
+        intptr_t i = 0;                                       \
+        do {                                                  \
+            if (pg & 1) {                                     \
+                TYPE nn = *(TYPE *)(vn + iv + H(i));          \
+                TYPE mm = *(TYPE *)(vm + iv + H(i));          \
+                TYPE aa = *(TYPE *)(va + iv + H(i));          \
+                *(TYPE *)(vd + iv + H(i)) = OP(aa, nn, mm);   \
+            }                                                 \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);           \
+        } while (pg);                                         \
+    }                                                         \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZZZ_D(NAME, TYPE, OP)                            \
+void HELPER(NAME)(void *vd, void *va, void *vn, void *vm,     \
+                  void *vg, uint32_t desc)                    \
+{                                                             \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                \
+    TYPE *d = vd, *a = va, *n = vn, *m = vm;                  \
+    uint8_t *pg = vg;                                         \
+    for (i = 0; i < opr_sz; i += 1) {                         \
+        if (pg[H1(i)] & 1) {                                  \
+            TYPE aa = a[i], nn = n[i], mm = m[i];             \
+            d[i] = OP(aa, nn, mm);                            \
+        }                                                     \
+    }                                                         \
+}
+
+#define DO_MLA(A, N, M)  (A + N * M)
+#define DO_MLS(A, N, M)  (A - N * M)
+
+DO_ZPZZZ(sve_mla_b, uint8_t, H1, DO_MLA)
+DO_ZPZZZ(sve_mls_b, uint8_t, H1, DO_MLS)
+
+DO_ZPZZZ(sve_mla_h, uint16_t, H1_2, DO_MLA)
+DO_ZPZZZ(sve_mls_h, uint16_t, H1_2, DO_MLS)
+
+DO_ZPZZZ(sve_mla_s, uint32_t, H1_4, DO_MLA)
+DO_ZPZZZ(sve_mls_s, uint32_t, H1_4, DO_MLS)
+
+DO_ZPZZZ_D(sve_mla_d, uint64_t, DO_MLA)
+DO_ZPZZZ_D(sve_mls_d, uint64_t, DO_MLS)
+
+#undef DO_MLA
+#undef DO_MLS
+#undef DO_ZPZZZ
+#undef DO_ZPZZZ_D
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7dbc43fb6e..83793ab169 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -520,6 +520,33 @@ void trans_ASRD(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
     }
 }
 
+static void do_zpzzz_ool(DisasContext *s, arg_rprrr_esz *a,
+                         gen_helper_gvec_5 *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    tcg_gen_gvec_5_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->ra),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZPZZZ(NAME, name) \
+void trans_##NAME(DisasContext *s, arg_rprrr_esz *a, uint32_t insn)  \
+{                                                                    \
+    static gen_helper_gvec_5 * const fns[4] = {                      \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,        \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,        \
+    };                                                               \
+    do_zpzzz_ool(s, a, fns[a->esz]);                                 \
+}
+
+DO_ZPZZZ(MLA, mla)
+DO_ZPZZZ(MLS, mls)
+
+#undef DO_ZPZZZ
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 955a0275a1..3ae871394c 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -45,6 +45,7 @@
 &rrr_esz		rd rn rm esz
 &rpr_esz		rd pg rn esz
 &rprr_esz		rd pg rn rm esz
+&rprrr_esz		rd pg rn rm ra esz
 &rpri_esz		rd pg rn imm esz
 &pred_set		rd pat esz i s
 
@@ -62,6 +63,10 @@
 @rdn_pg_rm_esz		........ esz:2 ... ... ... pg:3 rm:5 rd:5	&rprr_esz rn=%reg_movprfx
 @rdm_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rprr_esz rm=%reg_movprfx
 
+# Three register operand, with governing predicate, vector element size
+@rda_pg_rn_rm_esz	........ esz:2 . rm:5  ... pg:3 rn:5 rd:5	&rprrr_esz ra=%reg_movprfx
+@rdn_pg_ra_rm_esz	........ esz:2 . rm:5  ... pg:3 ra:5 rd:5	&rprrr_esz rn=%reg_movprfx
+
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 
@@ -168,6 +173,16 @@ UXTH			00000100 .. 010 011 101 ... ..... .....		@rd_pg_rn_esz # Note size > 1
 SXTW			00000100 .. 010 100 101 ... ..... .....		@rd_pg_rn_esz # Note size == 3
 UXTW			00000100 .. 010 101 101 ... ..... .....		@rd_pg_rn_esz # Note size == 3
 
+### SVE Integer Multiply-Add Group
+
+# SVE integer multiply-add writing addend (predicated)
+MLA			00000100 .. 0 ..... 010 ... ..... .....		@rda_pg_rn_rm_esz
+MLS			00000100 .. 0 ..... 011 ... ..... .....		@rda_pg_rn_rm_esz
+
+# SVE integer multiply-add writing multiplicand (predicated)
+MLA			00000100 .. 0 ..... 110 ... ..... .....		@rdn_pg_ra_rm_esz # MAD
+MLS			00000100 .. 0 ..... 111 ... ..... .....		@rdn_pg_ra_rm_esz # MSB
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
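
The decode trick above is that MLA and MAD (likewise MLS and MSB) funnel
into a single translator: for MAD the destination is also the
multiplicand, so @rdn_pg_ra_rm_esz sets rn = rd via %reg_movprfx and the
same zd = za + zn * zm computation applies.  A scalar check of the
equivalence (hypothetical helper name, illustration only):

    #include <stdint.h>
    #include <assert.h>

    static uint32_t mla(uint32_t a, uint32_t n, uint32_t m)
    {
        return a + n * m; /* per-lane DO_MLA; DO_MLS is a - n * m */
    }

    int main(void)
    {
        uint32_t zd = 3, za = 10, zm = 4;
        /* MAD: zd = za + zd * zm, expressed through the MLA computation */
        assert(mla(za, zd, zm) == 22);
        return 0;
    }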

* [Qemu-devel] [PATCH 16/23] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (14 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 15/23] target/arm: Implement SVE Integer Multiply-Add Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 17/23] target/arm: Implement SVE Index Generation Group Richard Henderson
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 30 ++++++++++++++++++++++++++++++
 target/arm/sve.def         | 13 +++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 83793ab169..7edec8ba96 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -127,6 +127,36 @@ static void do_zzz_genfn(DisasContext *s, arg_rrr_esz *a, GVecGen3Fn *fn)
     do_genfn3(s, fn, a->esz, a->rd, a->rn, a->rm);
 }
 
+void trans_ADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_add);
+}
+
+void trans_SUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_sub);
+}
+
+void trans_SQADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_ssadd);
+}
+
+void trans_SQSUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_sssub);
+}
+
+void trans_UQADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_usadd);
+}
+
+void trans_UQSUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_genfn(s, a, tcg_gen_gvec_ussub);
+}
+
 void trans_AND_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
 {
     do_zzz_genfn(s, a, tcg_gen_gvec_and);
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 3ae871394c..a33fec4f33 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -53,6 +53,9 @@
 # Named instruction formats.  These are generally used to
 # reduce the amount of duplication between instruction patterns.
 
+# Three operand
+@rd_rn_rm_esz		........ esz:2 . rm:5  ... ...  rn:5 rd:5	&rrr_esz
+
 # Three operand with unused vector element size
 @rd_rn_rm		........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
 
@@ -183,6 +186,16 @@ MLS			00000100 .. 0 ..... 011 ... ..... .....		@rda_pg_rn_rm_esz
 MLA			00000100 .. 0 ..... 110 ... ..... .....		@rdn_pg_ra_rm_esz # MAD
 MLS			00000100 .. 0 ..... 111 ... ..... .....		@rdn_pg_ra_rm_esz # MSB
 
+### SVE Integer Arithmetic - Unpredicated Group
+
+# SVE integer add/subtract vectors (unpredicated)
+ADD_zzz			00000100 .. 1 ..... 000 000 ..... .....		@rd_rn_rm_esz
+SUB_zzz			00000100 .. 1 ..... 000 001 ..... .....		@rd_rn_rm_esz
+SQADD_zzz		00000100 .. 1 ..... 000 100 ..... .....		@rd_rn_rm_esz
+UQADD_zzz		00000100 .. 1 ..... 000 101 ..... .....		@rd_rn_rm_esz
+SQSUB_zzz		00000100 .. 1 ..... 000 110 ..... .....		@rd_rn_rm_esz
+UQSUB_zzz		00000100 .. 1 ..... 000 111 ..... .....		@rd_rn_rm_esz
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
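
Since SQADD/UQADD/SQSUB/UQSUB map directly onto the generic gvec
saturating expanders, the per-lane semantics the expansion must provide
are clamp-on-overflow.  A scalar reference for signed 8-bit addition,
written under the assumption that the gvec ssadd expansion matches it
(not patch code):

    #include <stdint.h>

    static int8_t ref_sqadd8(int8_t n, int8_t m)
    {
        int r = n + m; /* cannot overflow in int */
        if (r < INT8_MIN) {
            return INT8_MIN;
        }
        return r > INT8_MAX ? INT8_MAX : r;
    }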

* [Qemu-devel] [PATCH 17/23] target/arm: Implement SVE Index Generation Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (15 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 16/23] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 18/23] target/arm: Implement SVE Stack Allocation Group Richard Henderson
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  5 ++++
 target/arm/sve_helper.c    | 40 ++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.def         | 14 +++++++++++
 4 files changed, 121 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index abed625123..c8eae5eb62 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -357,6 +357,11 @@ DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_index_b, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_h, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_s, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_d, TCG_CALL_NO_RWG, void, ptr, i64, i64, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 8235784a82..d8684b9457 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -913,6 +913,46 @@ DO_ZPZZZ_D(sve_mls_d, uint64_t, DO_MLS)
 #undef DO_ZPZZZ
 #undef DO_ZPZZZ_D
 
+void HELPER(sve_index_b)(void *vd, uint32_t start,
+                         uint32_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[H1(i)] = start + i * incr;
+    }
+}
+
+void HELPER(sve_index_h)(void *vd, uint32_t start,
+                         uint32_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[H2(i)] = start + i * incr;
+    }
+}
+
+void HELPER(sve_index_s)(void *vd, uint32_t start,
+                         uint32_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[H4(i)] = start + i * incr;
+    }
+}
+
+void HELPER(sve_index_d)(void *vd, uint64_t start,
+                         uint64_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = start + i * incr;
+    }
+}
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7edec8ba96..7e1bf7d623 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -577,6 +577,68 @@ DO_ZPZZZ(MLS, mls)
 
 #undef DO_ZPZZZ
 
+static void do_index(DisasContext *s, int esz, int rd,
+                     TCGv_i64 start, TCGv_i64 incr)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    TCGv_ptr t_zd = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, rd));
+    if (esz == 3) {
+        gen_helper_sve_index_d(t_zd, start, incr, desc);
+    } else {
+        static void (*fns[3])(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32) = {
+            gen_helper_sve_index_b,
+            gen_helper_sve_index_h,
+            gen_helper_sve_index_s,
+        };
+        TCGv_i32 s32 = tcg_temp_new_i32();
+        TCGv_i32 i32 = tcg_temp_new_i32();
+
+        tcg_gen_extrl_i64_i32(s32, start);
+        tcg_gen_extrl_i64_i32(i32, incr);
+        fns[esz](t_zd, s32, i32, desc);
+
+        tcg_temp_free_i32(s32);
+        tcg_temp_free_i32(i32);
+    }
+    tcg_temp_free_ptr(t_zd);
+    tcg_temp_free_i32(desc);
+}
+
+void trans_INDEX_ii(DisasContext *s, arg_INDEX_ii *a, uint32_t insn)
+{
+    TCGv_i64 start = tcg_const_i64(a->imm1);
+    TCGv_i64 incr = tcg_const_i64(a->imm2);
+    do_index(s, a->esz, a->rd, start, incr);
+    tcg_temp_free_i64(start);
+    tcg_temp_free_i64(incr);
+}
+
+void trans_INDEX_ir(DisasContext *s, arg_INDEX_ir *a, uint32_t insn)
+{
+    TCGv_i64 start = tcg_const_i64(a->imm);
+    TCGv_i64 incr = cpu_reg(s, a->rm);
+    do_index(s, a->esz, a->rd, start, incr);
+    tcg_temp_free_i64(start);
+}
+
+void trans_INDEX_ri(DisasContext *s, arg_INDEX_ri *a, uint32_t insn)
+{
+    TCGv_i64 start = cpu_reg(s, a->rn);
+    TCGv_i64 incr = tcg_const_i64(a->imm);
+    do_index(s, a->esz, a->rd, start, incr);
+    tcg_temp_free_i64(incr);
+}
+
+void trans_INDEX_rr(DisasContext *s, arg_INDEX_rr *a, uint32_t insn)
+{
+    TCGv_i64 start = cpu_reg(s, a->rn);
+    TCGv_i64 incr = cpu_reg(s, a->rm);
+    do_index(s, a->esz, a->rd, start, incr);
+}
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index a33fec4f33..0cac3a974f 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -204,6 +204,20 @@ ORR_zzz			00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm
 EOR_zzz			00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm
 BIC_zzz			00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm
 
+### SVE Index Generation Group
+
+# SVE index generation (immediate start, immediate increment)
+INDEX_ii		00000100 esz:2 1 imm2:s5 010000 imm1:s5 rd:5
+
+# SVE index generation (immediate start, register increment)
+INDEX_ir		00000100 esz:2 1 rm:5 010010 imm:s5 rd:5
+
+# SVE index generation (register start, immediate increment)
+INDEX_ri		00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
+
+# SVE index generation (register start, register increment)
+INDEX_rr		00000100 .. 1 ..... 010011 ..... .....		@rd_rn_rm_esz
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
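
The index helpers compute d[i] = start + i * incr with per-lane
truncation, which is easy to check by hand.  For example, what
sve_index_b produces for "index z0.b, #-1, #2" on a 16-byte vector
(standalone sketch mirroring the helper, not patch code):

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        uint8_t d[16];
        uint32_t start = (uint32_t)-1, incr = 2;
        for (int i = 0; i < 16; i++) {
            d[i] = start + i * incr; /* truncates to 8 bits per lane */
        }
        assert(d[0] == 0xff && d[1] == 0x01 && d[15] == 0x1d);
        return 0;
    }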

* [Qemu-devel] [PATCH 18/23] target/arm: Implement SVE Stack Allocation Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (16 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 17/23] target/arm: Implement SVE Index Generation Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 19/23] target/arm: Implement SVE Bitwise Shift - Unpredicated Group Richard Henderson
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 18 ++++++++++++++++++
 target/arm/sve.def         | 12 ++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7e1bf7d623..026af7a162 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -639,6 +639,24 @@ void trans_INDEX_rr(DisasContext *s, arg_INDEX_rr *a, uint32_t insn)
     do_index(s, a->esz, a->rd, start, incr);
 }
 
+void trans_ADDVL(DisasContext *s, arg_ADDVL *a, uint32_t insn)
+{
+    TCGv_i64 rd = cpu_reg_sp(s, a->rd), rn = cpu_reg_sp(s, a->rn);
+    tcg_gen_addi_i64(rd, rn, a->imm * vec_full_reg_size(s));
+}
+
+void trans_ADDPL(DisasContext *s, arg_ADDPL *a, uint32_t insn)
+{
+    TCGv_i64 rd = cpu_reg_sp(s, a->rd), rn = cpu_reg_sp(s, a->rn);
+    tcg_gen_addi_i64(rd, rn, a->imm * pred_full_reg_size(s));
+}
+
+void trans_RDVL(DisasContext *s, arg_RDVL *a, uint32_t insn)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    tcg_gen_movi_i64(reg, a->imm * vec_full_reg_size(s));
+}
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 0cac3a974f..7428ebc5cd 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -73,6 +73,9 @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn_esz		........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 
+# Two register operands with a 6-bit signed immediate.
+@rd_rn_i6		........ ... rn:5 ..... imm:s6 rd:5		&rri
+
 # Two register operand, one immediate operand, with predicate, element size encoded as TSZHL.
 # User must fill in imm.
 @rdn_pg_tszimm		........ .. ... ... ... pg:3 ..... rd:5		&rpri_esz rn=%reg_movprfx esz=%tszimm_esz
@@ -218,6 +221,15 @@ INDEX_ri		00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
 # SVE index generation (register start, register increment)
 INDEX_rr		00000100 .. 1 ..... 010011 ..... .....		@rd_rn_rm_esz
 
+### SVE Stack Allocation Group
+
+# SVE stack frame adjustment
+ADDVL			00000100 001 ..... 01010 ...... .....		@rd_rn_i6
+ADDPL			00000100 011 ..... 01010 ...... .....		@rd_rn_i6
+
+# SVE stack frame size
+RDVL			00000100 101 11111 01010 imm:s6 rd:5
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
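
These three translate to plain arithmetic at translation time: ADDVL adds
a multiple of the vector length in bytes, ADDPL a multiple of the
predicate length (VL / 8), and RDVL materializes imm * VL.  A worked
example with an assumed 256-bit vector length (sketch only):

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        uint64_t sp = 0x10000, vl = 32;  /* assume VL = 256 bits */
        sp += (int64_t)-2 * vl;          /* addvl sp, sp, #-2 */
        assert(sp == 0x10000 - 64);
        /* the equivalent addpl delta is imm * (VL / 8) */
        assert((int64_t)-2 * (int64_t)(vl / 8) == -8);
        return 0;
    }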

* [Qemu-devel] [PATCH 19/23] target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (17 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 18/23] target/arm: Implement SVE Stack Allocation Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 20/23] target/arm: Implement SVE Compute Vector Address Group Richard Henderson
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 12 +++++++++++
 target/arm/sve_helper.c    | 30 ++++++++++++++++++++++++++
 target/arm/translate-sve.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.def         | 21 ++++++++++++++++++
 4 files changed, 116 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index c8eae5eb62..c0e23e7a83 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -362,6 +362,18 @@ DEF_HELPER_FLAGS_4(sve_index_h, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
 DEF_HELPER_FLAGS_4(sve_index_s, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
 DEF_HELPER_FLAGS_4(sve_index_d, TCG_CALL_NO_RWG, void, ptr, i64, i64, i32)
 
+DEF_HELPER_FLAGS_4(sve_asr_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d8684b9457..b6aca18d22 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -669,6 +669,36 @@ DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
 DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
 DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
 
+/* Three-operand expander, unpredicated, in which the third operand is "wide".
+ */
+#define DO_ZZW(NAME, TYPE, TYPEW, H, OP)                       \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
+{                                                              \
+    intptr_t i, opr_sz = simd_oprsz(desc);                     \
+    for (i = 0; i < opr_sz; ) {                                \
+        TYPEW mm = *(TYPEW *)(vm + i);                         \
+        do {                                                   \
+            TYPE nn = *(TYPE *)(vn + H(i));                    \
+            *(TYPE *)(vd + H(i)) = OP(nn, mm);                 \
+            i += sizeof(TYPE);                                 \
+        } while (i & 7);                                       \
+    }                                                          \
+}
+
+DO_ZZW(sve_asr_zzw_b, int8_t, uint64_t, H1, DO_ASR)
+DO_ZZW(sve_lsr_zzw_b, uint8_t, uint64_t, H1, DO_LSR)
+DO_ZZW(sve_lsl_zzw_b, uint8_t, uint64_t, H1, DO_LSL)
+
+DO_ZZW(sve_asr_zzw_h, int16_t, uint64_t, H1_2, DO_ASR)
+DO_ZZW(sve_lsr_zzw_h, uint16_t, uint64_t, H1_2, DO_LSR)
+DO_ZZW(sve_lsl_zzw_h, uint16_t, uint64_t, H1_2, DO_LSL)
+
+DO_ZZW(sve_asr_zzw_s, int32_t, uint64_t, H1_4, DO_ASR)
+DO_ZZW(sve_lsr_zzw_s, uint32_t, uint64_t, H1_4, DO_LSR)
+DO_ZZW(sve_lsl_zzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
+
+#undef DO_ZZW
+
 #undef DO_CLS_B
 #undef DO_CLS_H
 #undef DO_CLZ_B
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 026af7a162..d8e7cc7570 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -657,6 +657,59 @@ void trans_RDVL(DisasContext *s, arg_RDVL *a, uint32_t insn)
     tcg_gen_movi_i64(reg, a->imm * vec_full_reg_size(s));
 }
 
+static void do_shift_imm(DisasContext *s, arg_rri_esz *a,
+                         void (*gvec_fn)(unsigned, uint32_t, uint32_t,
+                                         int64_t, uint32_t, uint32_t))
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    gvec_fn(a->esz, vec_full_reg_offset(s, a->rd),
+            vec_full_reg_offset(s, a->rn), a->imm, vsz, vsz);
+}
+
+void trans_ASR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_shift_imm(s, a, tcg_gen_gvec_sari);
+}
+
+void trans_LSR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_shift_imm(s, a, tcg_gen_gvec_shri);
+}
+
+void trans_LSL_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_shift_imm(s, a, tcg_gen_gvec_shli);
+}
+
+static void do_zzw_ool(DisasContext *s, arg_rrr_esz *a, gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZZW(NAME, name) \
+void trans_##NAME##_zzw(DisasContext *s, arg_rrr_esz *a, uint32_t insn)   \
+{                                                                         \
+    static gen_helper_gvec_3 * const fns[4] = {                           \
+        gen_helper_sve_##name##_zzw_b, gen_helper_sve_##name##_zzw_h,     \
+        gen_helper_sve_##name##_zzw_s, NULL                               \
+    };                                                                    \
+    do_zzw_ool(s, a, fns[a->esz]);                                        \
+}
+
+DO_ZZW(ASR, asr)
+DO_ZZW(LSR, lsr)
+DO_ZZW(LSL, lsl)
+
+#undef DO_ZZW
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 7428ebc5cd..9caed8fc66 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -32,6 +32,11 @@
 # A combination of tsz:imm3 -- extract (tsz:imm3) - esize
 %tszimm_shl		22:2 5:5 !function=tszimm_shl
 
+# Similarly for the tszh/tszl pair at 22/16 for zzi
+%tszimm16_esz		22:2 16:5 !function=tszimm_esz
+%tszimm16_shr		22:2 16:5 !function=tszimm_shr
+%tszimm16_shl		22:2 16:5 !function=tszimm_shl
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
 %reg_movprfx		0:5
@@ -42,6 +47,7 @@
 # instruction patterns.
 
 &rri			rd rn imm
+&rri_esz		rd rn imm esz
 &rrr_esz		rd rn rm esz
 &rpr_esz		rd pg rn esz
 &rprr_esz		rd pg rn rm esz
@@ -80,6 +86,9 @@
 # User must fill in imm.
 @rdn_pg_tszimm		........ .. ... ... ... pg:3 ..... rd:5		&rpri_esz rn=%reg_movprfx esz=%tszimm_esz
 
+# Similarly without predicate.
+@rd_rn_tszimm		........ .. ... ... ...... rn:5 rd:5		&rri_esz esz=%tszimm16_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9		........ ........ ...... rn:5 . rd:4		&rri imm=%imm9_16_10
 @rd_rn_i9		........ ........ ...... rn:5 rd:5		&rri imm=%imm9_16_10
@@ -230,6 +239,18 @@ ADDPL			00000100 011 ..... 01010 ...... .....		@rd_rn_i6
 # SVE stack frame size
 RDVL			00000100 101 11111 01010 imm:s6 rd:5
 
+### SVE Bitwise Shift - Unpredicated Group
+
+# SVE bitwise shift by immediate (unpredicated)
+ASR_zzi			00000100 .. 1 ..... 1001 00 ..... .....		@rd_rn_tszimm imm=%tszimm16_shr
+LSR_zzi			00000100 .. 1 ..... 1001 01 ..... .....		@rd_rn_tszimm imm=%tszimm16_shr
+LSL_zzi			00000100 .. 1 ..... 1001 11 ..... .....		@rd_rn_tszimm imm=%tszimm16_shl
+
+# SVE bitwise shift by wide elements (unpredicated)
+ASR_zzw			00000100 .. 1 ..... 1000 00 ..... .....		@rd_rn_rm_esz # Note size != 3
+LSR_zzw			00000100 .. 1 ..... 1000 01 ..... .....		@rd_rn_rm_esz # Note size != 3
+LSL_zzw			00000100 .. 1 ..... 1000 11 ..... .....		@rd_rn_rm_esz # Note size != 3
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread
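
As with the predicated variant in patch 13, the unpredicated wide shifts
pair every 64-bit span of N with the overlapping 64-bit element of M,
which is what the i & 7 inner loop in DO_ZZW expresses.  A hypothetical
scalar reference for ASR_zzw on 8-bit lanes, assuming little-endian lane
order (illustration only, not patch code):

    #include <stdint.h>
    #include <stddef.h>

    static void ref_asr_zzw_b(int8_t *d, const int8_t *n,
                              const uint64_t *m, size_t oprsz)
    {
        for (size_t i = 0; i < oprsz; i++) {
            uint64_t sh = m[i / 8];           /* shared per 64-bit span */
            d[i] = n[i] >> (sh < 7 ? sh : 7); /* clamp like DO_ASR */
        }
    }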

* [Qemu-devel] [PATCH 20/23] target/arm: Implement SVE Compute Vector Address Group
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (18 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 19/23] target/arm: Implement SVE Bitwise Shift - Unpredicated Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 21/23] target/arm: Implement SVE floating-point exponential accelerator Richard Henderson
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  5 +++++
 target/arm/sve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 29 +++++++++++++++++++++++++++++
 target/arm/sve.def         | 12 ++++++++++++
 4 files changed, 86 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index c0e23e7a83..a9fcf25b95 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -374,6 +374,11 @@ DEF_HELPER_FLAGS_4(sve_lsl_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_lsl_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_lsl_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_adr_p32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b6aca18d22..33b3c3432d 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -983,6 +983,46 @@ void HELPER(sve_index_d)(void *vd, uint64_t start,
     }
 }
 
+void HELPER(sve_adr_p32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t sh = simd_data(desc);
+    uint32_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + (m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_p64)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + (m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_s32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
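+    /* The index is the sign-extended low 32 bits of each element.  */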
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + ((uint64_t)(int32_t)m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + ((uint64_t)(uint32_t)m[i] << sh);
+    }
+}
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d8e7cc7570..fcb5c4929e 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -710,6 +710,35 @@ DO_ZZW(LSL, lsl)
 
 #undef DO_ZZW
 
+static void do_adr(DisasContext *s, arg_rrri *a, gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
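+    /* a->imm is passed via simd_data() to the helper as the shift count.  */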
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, a->imm, fn);
+}
+
+void trans_ADR_p32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_p32);
+}
+
+void trans_ADR_p64(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_p64);
+}
+
+void trans_ADR_s32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_s32);
+}
+
+void trans_ADR_u32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_u32);
+}
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 9caed8fc66..66a88f59bc 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -47,6 +47,7 @@
 # instruction patterns.
 
 &rri			rd rn imm
+&rrri			rd rn rm imm
 &rri_esz		rd rn imm esz
 &rrr_esz		rd rn rm esz
 &rpr_esz		rd pg rn esz
@@ -65,6 +66,9 @@
 # Three operand with unused vector element size
 @rd_rn_rm		........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
 
+# Three operand with "memory" size, aka immediate left shift
+@rd_rn_msz_rm		........ ... rm:5 .... imm:2 rn:5 rd:5		&rrri
+
 # Three predicate operand, with governing predicate, unused vector element size
 @pd_pg_pn_pm		........ .... rm:4 .. pg:4 . rn:4 . rd:4	&rprr_esz esz=0
 
@@ -251,6 +255,14 @@ ASR_zzw			00000100 .. 1 ..... 1000 00 ..... .....		@rd_rn_rm_esz # Note size !=
 LSR_zzw			00000100 .. 1 ..... 1000 01 ..... .....		@rd_rn_rm_esz # Note size != 3
 LSL_zzw			00000100 .. 1 ..... 1000 11 ..... .....		@rd_rn_rm_esz # Note size != 3
 
+### SVE Compute Vector Address Group
+
+# SVE vector address generation
+ADR_s32			00000100 00 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+ADR_u32			00000100 01 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+ADR_p32			00000100 10 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+ADR_p64			00000100 11 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 21/23] target/arm: Implement SVE floating-point exponential accelerator
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (19 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 20/23] target/arm: Implement SVE Compute Vector Address Group Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 22/23] target/arm: Implement SVE floating-point trig select coefficient Richard Henderson
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  4 +++
 target/arm/sve_helper.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 18 +++++++++++
 target/arm/sve.def         | 11 ++++++-
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index a9fcf25b95..c72ae3390f 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -379,6 +379,10 @@ DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 33b3c3432d..936a6ec648 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1023,6 +1023,87 @@ void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
     }
 }
 
+void HELPER(sve_fexpa_h)(void *vd, void *vn, uint32_t desc)
+{
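+    /* coeff[i] = fraction field of the float16 value 2^(i/32).  */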
+    static const uint16_t coeff[] = {
+        0x0000, 0x0016, 0x002d, 0x0045, 0x005d, 0x0075, 0x008e, 0x00a8,
+        0x00c2, 0x00dc, 0x00f8, 0x0114, 0x0130, 0x014d, 0x016b, 0x0189,
+        0x01a8, 0x01c8, 0x01e8, 0x0209, 0x022b, 0x024e, 0x0271, 0x0295,
+        0x02ba, 0x02e0, 0x0306, 0x032e, 0x0356, 0x037f, 0x03a9, 0x03d4,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint16_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 5);
+        uint16_t exp = extract32(nn, 5, 5);
+        d[i] = coeff[idx] | (exp << 10);
+    }
+}
+
+void HELPER(sve_fexpa_s)(void *vd, void *vn, uint32_t desc)
+{
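+    /* coeff[i] = fraction field of the float32 value 2^(i/64).  */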
+    static const uint32_t coeff[] = {
+        0x000000, 0x0164d2, 0x02cd87, 0x043a29,
+        0x05aac3, 0x071f62, 0x08980f, 0x0a14d5,
+        0x0b95c2, 0x0d1adf, 0x0ea43a, 0x1031dc,
+        0x11c3d3, 0x135a2b, 0x14f4f0, 0x16942d,
+        0x1837f0, 0x19e046, 0x1b8d3a, 0x1d3eda,
+        0x1ef532, 0x20b051, 0x227043, 0x243516,
+        0x25fed7, 0x27cd94, 0x29a15b, 0x2b7a3a,
+        0x2d583f, 0x2f3b79, 0x3123f6, 0x3311c4,
+        0x3504f3, 0x36fd92, 0x38fbaf, 0x3aff5b,
+        0x3d08a4, 0x3f179a, 0x412c4d, 0x4346cd,
+        0x45672a, 0x478d75, 0x49b9be, 0x4bec15,
+        0x4e248c, 0x506334, 0x52a81e, 0x54f35b,
+        0x5744fd, 0x599d16, 0x5bfbb8, 0x5e60f5,
+        0x60ccdf, 0x633f89, 0x65b907, 0x68396a,
+        0x6ac0c7, 0x6d4f30, 0x6fe4ba, 0x728177,
+        0x75257d, 0x77d0df, 0x7a83b3, 0x7d3e0c,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint32_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 6);
+        uint32_t exp = extract32(nn, 6, 8);
+        d[i] = coeff[idx] | (exp << 23);
+    }
+}
+
+void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
+{
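+    /* coeff[i] = fraction field of the float64 value 2^(i/64).  */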
+    static const uint64_t coeff[] = {
+        0x0000000000000, 0x02C9A3E778061, 0x059B0D3158574, 0x0874518759BC8,
+        0x0B5586CF9890F, 0x0E3EC32D3D1A2, 0x11301D0125B51, 0x1429AAEA92DE0,
+        0x172B83C7D517B, 0x1A35BEB6FCB75, 0x1D4873168B9AA, 0x2063B88628CD6,
+        0x2387A6E756238, 0x26B4565E27CDD, 0x29E9DF51FDEE1, 0x2D285A6E4030B,
+        0x306FE0A31B715, 0x33C08B26416FF, 0x371A7373AA9CB, 0x3A7DB34E59FF7,
+        0x3DEA64C123422, 0x4160A21F72E2A, 0x44E086061892D, 0x486A2B5C13CD0,
+        0x4BFDAD5362A27, 0x4F9B2769D2CA7, 0x5342B569D4F82, 0x56F4736B527DA,
+        0x5AB07DD485429, 0x5E76F15AD2148, 0x6247EB03A5585, 0x6623882552225,
+        0x6A09E667F3BCD, 0x6DFB23C651A2F, 0x71F75E8EC5F74, 0x75FEB564267C9,
+        0x7A11473EB0187, 0x7E2F336CF4E62, 0x82589994CCE13, 0x868D99B4492ED,
+        0x8ACE5422AA0DB, 0x8F1AE99157736, 0x93737B0CDC5E5, 0x97D829FDE4E50,
+        0x9C49182A3F090, 0xA0C667B5DE565, 0xA5503B23E255D, 0xA9E6B5579FDBF,
+        0xAE89F995AD3AD, 0xB33A2B84F15FB, 0xB7F76F2FB5E47, 0xBCC1E904BC1D2,
+        0xC199BDD85529C, 0xC67F12E57D14B, 0xCB720DCEF9069, 0xD072D4A07897C,
+        0xD5818DCFBA487, 0xDA9E603DB3285, 0xDFC97337B9B5F, 0xE502EE78B3FF6,
+        0xEA4AFA2A490DA, 0xEFA1BEE615A27, 0xF50765B6E4540, 0xFA7C1819E90D8,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint64_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 6);
+        uint64_t exp = extract32(nn, 6, 11);
+        d[i] = coeff[idx] | (exp << 52);
+    }
+}
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fcb5c4929e..b671462611 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -739,6 +739,24 @@ void trans_ADR_u32(DisasContext *s, arg_rrri *a, uint32_t insn)
     do_adr(s, a, gen_helper_sve_adr_u32);
 }
 
+void trans_FEXPA(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fexpa_h,
+        gen_helper_sve_fexpa_s,
+        gen_helper_sve_fexpa_d,
+    };
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index 66a88f59bc..c0fc8b7665 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -49,6 +49,7 @@
 &rri			rd rn imm
 &rrri			rd rn rm imm
 &rri_esz		rd rn imm esz
+&rr_esz			rd rn esz
 &rrr_esz		rd rn rm esz
 &rpr_esz		rd pg rn esz
 &rprr_esz		rd pg rn rm esz
@@ -60,8 +61,11 @@
 # Named instruction formats.  These are generally used to
 # reduce the amount of duplication between instruction patterns.
 
+# Two operand
+@rd_rn_esz		........ esz:2 ...... ...... rn:5 rd:5		&rr_esz
+
 # Three operand
-@rd_rn_rm_esz		........ esz:2 . rm:5  ... ...  rn:5 rd:5	&rrr_esz
+@rd_rn_rm_esz		........ esz:2 . rm:5 ... ... rn:5 rd:5		&rrr_esz
 
 # Three operand with unused vector element size
 @rd_rn_rm		........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
@@ -263,6 +267,11 @@ ADR_u32			00000100 01 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 ADR_p32			00000100 10 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 ADR_p64			00000100 11 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 
+### SVE Integer Misc - Unpredicated Group
+
+# SVE floating-point exponential accelerator
+FEXPA			00000100 .. 1 00000 101110 ..... .....		@rd_rn_esz # Note size != 0
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 22/23] target/arm: Implement SVE floating-point trig select coefficient
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (20 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 21/23] target/arm: Implement SVE floating-point exponential accelerator Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 23/23] target/arm: Implement SVE Element Count Group, register destinations Richard Henderson
  2018-01-11 17:56 ` [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Peter Maydell
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  4 ++++
 target/arm/sve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 19 +++++++++++++++++++
 target/arm/sve.def         |  3 +++
 4 files changed, 68 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index c72ae3390f..ccf5405d63 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -383,6 +383,10 @@ DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_ftssel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ftssel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ftssel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pred, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 936a6ec648..5341f6d0e5 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1104,6 +1104,48 @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
     }
 }
 
+void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint16_t nn = n[i];
+        uint16_t mm = m[i];
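+        /* mm bit 0 selects 1.0 in place of nn; mm bit 1 flips the sign.  */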
+        if (mm & 1) {
+            nn = float16_one;
+        }
+        d[i] = nn ^ (mm & 2) << 14;
+    }
+}
+
+void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint32_t nn = n[i];
+        uint32_t mm = m[i];
+        if (mm & 1) {
+            nn = float32_one;
+        }
+        d[i] = nn ^ (mm & 2) << 30;
+    }
+}
+
+void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t mm = m[i];
+        if (mm & 1) {
+            nn = float64_one;
+        }
+        d[i] = nn ^ (mm & 2) << 62;
+    }
+}
+
 void HELPER(sve_ldr)(CPUARMState *env, void *d, target_ulong addr, uint32_t len)
 {
     intptr_t i, len_align = QEMU_ALIGN_DOWN(len, 8);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b671462611..a6c31e0e9c 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -757,6 +757,25 @@ void trans_FEXPA(DisasContext *s, arg_rr_esz *a, uint32_t insn)
                        vsz, vsz, 0, fns[a->esz]);
 }
 
+void trans_FTSSEL(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_ftssel_h,
+        gen_helper_sve_ftssel_s,
+        gen_helper_sve_ftssel_d,
+    };
+    unsigned vsz = size_for_gvec(vec_full_reg_size(s));
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
 static uint64_t pred_esz_mask[4] = {
     0xffffffffffffffffull, 0x5555555555555555ull,
     0x1111111111111111ull, 0x0101010101010101ull
diff --git a/target/arm/sve.def b/target/arm/sve.def
index c0fc8b7665..df2730eb73 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -272,6 +272,9 @@ ADR_p64			00000100 11 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 # SVE floating-point exponential accelerator
 FEXPA			00000100 .. 1 00000 101110 ..... .....		@rd_rn_esz # Note size != 0
 
+# SVE floating-point trig select coefficient
+FTSSEL			00000100 .. 1 ..... 101100 ..... .....		@rd_rn_rm_esz # Note size != 0
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 23/23] target/arm: Implement SVE Element Count Group, register destinations
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (21 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 22/23] target/arm: Implement SVE floating-point trig select coefficient Richard Henderson
@ 2017-12-18 17:45 ` Richard Henderson
  2018-01-11 17:56 ` [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Peter Maydell
  23 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-12-18 17:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 103 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.def         |  18 ++++++++
 2 files changed, 121 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index a6c31e0e9c..91eb4e797a 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -61,6 +61,11 @@ static int tszimm_shl(int x)
     return x - tszimm_esz(x);
 }
 
+static inline int plus1(int x)
+{
+    return x + 1;
+}
+
 /*
  * Include the generated decoder.
  */
@@ -815,6 +820,104 @@ static unsigned decode_pred_count(unsigned fullsz, int pattern, int esz)
     }
 }
 
+void trans_CNT_r(DisasContext *s, arg_CNT_r *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+
+    tcg_gen_movi_i64(cpu_reg(s, a->rd), numelem * a->imm);
+}
+
+void trans_INC_DEC_r(DisasContext *s, arg_incdec_cnt *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm * (a->d ? -1 : 1);
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+
+    tcg_gen_addi_i64(reg, reg, inc);
+}
+
+void trans_sat_INC_DEC_r_32(DisasContext *s, arg_incdec_cnt *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm * (a->d ? -1 : 1);
+    int64_t ibound;
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    TCGv_i64 bound;
+    TCGCond cond;
+
+    /* Use normal 64-bit arithmetic to detect 32-bit overflow.  */
+    if (a->u) {
+        tcg_gen_ext32u_i64(reg, reg);
+    } else {
+        tcg_gen_ext32s_i64(reg, reg);
+    }
+    tcg_gen_addi_i64(reg, reg, inc);
+    if (a->d) {
+        if (a->u) {
+            ibound = 0;
+            cond = TCG_COND_LTU;
+        } else {
+            ibound = INT32_MIN;
+            cond = TCG_COND_LT;
+        }
+    } else {
+        if (a->u) {
+            ibound = UINT32_MAX;
+            cond = TCG_COND_GTU;
+        } else {
+            ibound = INT32_MAX;
+            cond = TCG_COND_GT;
+        }
+    }
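+    /* Clamp the 64-bit intermediate result to the 32-bit bound.  */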
+    bound = tcg_const_i64(ibound);
+    tcg_gen_movcond_i64(cond, reg, reg, bound, bound, reg);
+    tcg_temp_free_i64(bound);
+}
+
+void trans_sat_INC_DEC_r_64(DisasContext *s, arg_incdec_cnt *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm * (a->d ? -1 : 1);
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 zero;
+
+    if (a->u) {
+        tcg_gen_addi_i64(t0, reg, inc);
+
+        /* Bound the result.  */
+        if (a->d) {
+            tcg_gen_movi_i64(t1, 0);
+            tcg_gen_movcond_i64(TCG_COND_LEU, reg, t0, reg, t0, t1);
+        } else {
+            tcg_gen_movi_i64(t1, -1);
+            tcg_gen_movcond_i64(TCG_COND_LEU, reg, reg, t0, t0, t1);
+        }
+    } else {
+        /* Detect signed overflow for addition.  */
+        tcg_gen_xori_i64(t0, reg, inc);
+        tcg_gen_addi_i64(reg, reg, inc);
+        tcg_gen_xori_i64(t1, reg, inc);
+        tcg_gen_andc_i64(t0, t1, t0);
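+        /* The sign bit of t0 is now set iff reg and inc had the same
+           sign while the sum's sign differs, i.e. on overflow.  */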
+
+        /* Because we know the increment, we know which way it overflowed.  */
+        tcg_gen_movi_i64(t1, a->d ? INT64_MIN : INT64_MAX);
+
+        /* Bound the result.  */
+        zero = tcg_const_i64(0);
+        tcg_gen_movcond_i64(TCG_COND_LT, reg, t0, zero, t1, reg);
+
+        tcg_temp_free_i64(zero);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
 /* For PTRUE, PTRUES, PFALSE, SETFFR.  */
 void trans_pred_set(DisasContext *s, arg_pred_set *a, uint32_t insn)
 {
diff --git a/target/arm/sve.def b/target/arm/sve.def
index df2730eb73..da533ba666 100644
--- a/target/arm/sve.def
+++ b/target/arm/sve.def
@@ -24,6 +24,7 @@
 
 %imm9_16_10		16:s6 10:3
 %imm6_22_5		22:1 5:5
+%imm4_16_p1             16:4 !function=plus1
 
 # A combination of tsz:imm3 -- extract esize.
 %tszimm_esz		22:2 5:5 !function=tszimm_esz
@@ -56,6 +57,7 @@
 &rprrr_esz		rd pg rn rm ra esz
 &rpri_esz		rd pg rn imm esz
 &pred_set		rd pat esz i s
+&incdec_cnt		rd pat esz imm d u
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -101,6 +103,10 @@
 @pd_rn_i9		........ ........ ...... rn:5 . rd:4		&rri imm=%imm9_16_10
 @rd_rn_i9		........ ........ ...... rn:5 rd:5		&rri imm=%imm9_16_10
 
+# One register, pattern, and uint4+1.
+# User must fill in U and D.
+@incdec_cnt		........ esz:2 .. .... ...... pat:5 rd:5	&incdec_cnt imm=%imm4_16_p1
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -275,6 +281,18 @@ FEXPA			00000100 .. 1 00000 101110 ..... .....		@rd_rn_esz # Note size != 0
 # SVE floating-point trig select coefficient
 FTSSEL			00000100 .. 1 ..... 101100 ..... .....		@rd_rn_rm_esz # Note size != 0
 
+### SVE Element Count Group
+
+# SVE element count
+CNT_r			00000100 .. 10 .... 1110 0   0   ..... .....	@incdec_cnt d=0 u=1
+
+# SVE inc/dec register by element count
+INC_DEC_r		00000100 .. 11 .... 1110 0   d:1 ..... .....	@incdec_cnt u=1
+
+# SVE saturating inc/dec register by element count
+sat_INC_DEC_r_32	00000100 .. 10 .... 1111 d:1 u:1 ..... .....	@incdec_cnt
+sat_INC_DEC_r_64	00000100 .. 11 .... 1111 d:1 u:1 ..... .....	@incdec_cnt
+
 ### SVE Predicate Generation Group
 
 # SVE initialize predicate (PTRUE, PTRUES)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches
  2017-12-18 17:45 [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Richard Henderson
                   ` (22 preceding siblings ...)
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 23/23] target/arm: Implement SVE Element Count Group, register destinations Richard Henderson
@ 2018-01-11 17:56 ` Peter Maydell
  2018-01-11 19:23   ` Richard Henderson
  23 siblings, 1 reply; 40+ messages in thread
From: Peter Maydell @ 2018-01-11 17:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 18 December 2017 at 17:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The most important part here, for review, is the first patch.
>
> I add a code generator, written in python, which takes an input file
> that describes the opcode bits and field bits of the instructions,
> and outputs a function that does all of the decoding.
>
> The subsequent patches begin to add SVE support and also demonstrate
> how I envision both the decoder and the tcg host vector support
> are to be used.  Thus, review of the direction would be appreciated
> before there are another 100 patches along the same style.

This doesn't apply to master -- do you have an example of
what the generated code comes out like?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py Richard Henderson
@ 2018-01-11 18:06   ` Peter Maydell
  2018-01-11 19:10     ` Richard Henderson
  2018-01-12 11:57   ` Peter Maydell
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Maydell @ 2018-01-11 18:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 18 December 2017 at 17:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
> To be used to decode ARM SVE, but could be used for any 32-bit RISC.
> It would need additional work to extend to insn sizes other than 32-bit.

I guess we could make that extension without requiring all
existing pattern files to be updated, right?

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  scripts/decodetree.py | 984 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 984 insertions(+)
>  create mode 100755 scripts/decodetree.py

This is rather large for a single patch...

> +# Pattern examples:
> +#
> +#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
> +#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
> +#

I think we should insist that a pattern defines all the
bits (either as constant values or as fields that get
passed to the decode function). That will help prevent
accidental under-decoding.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton Richard Henderson
@ 2018-01-11 18:20   ` Peter Maydell
  2018-01-11 19:12     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Maydell @ 2018-01-11 18:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 18 December 2017 at 17:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Including only 4, as-yet unimplemented, instruction patterns
> so that the whole thing compiles.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.h | 111 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-a64.c |  91 +++++++------------------------------
>  target/arm/translate-sve.c |  48 ++++++++++++++++++++
>  .gitignore                 |   1 +
>  target/arm/Makefile.objs   |  11 +++++
>  target/arm/sve.def         |  45 ++++++++++++++++++
>  6 files changed, 233 insertions(+), 74 deletions(-)
>  create mode 100644 target/arm/translate-a64.h
>  create mode 100644 target/arm/translate-sve.c
>  create mode 100644 target/arm/sve.def

This will be easier to review if you split the stuff that's
purely code motion from the .c file to the .h into its own
patch.


> diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
> index c2d32988f9..d1ca1f799b 100644
> --- a/target/arm/Makefile.objs
> +++ b/target/arm/Makefile.objs
> @@ -10,3 +10,14 @@ obj-y += gdbstub.o
>  obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
>  obj-y += crypto_helper.o
>  obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
> +
> +DECODETREE = $(SRC_PATH)/scripts/decodetree.py
> +
> +target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.def $(DECODETREE)
> +       $(call quiet-command,\
> +         $(PYTHON) $(DECODETREE) -o $@ --decode disas_sve \
> +               $(SRC_PATH)/target/arm/sve.def || rm -f $@, \
> +               "GEN", $@)
> +
> +target/arm/translate-sve.o: target/arm/decode-sve.inc.c
> +obj-$(TARGET_AARCH64) += translate-sve.o

If we're serious about the idea that this decoder script is
general purpose, we should have a rules.mak rune for
generally invoking it to create a decode-foo.inc.c from a foo.def.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2018-01-11 18:06   ` Peter Maydell
@ 2018-01-11 19:10     ` Richard Henderson
  2018-01-11 19:21       ` Peter Maydell
  2018-01-12 10:53       ` Peter Maydell
  0 siblings, 2 replies; 40+ messages in thread
From: Richard Henderson @ 2018-01-11 19:10 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 01/11/2018 10:06 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:45, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> To be used to decode ARM SVE, but could be used for any 32-bit RISC.
>> It would need additional work to extend to insn sizes other than 32-bit.
> 
> I guess we could make that extension without requiring all
> existing pattern files to be updated, right?

Sure.  I would expect that to be a command-line option for the script.

> 
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  scripts/decodetree.py | 984 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 984 insertions(+)
>>  create mode 100755 scripts/decodetree.py
> 
> This is rather large for a single patch...

Ok, but... what bits of it are functional without the whole?

> 
>> +# Pattern examples:
>> +#
>> +#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
>> +#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
>> +#
> 
> I think we should insist that a pattern defines all the
> bits (either as constant values or as fields that get
> passed to the decode function). That will help prevent
> accidental under-decoding.

Hmm.  What do you suggest then for bits that the cpu does not decode at all?
This doesn't happen with ARM (I don't think) but it does happen with HPPA, and
probably others.

I suppose I could either wrap it in a field that the translator ignores, or
choose another character besides ".", e.g.

mfia            000000 xxxxx 00000 xxx 10100101 t:5

where bits [21-25] and bits [13-15] really are ignored by hardware.

Thoughts?


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton
  2018-01-11 18:20   ` Peter Maydell
@ 2018-01-11 19:12     ` Richard Henderson
  2018-01-12 16:12       ` Bastian Koppelmann
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2018-01-11 19:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 01/11/2018 10:20 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:45, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Including only 4, as-yet unimplemented, instruction patterns
>> so that the whole thing compiles.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/translate-a64.h | 111 +++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/translate-a64.c |  91 +++++++------------------------------
>>  target/arm/translate-sve.c |  48 ++++++++++++++++++++
>>  .gitignore                 |   1 +
>>  target/arm/Makefile.objs   |  11 +++++
>>  target/arm/sve.def         |  45 ++++++++++++++++++
>>  6 files changed, 233 insertions(+), 74 deletions(-)
>>  create mode 100644 target/arm/translate-a64.h
>>  create mode 100644 target/arm/translate-sve.c
>>  create mode 100644 target/arm/sve.def
> 
> This will be easier to review if you split the stuff that's
> purely code motion from the .c file to the .h into its own
> patch.

Ok.

>> +target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.def $(DECODETREE)
>> +       $(call quiet-command,\
>> +         $(PYTHON) $(DECODETREE) -o $@ --decode disas_sve \
>> +               $(SRC_PATH)/target/arm/sve.def || rm -f $@, \
>> +               "GEN", $@)
>> +
>> +target/arm/translate-sve.o: target/arm/decode-sve.inc.c
>> +obj-$(TARGET_AARCH64) += translate-sve.o
> 
> If we're serious about the idea that this decoder script is
> general purpose, we should have a rules.mak rune for
> generally invoking it to create a decode-foo.inc.c from a foo.def.

I didn't want to attempt to generalize this until we have two users.
Particularly if we wind up with extra options to the script to change other
behavior.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2018-01-11 19:10     ` Richard Henderson
@ 2018-01-11 19:21       ` Peter Maydell
  2018-01-11 19:26         ` Richard Henderson
  2018-01-12 10:53       ` Peter Maydell
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Maydell @ 2018-01-11 19:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 11 January 2018 at 19:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/11/2018 10:06 AM, Peter Maydell wrote:
>> On 18 December 2017 at 17:45, Richard Henderson
>> <richard.henderson@linaro.org> wrote:

>>> +# Pattern examples:
>>> +#
>>> +#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
>>> +#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
>>> +#
>>
>> I think we should insist that a pattern defines all the
>> bits (either as constant values or as fields that get
>> passed to the decode function). That will help prevent
>> accidental under-decoding.
>
> Hmm.  What do you suggest then for bits that the cpu does not decode at all?
> This doesn't happen with ARM (I don't think) but it does happen with HPPA, and
> probably others.

Arm does have undecoded bits (they're in brackets in encoding diagrams),
but they're UNPREDICTABLE if you don't set them right, so ideally we
check them all and UNDEF. Our current aarch32 decoder doesn't always
do this, and it's non-obvious when that happens.

> I suppose I could either wrap it in a field that the translator ignores, or
> choose another character besides ".", e.g.
>
> mfia            000000 xxxxx 00000 xxx 10100101 t:5
>
> where bits [21-25] and bits [13-15] really are ignored by hardware.

Yes, I'd like to see something so that if you want the translator
to ignore a bit you have to explicitly mark it as to be ignored.

Something I noticed the doc comment doesn't mention: what's the
semantics for if the patterns you declare overlap? Is this a
purely declarative language where you have to make sure an
insn can only match one pattern (or get an error, presumably),
or is there an implicit "match starting from the top, so put
looser patterns last" process?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches
  2018-01-11 17:56 ` [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches Peter Maydell
@ 2018-01-11 19:23   ` Richard Henderson
  2018-01-11 19:27     ` Peter Maydell
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2018-01-11 19:23 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 01/11/2018 09:56 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:45, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> The most important part here, for review, is the first patch.
>>
>> I add a code generator, written in python, which takes an input file
>> that describes the opcode bits and field bits of the instructions,
>> and outputs a function that does all of the decoding.
>>
>> The subsequent patches begin to add SVE support and also demonstrate
>> how I envision both the decoder and the tcg host vector support
>> are to be used.  Thus, review of the direction would be appreciated
>> before there are another 100 patches along the same style.
> 
> This doesn't apply to master -- do you have an example of
> what the generated code comes out like?

That's why I gave you a link to a buildable branch on Tuesday.

But here are some snippets from what's current in my tree.

Note that I play games with the decode and translation such that e.g. SETFFR ->
PTRUE p16, all; RDFFR pd -> ORR pd, p16, p16, p16.  That's what you'll be
seeing in the last dozen lines.  But I also chose that snippet because it shows
the nesting when instruction subsets need to decode more bits.


r~


    switch ((insn >> 24) & 0xff) {
    case 0x4:
        /* 00000100 ........ ........ ........ */
        switch (insn & 0x0020e000) {
        case 0x00000000:
            /* 00000100 ..0..... 000..... ........ */
            switch ((insn >> 16) & 0x1f) {
            case 0x0:
                /* 00000100 ..000000 000..... ........ */
                extract_rdn_pg_rm_esz(&u.f_rprr_esz, insn);
                trans_ADD_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;
            case 0x1:
                /* 00000100 ..000001 000..... ........ */
                extract_rdn_pg_rm_esz(&u.f_rprr_esz, insn);
                trans_SUB_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;
            case 0x3:
                /* 00000100 ..000011 000..... ........ */
                extract_rdm_pg_rn_esz(&u.f_rprr_esz, insn);
                trans_SUB_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;
            case 0x8:
                /* 00000100 ..001000 000..... ........ */
                extract_rdn_pg_rm_esz(&u.f_rprr_esz, insn);
                trans_SMAX_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;

...

            case 0x00100000:
                /* 00100101 ..01.... 11...... ...0.... */
                switch ((insn >> 17) & 0x7) {
                case 0x0:
                    /* 00100101 ..01000. 11...... ...0.... */
                    extract_Fmt_42(&u.f_22, insn);
                    switch (insn & 0x00c1020f) {
                    case 0x00400000:
                        /* 00100101 01010000 11....0. ...00000 */
                        trans_PTEST(ctx, &u.f_22, insn);
                        return true;
                    }
                    return false;
                case 0x4:
                    /* 00100101 ..01100. 11...... ...0.... */
                    switch ((insn >> 10) & 0xf) {
                    case 0x0:
                        /* 00100101 ..01100. 110000.. ...0.... */
                        extract_pd_pn(&u.f_rr_esz, insn);
                        switch (insn & 0x00c10200) {
                        case 0x00400000:
                            /* 00100101 01011000 1100000. ...0.... */
                            trans_PFIRST(ctx, &u.f_rr_esz, insn);
                            return true;
                        }
                        return false;
                    case 0x1:
                        /* 00100101 ..01100. 110001.. ...0.... */
                        extract_pd_pn_esz(&u.f_rr_esz, insn);
                        switch (insn & 0x00010200) {
                        case 0x00010000:
                            /* 00100101 ..011001 1100010. ...0.... */
                            trans_PNEXT(ctx, &u.f_rr_esz, insn);
                            return true;
                        }
                        return false;
                    case 0x8:
                        /* 00100101 ..01100. 111000.. ...0.... */
                        extract_Fmt_43(&u.f_ptrue, insn);
                        trans_PTRUE(ctx, &u.f_ptrue, insn);
                        return true;
                    case 0x9:
                        /* 00100101 ..01100. 111001.. ...0.... */
                        extract_Fmt_45(&u.f_ptrue, insn);
                        switch (insn & 0x00c103e0) {
                        case 0x00000000:
                            /* 00100101 00011000 11100100 0000.... */
                            trans_PTRUE(ctx, &u.f_ptrue, insn);
                            return true;
                        }
                        return false;
                    case 0xc:
                        /* 00100101 ..01100. 111100.. ...0.... */
                        switch (insn & 0x00810200) {
                        case 0x00000000:
                            /* 00100101 0.011000 1111000. ...0.... */
                            extract_Fmt_46(&u.f_rprr_s, insn);
                            trans_ORR_pppp(ctx, &u.f_rprr_s, insn);
                            return true;
                        case 0x00010000:
                            /* 00100101 0.011001 1111000. ...0.... */
                            extract_Fmt_47(&u.f_rprr_s, insn);
                            switch (insn & 0x004001e0) {
                            case 0x00000000:
                                /* 00100101 00011001 11110000 0000.... */
                                trans_ORR_pppp(ctx, &u.f_rprr_s, insn);
                                return true;
                            }
                            return false;
                        }
                        return false;
                    }
                    return false;
                }
                return false;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2018-01-11 19:21       ` Peter Maydell
@ 2018-01-11 19:26         ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2018-01-11 19:26 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 01/11/2018 11:21 AM, Peter Maydell wrote:
> On 11 January 2018 at 19:10, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> On 01/11/2018 10:06 AM, Peter Maydell wrote:
>>> On 18 December 2017 at 17:45, Richard Henderson
>>> <richard.henderson@linaro.org> wrote:
> 
>>>> +# Pattern examples:
>>>> +#
>>>> +#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
>>>> +#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
>>>> +#
>>>
>>> I think we should insist that a pattern defines all the
>>> bits (either as constant values or as fields that get
>>> passed to the decode function). That will help prevent
>>> accidental under-decoding.
>>
>> Hmm.  What do you suggest then for bits that the cpu does not decode at all?
>> This doesn't happen with ARM (I don't think) but it does happen with HPPA, and
>> probably others.
> 
> Arm does have undecoded bits (they're in brackets in encoding diagrams),
> but they're UNPREDICTABLE if you don't set them right, so ideally we
> check them all and UNDEF. Our current aarch32 decoder doesn't always
> do this, and it's non-obvious when that happens.
> 
>> I suppose I could either wrap it in a field that the translator ignores, or
>> choose another character besides ".", e.g.
>>
>> mfia            000000 xxxxx 00000 xxx 10100101 t:5
>>
>> where bits [21-25] and bits [13-15] really are ignored by hardware.
> 
> Yes, I'd like to see something so that if you want the translator
> to ignore a bit you have to explicitly mark it as to be ignored.

Ok.

> Something I noticed the doc comment doesn't mention: what's the
> semantics for if the patterns you declare overlap? Is this a
> purely declarative language where you have to make sure an
> insn can only match one pattern (or get an error, presumably),
> or is there an implicit "match starting from the top, so put
> looser patterns last" process?

It *should* error.  But I'm not sure that it does.
It's probably worth adding some unit tests for this...


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches
  2018-01-11 19:23   ` Richard Henderson
@ 2018-01-11 19:27     ` Peter Maydell
  2018-01-11 19:34       ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Maydell @ 2018-01-11 19:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 11 January 2018 at 19:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/11/2018 09:56 AM, Peter Maydell wrote:
>> On 18 December 2017 at 17:45, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>> The most important part here, for review, is the first patch.
>>>
>>> I add a code generator, written in python, which takes an input file
>>> that describes the opcode bits and field bits of the instructions,
>>> and outputs a function that does all of the decoding.
>>>
>>> The subsequent patches begin to add SVE support and also demonstrate
>>> how I envision both the decoder and the tcg host vector support
>>> are to be used.  Thus, review of the direction would be appreciated
>>> before there are another 100 patches along the same style.
>>
>> This doesn't apply to master -- do you have an example of
>> what the generated code comes out like?
>
> That's why I gave you a link to a buildable branch on Tuesday.

tgt-arm-cplx? I had a look at that but it seemed to be a different
set of patches to this lot.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches
  2018-01-11 19:27     ` Peter Maydell
@ 2018-01-11 19:34       ` Richard Henderson
  2018-01-12 12:42         ` Peter Maydell
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2018-01-11 19:34 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 01/11/2018 11:27 AM, Peter Maydell wrote:
> On 11 January 2018 at 19:23, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> On 01/11/2018 09:56 AM, Peter Maydell wrote:
>>> On 18 December 2017 at 17:45, Richard Henderson
>>> <richard.henderson@linaro.org> wrote:
>>>> The most important part here, for review, is the first patch.
>>>>
>>>> I add a code generator, written in python, which takes an input file
>>>> that describes the opcode bits and field bits of the instructions,
>>>> and outputs a function that does all of the decoding.
>>>>
>>>> The subsequent patches begin to add SVE support and also demonstrate
>>>> how I envision both the decoder and the tcg host vector support
>>>> are to be used.  Thus, review of the direction would be appreciated
>>>> before there are another 100 patches along the same style.
>>>
>>> This doesn't apply to master -- do you have an example of
>>> what the generated code comes out like?
>>
>> That's why I gave you a link to a buildable branch on Tuesday.
> 
> tgt-arm-cplx ? I had a look at that but it seemed to be a different
> set of patches to this lot.

No, tgt-arm-sve-{1,2,3}.  All of them should build.

tgt-arm-cplx is the armv8.{1,3} stuff that's waiting on fp16 to go in.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2018-01-11 19:10     ` Richard Henderson
  2018-01-11 19:21       ` Peter Maydell
@ 2018-01-12 10:53       ` Peter Maydell
  1 sibling, 0 replies; 40+ messages in thread
From: Peter Maydell @ 2018-01-12 10:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 11 January 2018 at 19:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/11/2018 10:06 AM, Peter Maydell wrote:
>> On 18 December 2017 at 17:45, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>> To be used to decode ARM SVE, but could be used for any 32-bit RISC.
>>> It would need additional work to extend to insn sizes other than 32-bit.
>>
>> I guess we could make that extension without requiring all
>> existing pattern files to be updated, right?
>
> Sure.  I would expect that to be a command-line option for the script.

It occurs to me that we could just make it silently DTRT --
we want to check the bit count in patterns and so on to
detect typos, but it would be a weird typo that was wrong
by 8 bits, so as long as the patterns are all a multiple
of 8 bits long we could accept them (and handle them in
whatever way we find we need to handle mixed-length
instruction sets). So we might not even need a command
line option.
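
Concretely, I'd expect a check along these lines once the parser
knows the total pattern width (just a sketch; 'width' and 'error'
are stand-ins for whatever names the script actually uses):

    if width % 8 != 0 or not (8 <= width <= 64):
        error(lineno, 'pattern is %d bits, not a multiple of 8' % width)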

thanks
-- PMM

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2017-12-18 17:45 ` [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py Richard Henderson
  2018-01-11 18:06   ` Peter Maydell
@ 2018-01-12 11:57   ` Peter Maydell
  2018-01-12 14:54     ` Richard Henderson
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Maydell @ 2018-01-12 11:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 18 December 2017 at 17:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
> To be used to decode ARM SVE, but could be used for any 32-bit RISC.
> It would need additional work to extend to insn sizes other than 32-bit.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

I have some comments here (mostly about the syntax, error checking
and generated code, rather than the script itself), but overall I
like this approach and I think we can drop the "RFC" from the patchset.

> ---
>  scripts/decodetree.py | 984 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 984 insertions(+)
>  create mode 100755 scripts/decodetree.py
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> new file mode 100755
> index 0000000000..acb0243915
> --- /dev/null
> +++ b/scripts/decodetree.py
> @@ -0,0 +1,984 @@
> +#!/usr/bin/env python

I asked on #qemu-devel for some review from people who are more
familiar with Python than I am. One of the suggestions (from
Marc-André Lureau) was to run pycodestyle on this and fix the
(mostly coding style nits) reported by it. (pycodestyle may
be called 'pep8' on older distros.)

> +#
> +# Generate a decoding tree from a specification file.
> +#
> +# The tree is built from instruction "patterns".  A pattern may represent
> +# a single architectural instruction or a group of same, depending on what
> +# is convenient for further processing.
> +#
> +# Each pattern has "fixedbits" & "fixedmask", the combination of which
> +# describes the condition under which the pattern is matched:
> +#
> +#   (insn & fixedmask) == fixedbits
> +#
> +# Each pattern may have "fields", which are extracted from the insn and
> +# passed along to the translator.  Examples of such are registers,
> +# immediates, and sub-opcodes.
> +#
> +# In support of patterns, one may declare fields, argument sets, and
> +# formats, each of which may be re-used to simplify further definitions.
> +#
> +## Field syntax:
> +#
> +# field_def    := '%' identifier ( unnamed_field )+ ( !function=identifier )?
> +# unnamed_field := number ':' ( 's' ) number
> +#
> +# For unnamed_field, the first number is the least-significant bit position of
> +# the field and the second number is the length of the field.  If the 's' is
> +# present, the field is considered signed.  If multiple unnamed_fields are
> +# present, they are concatenated.  In this way one can define disjoint fields.

This syntax lets you specify that fields other than the first one in
a concatenated set are signed, like
    10:5 | 3:s5
That doesn't seem to me like it's meaningful. Shouldn't the signedness
or otherwise be a property of the whole extracted field, rather than
an individual component of it? (In practice creating a signed combined
value is implemented by doing the most-significant component as sextract,
obviously.)

> +#
> +# If !function is specified, the concatenated result is passed through the
> +# named function, taking and returning an integral value.
> +#
> +# FIXME: the fields of the structure into which this result will be stored
> +# is restricted to "int".  Which means that we cannot expand 64-bit items.
> +#
> +# Field examples:
> +#
> +#   %disp   0:s16          -- sextract(i, 0, 16)
> +#   %imm9   16:6 10:3      -- extract(i, 16, 6) << 3 | extract(i, 10, 3)
> +#   %disp12 0:s1 1:1 2:10  -- sextract(i, 0, 1) << 11
> +#                             | extract(i, 1, 1) << 10
> +#                             | extract(i, 2, 10)
> +#   %shimm8 5:s8 13:1 !function=expand_shimm8
> +#                          -- expand_shimm8(sextract(i, 5, 8) << 1
> +#                                           | extract(i, 13, 1))

Do we syntax-check for accidentally specifying a field-def whose
components overlap (eg "3:5 0:5")? I think we should, but I didn't
see a check in a quick scan through the parsing code.
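
Something along these lines in the field parser would catch it (a
sketch only; 'subfields' and 'error' are stand-ins, not the script's
real names):

    seen_mask = 0
    for pos, length in subfields:
        mask = ((1 << length) - 1) << pos
        if seen_mask & mask:
            error(lineno, 'field components overlap')
        seen_mask |= mask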

> +#
> +## Argument set syntax:
> +#
> +# args_def    := '&' identifier ( args_elt )+
> +# args_elt    := identifier
> +#
> +# Each args_elt defines an argument within the argument set.
> +# Each argument set will be rendered as a C structure "arg_$name"
> +# with each of the fields being one of the member arguments.
> +#
> +# Argument set examples:
> +#
> +#   &reg3       ra rb rc
> +#   &loadstore  reg base offset
> +#
> +## Format syntax:
> +#
> +# fmt_def      := '@' identifier ( fmt_elt )+
> +# fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
> +# fixedbit_elt := [01.]+
> +# field_elt    := identifier ':' 's'? number
> +# field_ref    := '%' identifier | identifier '=' '%' identifier
> +# args_ref     := '&' identifier
> +#
> +# Defining a format is a handy way to avoid replicating groups of fields
> +# across many instruction patterns.
> +#
> +# A fixedbit_elt describes a contiguous sequence of bits that must
> +# be 1, 0, or "." for don't care.
> +#
> +# A field_elt describes a simple field only given a width; the position of
> +# the field is implied by its position with respect to other fixedbit_elt
> +# and field_elt.
> +#
> +# If any fixedbit_elt or field_elt appear then all 32 bits must be defined.
> +# Padding with a fixedbit_elt of all '.' is an easy way to accomplish that.

What is a format that doesn't specify the full 32 bits useful for?
Do you have an example of one?

> +#
> +# A field_ref incorporates a field by reference.  This is the only way to
> +# add a complex field to a format.  A field may be renamed in the process
> +# via assignment to another identifier.  This is intended to allow the
> +# same argument set to be used with disjoint named fields.
> +#
> +# A single args_ref may specify an argument set to use for the format.
> +# The set of fields in the format must be a subset of the arguments in
> +# the argument set.  If an argument set is not specified, one will be
> +# inferred from the set of fields.
> +#
> +# It is recommended, but not required, that all field_ref and args_ref
> +# appear at the end of the line, not interleaving with fixedbit_elt or
> +# field_elt.
> +#
> +# Format examples:
> +#
> +#   @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
> +#   @opi    ...... ra:5 lit:8    1 ....... rc:5
> +#
> +## Pattern syntax:
> +#
> +# pat_def      := identifier ( pat_elt )+
> +# pat_elt      := fixedbit_elt | field_elt | field_ref
> +#               | args_ref | fmt_ref | const_elt
> +# fmt_ref      := '@' identifier
> +# const_elt    := identifier '=' number
> +#
> +# The fixedbit_elt and field_elt specifiers are unchanged from formats.
> +# A pattern that does not specify a named format will have one inferred
> +# from a referenced argument set (if present) and the set of fields.
> +#
> +# A const_elt allows an argument to be set to a constant value.  This may
> +# come in handy when fields overlap between patterns and one has to
> +# include the values in the fixedbit_elt instead.
> +#
> +# The decoder will call a translator function for each pattern matched.
> +#
> +# Pattern examples:
> +#
> +#   addl_r   010000 ..... ..... .... 0000000 ..... @opr
> +#   addl_i   010000 ..... ..... .... 0000000 ..... @opi
> +#
> +# which will, in part, invoke
> +#
> +#   trans_addl_r(ctx, &arg_opr, insn)
> +# and
> +#   trans_addl_i(ctx, &arg_opi, insn)
> +#
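
For a rough sense of what "matched" means here: each pattern boils down
to a fixedmask/fixedbits pair (parse_generic below accumulates variables
of those names), and the match test in the generated decoder amounts to
something like this sketch (not the script's literal output):

    def pattern_matches(insn, fixedmask, fixedbits):
        # The instruction matches when it agrees with the pattern's
        # fixed bits at every position covered by the fixed mask.
        return (insn & fixedmask) == fixedbits

For addl_r above, the six leading opcode bits and the seven-bit zero run
form the fixed part; ra, rb and rc fall outside fixedmask.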

I notice in the generated code that all the trans_FOO functions
are global, not file-local. That seems like it's going to lead
to name clashes down the line, especially if/when we ever get
to supporting multiple different target architectures in a single
QEMU binary.

Also from the generated code, "arg_incdec2_pred" &c don't follow
our coding style preference for CamelCase for typedef names. On
the other hand it's not immediately obvious how best to pick
a camelcase approach for them...

> +if sys.version_info >= (3, 0):
> +    re_fullmatch = re.fullmatch
> +else:
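> +    # Emulate re.fullmatch (new in Python 3.4) by anchoring the pattern.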
> +    def re_fullmatch(pat, str):
> +        return re.match('^' + pat + '$', str)
> +
> +def output_autogen():
> +    output('/* This file is autogenerated.  */\n\n')

"autogenerated by decodetree.py" might assist some future
person in tracking down how it got generated.

> +
> +def parse_generic(lineno, is_format, name, toks):
> +    """Parse one instruction format from TOKS at LINENO"""
> +    global fields
> +    global arguments
> +    global formats
> +    global patterns
> +    global re_ident
> +
> +    fixedmask = 0
> +    fixedbits = 0
> +    width = 0
> +    flds = {}
> +    arg = None
> +    fmt = None
> +    for t in toks:
> +        # '&Foo' gives a format an explcit argument set.

"explicit"

> +
> +def main():
> +    global arguments
> +    global formats
> +    global patterns
> +    global translate_prefix
> +    global output_file
> +
> +    h_file = None
> +    c_file = None
> +    decode_function = 'decode'
> +
> +    long_opts = [ 'decode=', 'translate=', 'header=', 'output=' ]
> +    try:
> +        (opts, args) = getopt.getopt(sys.argv[1:], 'h:o:', long_opts)
> +    except getopt.GetoptError as err:
> +        error(0, err)
> +    for o, a in opts:
> +        if o in ('-h', '--header'):
> +            h_file = a
> +        elif o in ('-o', '--output'):
> +            c_file = a
> +        elif o == '--decode':
> +            decode_function = a
> +        elif o == '--translate':
> +            translate_prefix = a
> +        else:
> +            assert False, 'unhandled option'

A --help would be helpful (as would documenting the command
line syntax in the comment at the top of the file).
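
Something along these lines, perhaps (a sketch only; -h is already taken
by --header, so --help would need to be a long option with no short
alias, and I'm assuming the positional arguments are the input files):

    long_opts = [ 'decode=', 'translate=', 'header=', 'output=', 'help' ]
    ...
    elif o == '--help':
        print('usage: decodetree.py [-o FILE] [-h FILE]'
              ' [--decode FUNC] [--translate PREFIX] INPUT...')
        sys.exit(0)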

thanks
-- PMM


* Re: [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches
  2018-01-11 19:34       ` Richard Henderson
@ 2018-01-12 12:42         ` Peter Maydell
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Maydell @ 2018-01-12 12:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 11 January 2018 at 19:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/11/2018 11:27 AM, Peter Maydell wrote:
>> tgt-arm-cplx ? I had a look at that but it seemed to be a different
>> set of patches to this lot.
>
> No, tgt-arm-sve-{1,2,3}.  All of them should build.

Thank you. I've now had a look at the generated code and
reviewed patch 1. I had a quick look through some of the
later patches, mostly to check how the use of the script
looks, but I don't intend to review them properly at this
point (unless you think I should).

PS: a quote which kept coming to mind while I was reading
this patchset:

  'Er, what does the Z mean?' said Zaphod.
  'Which one?'
  'Any one.'

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 01/23] scripts: Add decodetree.py
  2018-01-12 11:57   ` Peter Maydell
@ 2018-01-12 14:54     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2018-01-12 14:54 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 01/12/2018 03:57 AM, Peter Maydell wrote:
> I asked on #qemu-devel for some review from people who are more
> familiar with Python than I am. One of the suggestions (from
> Marc-André Lureau) was to run pycodestyle on this and fix the
> (mostly coding style nits) reported by it. (pycodestyle may
> be called 'pep8' on older distros.)

Thanks, I'll have a look.

>> +# For unnamed_field, the first number is the least-significant bit position of
>> +# the field and the second number is the length of the field.  If the 's' is
>> +# present, the field is considered signed.  If multiple unnamed_fields are
>> +# present, they are concatenated.  In this way one can define disjoint fields.
> 
> This syntax lets you specify that fields other than the first one in
> a concatenated set are signed, like
>     10:5 | 3:s5
> That doesn't seem to me like it's meaningful. Shouldn't the signedness
> or otherwise be a property of the whole extracted field, rather than
> an individual component of it? (In practice creating a signed combined
> value is implemented by doing the most-significant component as sextract,
> obviously.)

You're right that it's not especially meaningful.  But since I use deposit to
compose the pieces, any extraneous sign on a less significant component gets
smooshed.  So nothing bad happens in the end.  Which is why I decided not to check.
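
Concretely, as a Python sketch of the composition (illustrative helpers,
not the script's generated code):

    def extract(i, pos, length):
        return (i >> pos) & ((1 << length) - 1)

    def sextract(i, pos, length):
        v = extract(i, pos, length)
        return v - (1 << length) if v & (1 << (length - 1)) else v

    def deposit(base, pos, length, val):
        # Only the low LENGTH bits of VAL survive -- this is the smoosh.
        mask = ((1 << length) - 1) << pos
        return (base & ~mask) | ((val << pos) & mask)

    # "10:5 | 3:s5": the sign extension from the low component is cut
    # off by deposit's mask, so the result matches the unsigned case.
    insn = (0b00001 << 10) | (0b11111 << 3)
    ret = extract(insn, 10, 5)                         # high component
    ret = deposit(ret << 5, 0, 5, sextract(insn, 3, 5))
    assert ret == (extract(insn, 10, 5) << 5) | extract(insn, 3, 5)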

> Do we syntax-check for accidentally specifying a field-def whose
> components overlap (eg "3:5 0:5")? I think we should, but I didn't
> see a check in a quick scan through the parsing code.

Probably not...  something else for unit testing.

>> +# If any fixedbit_elt or field_elt appear then all 32 bits must be defined.
>> +# Padding with a fixedbit_elt of all '.' is an easy way to accomplish that.
> 
> What is a format that doesn't specify the full 32 bits useful for?
> Do you have an example of one?

No.  I'm not sure what I was thinking of there.  I'm pretty sure the code
doesn't allow that.

> I notice in the generated code that all the trans_FOO functions
> are global, not file-local. That seems like it's going to lead
> to name clashes down the line, especially if/when we ever get
> to supporting multiple different target architectures in a single
> QEMU binary.

I was initially thinking that I'd have the translator functions in a different
file, and because of that they would of course have to be global.  I had
thought far enough ahead to add command-line options to change the names and
prefixes.

But as it has turned out, putting the translator functions into the same file
has worked out well.  I should probably rearrange this.

> Also from the generated code, "arg_incdec2_pred" &c don't follow
> our coding style preference for CamelCase for typedef names. On
> the other hand it's not immediately obvious how best to pick
> a camelcase approach for them...

Yeah, auto-generating names in different ways is tricky.

> A --help would be helpful (as would documenting the command
> line syntax in the comment at the top of the file).

Sure.

Thanks!


r~


* Re: [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton
  2018-01-11 19:12     ` Richard Henderson
@ 2018-01-12 16:12       ` Bastian Koppelmann
  2018-01-12 18:59         ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Bastian Koppelmann @ 2018-01-12 16:12 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 01/11/2018 08:12 PM, Richard Henderson wrote:
> On 01/11/2018 10:20 AM, Peter Maydell wrote:
>> On 18 December 2017 at 17:45, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
> 
> I didn't want to attempt to generalize this until we have two users.
> Particularly if we wind up with extra options to the script to change other
> behavior.

I might be exactly the second user. I think there is little sense having
two generator scripts (one for ARM, and one for RISCV). Unfortunately I
missed the original mail of Richard and can't find it anywhere in my
mailbox. What is the best way to leave some review comments for
Richard's patch containing scripts/decodetree.py, since I cannot inline
it in a mail?

Cheers,
Bastian


* Re: [Qemu-devel] [PATCH 02/23] target/arm: Add SVE decode skeleton
  2018-01-12 16:12       ` Bastian Koppelmann
@ 2018-01-12 18:59         ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2018-01-12 18:59 UTC (permalink / raw)
  To: Bastian Koppelmann, Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 01/12/2018 08:12 AM, Bastian Koppelmann wrote:
> On 01/11/2018 08:12 PM, Richard Henderson wrote:
>> On 01/11/2018 10:20 AM, Peter Maydell wrote:
>>> On 18 December 2017 at 17:45, Richard Henderson
>>> <richard.henderson@linaro.org> wrote:
>>
>> I didn't want to attempt to generalize this until we have two users.
>> Particularly if we wind up with extra options to the script to change other
>> behavior.
> 
> I might be exactly the second user. I think there is little sense having
> two generator scripts (one for ARM, and one for RISCV). Unfortunately I
> missed the original mail of Richard and can't find it anywhere in my
> mailbox. What is the best way to leave some review comments for
> Richard's patch containing scripts/decodetree.py, since I cannot inline
> it in a mail?

I'm halfway done with a conversion of target/hppa/translate.c to decodetree as
well.  PA-RISC decoding is particularly ugly, and I did it with tables the
first time around.

Just send mail with "decodetree.py" in the subject somewhere and remember to CC
me.  FWIW, the original mail is http://patchwork.ozlabs.org/patch/850325/.


r~

