* [Qemu-devel] [Consult] tilegx: About floating point instructions
@ 2015-08-01  9:47 Chen Gang
  2015-08-03 16:40 ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-01  9:47 UTC (permalink / raw)
  To: Chris Metcalf, Peter Maydell, rth, Andreas Färber, walt; +Cc: qemu-devel

Hello All:

I am adding the floating point instructions (e.g. fsingle_add1), but I
cannot find any details about them (the ISA documents only give a summary
description, not the details), e.g.:

  fsingle_add1

  Floating Point Single Precision Add Part 1

  Syntax
         fsingle_add1 Dest, SrcA, SrcB

  Example
         fsingle_add1 r5, r6, r7

  Description

  Performs the first part of a floating point single precision add. This
  instruction also sets the floating point comparison flags in the
  destination register (see Table 7-2 on page 135).

  Functional Description

        rf[Dest] = fsingle_addsub1 (rf[SrcA], rf[SrcB], false);

  (There is no additional information about fsingle_addsub1. Also, the
  description says the comparison flags are set in 'Dest' (Table 7-2):
  is 'Dest' used only for the comparison flags?)

At present, the only place I can find the related details is the gcc
source. But I am using qemu to test gcc (while testing gcc, I found I have
to implement the floating point instructions), so relying on gcc is not a
very good approach.

Any additional information, ideas, or suggestions are welcome.


(By the way, I have implemented the 'iret' instruction; I think we can
simply skip it for linux-user. Additional ideas, suggestions, and
completions for it are also welcome.)


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-01  9:47 [Qemu-devel] [Consult] tilegx: About floating point instructions Chen Gang
@ 2015-08-03 16:40 ` Richard Henderson
  2015-08-03 20:47   ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2015-08-03 16:40 UTC (permalink / raw)
  To: Chen Gang, Chris Metcalf, Peter Maydell, Andreas Färber, walt
  Cc: qemu-devel

On 08/01/2015 02:47 AM, Chen Gang wrote:
> I am just adding floating point instructions (e.g. fsingle_add1), but
> for me, I can not find any details about them (the ISA documents only
> give a summary description, but not details), e.g.

The tilegx splits the four/six cycle arithmetic into multiple black-box
instructions.  You need only really implement one of the four, with the
rest of them being implemented as nops or moves.

Looking at what gcc produces gives the hints:

	fdouble_unpack_min	min, srca, srcb
	fdouble_unpack_max	max, srca, srcb
	fdouble_add_flags	flg, srca, srcb
	fdouble_addsub		max, min, flg
	fdouble_pack1		dst, max, flg
	fdouble_pack2		dst, max, zero

The unpack, addsub, and pack2 insns can be ignored, the add_flags insn can
perform the whole operation, the pack1 insn performs a move from "flg" to "dst".
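A minimal C sketch of the double-precision scheme just described, under stated assumptions: each guest register is modeled as a `uint64_t` holding the raw IEEE-754 bits of a double, host `double` arithmetic stands in for QEMU's softfloat routines, and the register indices (`R_SRCA`, `R_FLG`, `R_DST`, etc.) are hypothetical names for illustration, not the real translator code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register indices for the adddf3 sequence above. */
enum { R_SRCA, R_SRCB, R_MIN, R_MAX, R_FLG, R_DST, R_NUM };

/* Registers hold doubles as their raw IEEE-754 bit pattern. */
static uint64_t dbits(double d) { uint64_t r; memcpy(&r, &d, sizeof r); return r; }
static double dval(uint64_t r) { double d; memcpy(&d, &r, sizeof d); return d; }

/* Emulate the six-insn adddf3 sequence: only add_flags computes,
 * pack1 is a move, and the remaining insns do no useful work. */
static void emulate_adddf3(uint64_t rf[R_NUM])
{
    /* fdouble_unpack_min / fdouble_unpack_max: treated as no-ops that
     * write zero, so their values are visibly dead. */
    rf[R_MIN] = 0;
    rf[R_MAX] = 0;

    /* fdouble_add_flags flg, srca, srcb: perform the whole addition.
     * In this model flg holds the sum, not real comparison flags. */
    rf[R_FLG] = dbits(dval(rf[R_SRCA]) + dval(rf[R_SRCB]));

    /* fdouble_addsub max, min, flg: ignored. */

    /* fdouble_pack1 dst, max, flg: a plain move from flg to dst. */
    rf[R_DST] = rf[R_FLG];

    /* fdouble_pack2 dst, max, zero: ignored; dst keeps pack1's value. */
}
```

The trade-off is visible in the comments: any code that read `flg` expecting genuine comparison flags would see the sum instead, which only works because gcc's sequence consumes `flg` solely through `fdouble_pack1`.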

Similarly for the single-precision:

	fsingle_add1		tmp, srca, srcb
	fsingle_addsub2		tmp, srca, srcb
	fsingle_pack1		flg, tmp
	fsingle_pack2		dst, tmp, flg

The add1 insn performs the whole operation, the addsub2 and pack1 insns are
ignored, and the pack2 insn is a move from tmp to dst.
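The single-precision variant can be sketched the same way. This assumes (it is not stated in the thread) that a float occupies the low 32 bits of a 64-bit register; the register indices are again hypothetical and host `float` arithmetic replaces softfloat.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register indices for the addsf3 sequence above. */
enum { S_SRCA, S_SRCB, S_TMP, S_FLG, S_DST, S_NUM };

/* Assumption: a float lives zero-extended in the low 32 bits. */
static uint64_t fbits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float fval(uint64_t r)
{
    uint32_t u = (uint32_t)r;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

static void emulate_addsf3(uint64_t rf[S_NUM])
{
    /* fsingle_add1 tmp, srca, srcb: the whole addition happens here. */
    rf[S_TMP] = fbits(fval(rf[S_SRCA]) + fval(rf[S_SRCB]));

    /* fsingle_addsub2 tmp, srca, srcb: ignored. */
    /* fsingle_pack1 flg, tmp: ignored (flg is never written). */

    /* fsingle_pack2 dst, tmp, flg: a plain move from tmp to dst. */
    rf[S_DST] = rf[S_TMP];
}
```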


r~

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-03 16:40 ` Richard Henderson
@ 2015-08-03 20:47   ` Chen Gang
  2015-08-04 13:56     ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-03 20:47 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel

On 8/4/15 00:40, Richard Henderson wrote:
> On 08/01/2015 02:47 AM, Chen Gang wrote:
>> I am just adding floating point instructions (e.g. fsingle_add1),
>> but for me, I can not find any details about them (the ISA
>> documents only give a summary description, but not details), e.g.
> 
> The tilegx splits the four/six cycle arithmetic into multiple
> black-box instructions.  You need only really implement one of the
> four, with the rest of them being implemented as nops or moves.
> 
> Looking at what gcc produces gives the hints:
> 
> fdouble_unpack_min	min, srca, srcb
> fdouble_unpack_max	max, srca, srcb
> fdouble_add_flags	flg, srca, srcb
> fdouble_addsub		max, min, flg
> fdouble_pack1		dst, max, flg
> fdouble_pack2		dst, max, zero
> 
> The unpack, addsub, and pack2 insns can be ignored, the add_flags
> insn can perform the whole operation, the pack1 insn performs a move
> from "flg" to "dst".
> 
> Similarly for the single-precision:
> 
> fsingle_add1		tmp, srca, srcb
> fsingle_addsub2		tmp, srca, srcb
> fsingle_pack1		flg, tmp
> fsingle_pack2		dst, tmp, flg
> 
> The add1 insn performs the whole operation, the addsub2 and pack1
> insns are ignored, and the pack2 insn is a move from tmp to dst.
> 

Thank you very much. I am analyzing the gcc machine description file
(tilegx.md) and the testsuite executable (20000603-1.exe) for this. Your
information is really valuable to me! :-)

Any additional information is still welcome, especially detailed
documentation, which would be more authoritative ('standard') than a
third-party implementation such as gcc's.


I will try to get qemu tilegx running the gcc testsuite successfully
within this month.


Thanks.
-- 
Chen Gang

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-03 20:47   ` Chen Gang
@ 2015-08-04 13:56     ` Chen Gang
  2015-08-04 15:04       ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-04 13:56 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel


On 8/4/15 04:47, Chen Gang wrote:
> On 8/4/15 00:40, Richard Henderson wrote:
>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>> but for me, I can not find any details about them (the ISA
>>> documents only give a summary description, but not details), e.g.
>>
>> The tilegx splits the four/six cycle arithmetic into multiple
>> black-box instructions.  You need only really implement one of the
>> four, with the rest of them being implemented as nops or moves.
>>
>> Looking at what gcc produces gives the hints:
>>
>> fdouble_unpack_min	min, srca, srcb
>> fdouble_unpack_max	max, srca, srcb
>> fdouble_add_flags	flg, srca, srcb
>> fdouble_addsub		max, min, flg
>> fdouble_pack1		dst, max, flg
>> fdouble_pack2		dst, max, zero
>>
>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>> insn can perform the whole operation, the pack1 insn performs a move
>> from "flg" to "dst".
>>
>> Similarly for the single-precision:
>>
>> fsingle_add1		tmp, srca, srcb
>> fsingle_addsub2		tmp, srca, srcb
>> fsingle_pack1		flg, tmp
>> fsingle_pack2		dst, tmp, flg
>>
>> The add1 insn performs the whole operation, the addsub2 and pack1
>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>

After checking tilegx.md completely, I think we still need to implement
each of them precisely, or we cannot emulate all cases (e.g. muldf3).

But it doesn't matter. Based on tilegx.md, the summaries in the ISA
document, and your explanation, I think I have enough to finish them. :-)

Additional information is still welcome.

Thanks.

-- 
Chen Gang

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-04 13:56     ` Chen Gang
@ 2015-08-04 15:04       ` Richard Henderson
  2015-08-05 14:16         ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2015-08-04 15:04 UTC (permalink / raw)
  To: Chen Gang, Chris Metcalf, Peter Maydell, Andreas Färber, walt
  Cc: qemu-devel

On 08/04/2015 06:56 AM, Chen Gang wrote:
> 
> On 8/4/15 04:47, Chen Gang wrote:
>> On 8/4/15 00:40, Richard Henderson wrote:
>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>> but for me, I can not find any details about them (the ISA
>>>> documents only give a summary description, but not details), e.g.
>>>
>>> The tilegx splits the four/six cycle arithmetic into multiple
>>> black-box instructions.  You need only really implement one of the
>>> four, with the rest of them being implemented as nops or moves.
>>>
>>> Looking at what gcc produces gives the hints:
>>>
>>> fdouble_unpack_min	min, srca, srcb
>>> fdouble_unpack_max	max, srca, srcb
>>> fdouble_add_flags	flg, srca, srcb
>>> fdouble_addsub		max, min, flg
>>> fdouble_pack1		dst, max, flg
>>> fdouble_pack2		dst, max, zero
>>>
>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>> insn can perform the whole operation, the pack1 insn performs a move
>>> from "flg" to "dst".
>>>
>>> Similarly for the single-precision:
>>>
>>> fsingle_add1		tmp, srca, srcb
>>> fsingle_addsub2		tmp, srca, srcb
>>> fsingle_pack1		flg, tmp
>>> fsingle_pack2		dst, tmp, flg
>>>
>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>
> 
> After check the tilegx.md completely, for me, we still need implement
> each of them precisely, or we can not emulate all cases (e.g. muldf3).

No, you can still implement all of muldf3 in fdouble_mul_flags.
Again, the fdouble_pack1 copies from the flag input to the output.

Yes, there is a 64-bit multiply in there, but the tcg optimizer
should be able to delete all of that as unused.  Especially if you have the
fdouble_unpack* insns store zero into their destinations.
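To illustrate the dead-code argument, a hedged sketch in the same style: `fdouble_mul_flags` computes the full product, the `fdouble_unpack*` stand-ins store zero, so the 64-bit integer multiply between them and the pack step works on zeros and its result is never used. The insn sequence is a simplified, hypothetical rendering of muldf3 (not gcc's exact output), and host `double` arithmetic again replaces softfloat.

```c
#include <stdint.h>
#include <string.h>

static uint64_t dbits(double d) { uint64_t r; memcpy(&r, &d, sizeof r); return r; }
static double dval(uint64_t r) { double d; memcpy(&d, &r, sizeof d); return d; }

/* Returns the value left in dst. *dead_prod receives the integer
 * product so a caller can see that it carries no information. */
static uint64_t emulate_muldf3(uint64_t srca, uint64_t srcb, uint64_t *dead_prod)
{
    /* fdouble_unpack_max ua, srca, zero and ub, srcb, zero: store zero. */
    uint64_t ua = 0, ub = 0;

    /* fdouble_mul_flags flg, srca, srcb: the whole multiplication. */
    uint64_t flg = dbits(dval(srca) * dval(srcb));

    /* The mantissa multiply (collapsed into one multiply here): its
     * inputs are zero and nothing reads its result, so the tcg
     * optimizer should be able to delete it. */
    uint64_t prod = ua * ub;
    *dead_prod = prod;

    /* fdouble_pack1 dst, prod, flg: copies the flag input to dst.
     * fdouble_pack2: ignored. */
    return flg;
}
```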

Don't get me wrong -- more accurate implementation of the actual
insns would be nice, especially for debugging.  But if the insns
aren't accurately documented I don't see what choice we have.

On the good side, implementing the entire operation as part of the "flags" step
probably results in faster emulation.


r~

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-04 15:04       ` Richard Henderson
@ 2015-08-05 14:16         ` Chen Gang
  2015-08-08 17:23           ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-05 14:16 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel

On 8/4/15 23:04, Richard Henderson wrote:
> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>
>> On 8/4/15 04:47, Chen Gang wrote:
>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>> but for me, I can not find any details about them (the ISA
>>>>> documents only give a summary description, but not details), e.g.
>>>>
>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>> black-box instructions.  You need only really implement one of the
>>>> four, with the rest of them being implemented as nops or moves.
>>>>
>>>> Looking at what gcc produces gives the hints:
>>>>
>>>> fdouble_unpack_min	min, srca, srcb
>>>> fdouble_unpack_max	max, srca, srcb
>>>> fdouble_add_flags	flg, srca, srcb
>>>> fdouble_addsub		max, min, flg
>>>> fdouble_pack1		dst, max, flg
>>>> fdouble_pack2		dst, max, zero
>>>>
>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>> from "flg" to "dst".
>>>>
>>>> Similarly for the single-precision:
>>>>
>>>> fsingle_add1		tmp, srca, srcb
>>>> fsingle_addsub2		tmp, srca, srcb
>>>> fsingle_pack1		flg, tmp
>>>> fsingle_pack2		dst, tmp, flg
>>>>
>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>
>>
>> After check the tilegx.md completely, for me, we still need implement
>> each of them precisely, or we can not emulate all cases (e.g. muldf3).
> 
> No, you can still implement all of muldf3 in fdouble_mul_flags.
> Again, the fdouble_pack1 copies from the flag input to the output.
> 
> Yes, there is a 64-bit multiply in there, but the tcg optimizer
> should be able to delete all of that as unused.  Especially if you have the
> fdouble_unpack* insns store zero into their destinations.
> 

I am not quite sure, but I guess what you said should work (at least, it
is very useful for the implementation).


> Don't get me wrong -- more accurate implementation of the actual
> insns would be nice, especially for debugging.  But if the insns
> aren't accurately documented I don't see what choice we have.
> 

I think we can still try to implement the details.

 - The document has summaries of all the floating point instructions, so
   we can reason about, or guess, their complete implementation.

 - gcc uses all of them, so it is a good sample and a good reference (but
   we should not assume gcc is necessarily correct, since we are using
   qemu to run the gcc testsuite).

 - The tilegx floating point format should be standard (or at least based
   on the standard format), so we can look up the related information via
   google/baidu.


> On the good side, implementing the entire operation as part of the "flags" step
> probably results in faster emulation.
> 

I guess so, too.


I will try to finish the simple implementation first, then implement the
floating point instructions in detail in the future (that should be a
lower priority).


Thanks.
-- 
Chen Gang

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-05 14:16         ` Chen Gang
@ 2015-08-08 17:23           ` Chen Gang
  2015-08-09  1:10             ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-08 17:23 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel

Hello all:

Below is my current idea for all the floating point insns. It is not a
precise implementation, nor even a complete one: it assumes that the pack
insns, when used on their own, only pack (u)int32_t values:

  fsingle_add1        ; return calc flags, save calc result to env.

  fsingle_sub1        ; return calc flags, save calc result to env.

  fsingle_addsub2     ; set "has result" flag.

  fsingle_mul1        ; skip return value, save calc result to env.
                        set "has result" flag.

  fsingle_mul2        ; skipped.


  fsingle_pack1       ; skipped.

  fsingle_pack2       ; if "has result"
                            reset "has result" flag.
                            return calc result from env.
                        else
                            pack srca 
                            reference from tilegx.md: float(uns)sisf2.
                            get (u)int32_t a, then (u)int32_to_float32.

  fdouble_unpack_max: ; skipped.

  fdouble_unpack_min: ; skipped.

  fdouble_add_flags:  ; return calc flags, save calc result to env.

  fdouble_sub_flags:  ; return calc flags, save calc result to env.

  fdouble_addsub:     ; set "has result" flag.

  fdouble_mul_flags:  ; skip return flags, save calc result to env.
                        set "has result" flag.

  fdouble_pack1:      ; if "has result" 
                            reset "has result" flag.
                            return calc result from env.
                        else
                            pack srca and srcb.
                            reference from tilegx.md: float(uns)sidf2.
                            get (u)int32_t a, then (u)int32_to_float64.

  fdouble_pack2:      ; skipped.


  (fsingle_add1/sub1 and fdouble_add/sub_flags can also be used on their
   own, e.g. in the gcc testsuite for complex numbers.)
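The "has result" plan above can be sketched in C for the single-precision insns. Everything here is hypothetical: the `FPEnv` struct stands in for the fields the plan would add to the CPU env, host `float` arithmetic replaces softfloat, and the returned flags are a placeholder because their encoding is undocumented.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slice of the per-CPU env for this plan. */
typedef struct {
    bool has_result;   /* set by fsingle_addsub2 and fsingle_mul1 */
    uint32_t result;   /* float32 bits saved by add1/sub1/mul1 */
} FPEnv;

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* fsingle_add1: save the sum in env and return the comparison flags.
 * The flag encoding is undocumented, so 0 is only a placeholder. */
static uint64_t fsingle_add1(FPEnv *env, uint64_t srca, uint64_t srcb)
{
    env->result = f2u(u2f((uint32_t)srca) + u2f((uint32_t)srcb));
    return 0;
}

/* fsingle_addsub2: only mark that a saved result is pending. */
static void fsingle_addsub2(FPEnv *env)
{
    env->has_result = true;
}

/* fsingle_mul1: save the product and mark it pending; no dest write. */
static void fsingle_mul1(FPEnv *env, uint64_t srca, uint64_t srcb)
{
    env->result = f2u(u2f((uint32_t)srca) * u2f((uint32_t)srcb));
    env->has_result = true;
}

/* fsingle_pack2: return the pending result if there is one; otherwise
 * the plan packs an integer from srca (not modeled here). */
static uint64_t fsingle_pack2(FPEnv *env, uint64_t srca)
{
    if (env->has_result) {
        env->has_result = false;
        return env->result;
    }
    (void)srca;
    return 0;
}
```

One consequence of this design is that the insns become stateful across the sequence: if `has_result` were ever left set (for example, if execution left the sequence between `addsub2` and `pack2`), a later integer-packing `pack2` could return a stale value. The stateless "pack is a move" approach suggested earlier in the thread does not have that hazard.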


Next, I will implement the floating point insns. Any related ideas,
suggestions, and completions are welcome.

Thanks.


-- 
Chen Gang

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-08 17:23           ` Chen Gang
@ 2015-08-09  1:10             ` Chen Gang
  2015-08-09  1:14               ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-09  1:10 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel


On 8/9/15 01:23, Chen Gang wrote:
> Hello all:
> 
> Below is my current idea for all floating point insns. For me, it is not
> the precise implementation, even not completely implement -- assume pack
> insns can only for packing (u)int32_t when they are used individually:
> 
>   fsingle_add1        ; return calc flags, save calc result to env.
> 
>   fsingle_sub1        ; return calc flags, save calc result to env.
> 
>   fsingle_addsub2     ; set "has result" flag.
> 
>   fsingle_mul1        ; skip return value, save calc result to env.
>                         set "has result" flag.
> 
>   fsingle_mul2        ; skipped.
> 
> 
>   fsingle_pack1       ; skipped.
> 
>   fsingle_pack2       ; if "has result"
>                             reset "has result" flag.
>                             return calc result from env.
>                         else
>                             pack srca 
>                             reference from tilegx.md: float(uns)sisf2.
>                             get (u)int32_t a, then (u)int32_to_float32.

For "pack srca", a demo of the related code is below (srca is a
uint64_t):

    switch (srca & 0x3ff) {

    /* treat it as uint32_t */
    case 0x9e:
        return uint32_to_float32(srca >> 32, &FP_STATUS);

    /* treat it as int32_t, must be negative number */
    case 0x29e:
        return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);

    default:
        unimplemented (gen_exception).
    }
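The demo above can be turned into a standalone, testable C function. This is a sketch only: host `float` conversion stands in for `uint32_to_float32`/`int32_to_float32` with `FP_STATUS`, the function name is hypothetical, and it reports `false` where the demo raises an exception. As an observation rather than a documented fact, 0x9e is 158 = 127 + 31 (the biased binary32 exponent of 2^31), and the two case labels differ only in bit 9 (0x200).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name. Returns false for layouts the demo leaves
 * unimplemented (where the demo would raise an exception). */
static bool fsingle_pack_srca(uint64_t srca, float *out)
{
    uint32_t hi = (uint32_t)(srca >> 32);   /* 32-bit integer payload */

    switch (srca & 0x3ff) {
    case 0x9e:                              /* treat it as uint32_t */
        *out = (float)hi;
        return true;
    case 0x29e:                             /* treat it as a negative int32_t */
        /* Same value as the demo's int32 view of (hi | 0x80000000),
         * computed in 64 bits to avoid implementation-defined narrowing. */
        *out = (float)((int64_t)(hi | 0x80000000u) - 4294967296LL);
        return true;
    default:
        return false;
    }
}
```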

> 
>   fdouble_unpack_max: ; skipped.
> 
>   fdouble_unpack_min: ; skipped.
> 
>   fdouble_add_flags:  ; return calc flags, save calc result to env.
> 
>   fdouble_sub_flags:  ; return calc flags, save calc result to env.
> 
>   fdouble_addsub:     ; set "has result" flag.
> 
>   fdouble_mul_flags:  ; skip return flags, save calc result to env.
>                         set "has result" flag.
> 
>   fdouble_pack1:      ; if "has result" 
>                             reset "has result" flag.
>                             return calc result from env.
>                         else
>                             pack srca and srcb.
>                             reference from tilegx.md: float(uns)sidf2.
>                             get (u)int32_t a, then (u)int32_to_float64.
>
 
For "pack srca and srcb", a demo of the related code is below (srca and
srcb are uint64_t):

    switch (srcb & 0xfffff) {

    /* treat it as uint32_t */
    case 0x21b00:
        return uint32_to_float64(srca >> 4, &FP_STATUS);

    /* treat it as int32_t, must be negative number */
    case 0xa1b00:
        return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);

    default:
        unimplemented (gen_exception).
    }
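A matching standalone sketch for the double-precision demo, with the same caveats: the function name is hypothetical, host `double` conversion replaces `uint32_to_float64`/`int32_to_float64`, and `false` replaces the exception. It uses a 20-bit mask (0xfffff), since the case labels 0x21b00 and 0xa1b00 do not fit in 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name. Returns false for unhandled srcb layouts. */
static bool fdouble_pack_srca_srcb(uint64_t srca, uint64_t srcb, double *out)
{
    /* (u)int32_to_float64 take a 32-bit argument, so srca >> 4 is
     * truncated to 32 bits, as it would be in the demo. */
    uint32_t v = (uint32_t)(srca >> 4);

    switch (srcb & 0xfffff) {   /* 20-bit mask: the labels need 20 bits */
    case 0x21b00:               /* treat it as uint32_t */
        *out = (double)v;
        return true;
    case 0xa1b00:               /* treat it as a negative int32_t */
        *out = (double)((int64_t)(v | 0x80000000u) - 4294967296LL);
        return true;
    default:
        return false;
    }
}
```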

>   fdouble_pack2:      ; skipped.
> 
> 
>   (fsingle_add1/sub1, fdouble_add/sub_flags can be used individually,
>    e.g gcc testsuit for complex number).
> 
> 
> Next, I shall implement the floating point insns, welcome any related
> ideas, suggestions, and completions.
> 
> Thanks.
> 

-- 
Chen Gang

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-09  1:10             ` Chen Gang
@ 2015-08-09  1:14               ` Chen Gang
  2015-08-11 13:18                 ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-09  1:14 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel

On 8/9/15 09:10, Chen Gang wrote:
> 
> On 8/9/15 01:23, Chen Gang wrote:
>> Hello all:
>>
>> Below is my current idea for all floating point insns. For me, it is not
>> the precise implementation, even not completely implement -- assume pack
>> insns can only for packing (u)int32_t when they are used individually:
>>
>>   fsingle_add1        ; return calc flags, save calc result to env.
>>
>>   fsingle_sub1        ; return calc flags, save calc result to env.
>>
>>   fsingle_addsub2     ; set "has result" flag.
>>
>>   fsingle_mul1        ; skip return value, save calc result to env.
>>                         set "has result" flag.
>>
>>   fsingle_mul2        ; skipped.
>>
>>
>>   fsingle_pack1       ; skipped.
>>
>>   fsingle_pack2       ; if "has result"
>>                             reset "has result" flag.
>>                             return calc result from env.
>>                         else
>>                             pack srca 
>>                             reference from tilegx.md: float(uns)sisf2.
>>                             get (u)int32_t a, then (u)int32_to_float32.
> 
> For "pack srca and srcb", the related demo like below (srca and srcb
> are uint64_t):
> 

Oh, sorry: this demo is for "pack srca" (not "pack srca and srcb").

>     switch (srca & 0x3ff) {
> 
>     /* treat it as uint32_t */
>     case 0x9e:
>         return uint32_to_float32(srca >> 32, &FP_STATUS);
> 
>     /* treat it as int32_t, must be negative number */
>     case 0x29e:
>         return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);
> 
>     default:
>         unimplemented (gen_exception).
>     }
> 
>>
>>   fdouble_unpack_max: ; skipped.
>>
>>   fdouble_unpack_min: ; skipped.
>>
>>   fdouble_add_flags:  ; return calc flags, save calc result to env.
>>
>>   fdouble_sub_flags:  ; return calc flags, save calc result to env.
>>
>>   fdouble_addsub:     ; set "has result" flag.
>>
>>   fdouble_mul_flags:  ; skip return flags, save calc result to env.
>>                         set "has result" flag.
>>
>>   fdouble_pack1:      ; if "has result" 
>>                             reset "has result" flag.
>>                             return calc result from env.
>>                         else
>>                             pack srca and srcb.
>>                             reference from tilegx.md: float(uns)sidf2.
>>                             get (u)int32_t a, then (u)int32_to_float64.
>>
>  
> For "pack srca and srcb", the related demo like below (srca and srcb
> are uint64_t):
> 
>     switch (srcb & 0xffff) {
> 

Oh, sorry: this should use 0xfffff instead of 0xffff.

>     /* treat it as uint32_t */
>     case 0x21b00:
>         return uint32_to_float64(srca >> 4, &FP_STATUS);
> 
>     /* treat it as int32_t, must be negative number */
>     case 0xa1b00:
>         return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);
> 
>     default:
>         unimplemented (gen_exception).
>     }
> 
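The mask correction is easy to confirm with a little arithmetic: 0x21b00 & 0xffff = 0x1b00 and 0xa1b00 & 0xffff = 0x1b00, so with a 16-bit mask neither case label can ever match (and the two would be indistinguishable anyway). The small helper below (a hypothetical name, for illustration only) states the general rule: a case label in `switch (x & mask)` is reachable only if the label has no bits outside the mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if some input x can make (x & mask) equal to label. */
static bool case_label_reachable(uint32_t mask, uint32_t label)
{
    return (label & ~mask) == 0;
}
```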
>>   fdouble_pack2:      ; skipped.
>>
>>
>>   (fsingle_add1/sub1, fdouble_add/sub_flags can be used individually,
>>    e.g gcc testsuit for complex number).
>>
>>
>> Next, I shall implement the floating point insns, welcome any related
>> ideas, suggestions, and completions.
>>
>> Thanks.
>>
>>
>> On 8/5/15 22:16, Chen Gang wrote:
>>> On 8/4/15 23:04, Richard Henderson wrote:
>>>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>>>
>>>>> On 8/4/15 04:47, Chen Gang wrote:
>>>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>>>> but for me, I can not find any details about them (the ISA
>>>>>>>> documents only give a summary description, but not details), e.g.
>>>>>>>
>>>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>>>> black-box instructions.  You need only really implement one of the
>>>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>>>
>>>>>>> Looking at what gcc produces gives the hints:
>>>>>>>
>>>>>>> fdouble_unpack_min	min, srca, srcb fdouble_unpack_max	max, srca,
>>>>>>> srcb fdouble_add_flags	flg, srca, srcb fdouble_addsub		max, min, flg 
>>>>>>> fdouble_pack1		dst, max, flg fdouble_pack2		dst, max, zero
>>>>>>>
>>>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>>>> from "flg" to "dst".
>>>>>>>
>>>>>>> Similarly for the single-precision:
>>>>>>>
>>>>>>> fsingle_add1         tmp, srca, srcb
>>>>>>> fsingle_addsub2      tmp, srca, srcb
>>>>>>> fsingle_pack1        flg, tmp
>>>>>>> fsingle_pack2        dst, tmp, flg
>>>>>>>
>>>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>>>
>>>>>
>>>>> After checking tilegx.md completely, I think we still need to
>>>>> implement each of them precisely, or we cannot emulate all cases
>>>>> (e.g. muldf3).
>>>>
>>>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>>>> Again, the fdouble_pack1 copies from the flag input to the output.
>>>>
>>>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>>>> should be able to delete all of that as unused.  Especially if you have the
>>>> fdouble_unpack* insns store zero into their destinations.
>>>>
>>>
>>> I am not quite sure, but I guess what you said should be OK (at
>>> least, it is very useful for the implementation).
>>>
>>>
>>>> Don't get me wrong -- more accurate implementation of the actual
>>>> insns would be nice, especially for debugging.  But if the insns
>>>> aren't accurately documented I don't see what choice we have.
>>>>
>>>
>>> I guess we can still try to implement the details:
>>>
>>>  - The document summarizes all the floating point instructions, so we
>>>    can reason about, or guess, their complete implementation.
>>>
>>>  - gcc uses all of them, so it is a good sample and reference (but we
>>>    should not assume gcc is always correct, since we are using qemu
>>>    to run the gcc testsuite).
>>>
>>>  - The tilegx floating point format should be standard (or at least
>>>    based on the standard format), so we can look up the related
>>>    information with google/baidu.
>>>
>>>
>>>> On the good side, implementing the entire operation as part of the "flags" step
>>>> probably results in faster emulation.
>>>>
>>>
>>> I guess so, too.
>>>
>>>
>>> I shall try to finish the simple implementation first, then implement
>>> the floating point instructions in detail later (that should be lower
>>> priority).
>>>
>>>
>>> Thanks.
>>>
>>
> 

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-09  1:14               ` Chen Gang
@ 2015-08-11 13:18                 ` Chen Gang
  2015-08-13 14:59                   ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-11 13:18 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel


Oh, it is a little more complex: in one testsuite case, gcc interleaves
a double add and a double mul! We need to save more information to
compute the correct result in pack1.

The case is 20020314-1.exe; the related code is below (I guess it is
correct):

        ...

        fdouble_unpack_max      r10, r3, zero
.LVL2:
        fdouble_unpack_max      r15, r2, zero
        fdouble_add_flags       r12, r0, r1
        mul_hu_lu       r13, r15, r10
        mul_lu_lu       r16, r15, r10
        mula_hu_lu      r13, r10, r15
        fdouble_unpack_min      r11, r0, r1
        {
        shli    r14, r13, 32
        fdouble_unpack_max      r17, r0, r1
        }
        {
        mul_hu_hu       r15, r15, r10
        add     r16, r16, r14
        }
        {
        shrui   r13, r13, 32
        fdouble_addsub  r17, r11, r12
        }
        {
        cmpltu  r14, r16, r14
        fdouble_mul_flags       r3, r2, r3
        }
.LVL3:
        {
        add     r13, r15, r13
        fdouble_pack1   r12, r17, r12
        }
        {
        add     r13, r13, r14
        fdouble_unpack_max      r10, r0, zero
        }
        fdouble_pack1   r3, r13, r3
        fdouble_pack2   r12, r17, zero
        fdouble_pack2   r3, r13, r16

        ... 

Any additional ideas, suggestions, and completions are welcome.

Thanks.

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-11 13:18                 ` Chen Gang
@ 2015-08-13 14:59                   ` Chen Gang
  2015-08-15  9:56                     ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-13 14:59 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel

Hello all:

I guess the single-precision insns are simple: their calculation groups
cannot be interleaved with each other, so the current implementation
should be OK.

For the double-precision insns, I guess only the mul group can be
interleaved with other calculation groups (add/sub or int-to-float/double
conversion), because of optimization: the mul group has many insns.

So the implementation is as follows:

/*
 * Assume the floating point mul group can be interleaved with other groups.
 *
 * fdouble_unpack_max: ; skipped.
 *
 * fdouble_unpack_min: ; skipped.
 *
 * fdouble_add_flags:  ; move calc flags to dest.
 *                       save calc flags.
 *                       save calc addsub result.
 *
 * fdouble_sub_flags:  ; move calc flags to dest.
 *                       save calc flags.
 *                       save calc addsub result.
 *
 * fdouble_addsub:     ; move saved addsub result to dest.
 *                       set "addsub result" flag.
 *
 * fdouble_mul_flags:  ; move calc mul result to dest.
 *
 * fdouble_pack1:      ; if "addsub result" flag is set
 *                         && srca == saved addsub result
 *                         && srcb == saved calc flags
 *                           move srca to dest.
 *                       else
 *                           move srcb to dest.
 *
 * fdouble_pack2:      ; if srcb == r63 && "addsub result" flag is set
 *                           reset "addsub result" flag.
 *                       else if srcb == r63
 *                           pack srca with dest (dest holds pack1's srcb)
 *                           reference from tilegx.md: float(uns)sidf2.
 *                           get (u)int32_t a, then (u)int32_to_float64.
 *                       else
 *                           skipped.
 */

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-13 14:59                   ` Chen Gang
@ 2015-08-15  9:56                     ` Chen Gang
  2015-08-15 15:47                       ` Richard Henderson
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-15  9:56 UTC (permalink / raw)
  To: Richard Henderson, Chris Metcalf, Peter Maydell,
	Andreas Färber, walt
  Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 11272 bytes --]

On 8/13/15 22:59, Chen Gang wrote:
> Hello all:
> 
> I guess the single-precision insns are simple: their calculation groups
> cannot be interleaved with each other, so the current implementation
> should be OK.
> 
> For the double-precision insns, I guess only the mul group can be
> interleaved with other calculation groups (add/sub or int-to-float/double
> conversion), because of optimization: the mul group has many insns.
> 

Unfortunately, after continuing with the gcc testsuite, I found that the
add/sub floating point insns can also be interleaved! The related C
code, -save-temps output, and objdump files are attached (is it a gcc
issue? I guess not).

So I guess we have to 'crack' all the floating point insns precisely,
or we cannot pass the gcc testsuite.

For now, I shall try to fix the other issues found by the gcc testsuite,
and 'crack' the floating point insns last. I guess I cannot finish it
this month (I shall try to finish it next month).


Thanks.

> So the implementation is as follows:
> 
> /*
>  * Assume the floating point mul group can be interleaved with other groups.
>  *
>  * fdouble_unpack_max: ; skipped.
>  *
>  * fdouble_unpack_min: ; skipped.
>  *
>  * fdouble_add_flags:  ; move calc flags to dest.
>  *                       save calc flags.
>  *                       save calc addsub result.
>  *
>  * fdouble_sub_flags:  ; move calc flags to dest.
>  *                       save calc flags.
>  *                       save calc addsub result.
>  *
>  * fdouble_addsub:     ; move saved addsub result to dest.
>  *                       set "addsub result" flag.
>  *
>  * fdouble_mul_flags:  ; move calc mul result to dest.
>  *
>  * fdouble_pack1:      ; if "addsub result" flag is set
>  *                         && srca == saved addsub result
>  *                         && srcb == saved calc flags
>  *                           move srca to dest.
>  *                       else
>  *                           move srcb to dest.
>  *
>  * fdouble_pack2:      ; if srcb == r63 && "addsub result" flag is set
>  *                           reset "addsub result" flag.
>  *                       else if srcb == r63
>  *                           pack srca with dest (dest holds pack1's srcb)
>  *                           reference from tilegx.md: float(uns)sidf2.
>  *                           get (u)int32_t a, then (u)int32_to_float64.
>  *                       else
>  *                           skipped.
>  */

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

[-- Attachment #2: floating-point-double-add.tar.gz --]
[-- Type: application/x-gzip, Size: 56430 bytes --]

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-15  9:56                     ` Chen Gang
@ 2015-08-15 15:47                       ` Richard Henderson
  2015-08-15 18:16                         ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2015-08-15 15:47 UTC
  To: Chen Gang
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On Aug 15, 2015 2:56 AM, Chen Gang <xili_gchen_5257@hotmail.com>
> Oh, we are unlucky, after continue gcc testsuite, add/sub floating point 
> insns also can be mixed together! The related C code, -save-temps, and 
> objdump files are in attachments (is it gcc's issue? I guess not). 
>
> So, I guess, we have to 'crack' all floating point insns, precisely, or 
> we can not pass gcc testsuite. 
>

If you go back to my first message to you on the subject, you'll find that my suggestion was to not split the operation at all, using move for pack1.  Which would nicely handle any such interleaving.
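A minimal C sketch of the "don't split the operation" scheme from the first reply quoted above: fsingle_add1 computes the whole sum into its destination, fsingle_addsub2 and fsingle_pack1 emit nothing, and fsingle_pack2 is a plain move. Because each intermediate lives only in an architectural register, interleaved sequences cannot disturb each other. SketchRegs and the sketch_* helpers are hypothetical illustrations, not QEMU's CPU state or translator, and this scheme alone does not cover the float(uns)sisf2 case raised later in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-register file for the sketch (not QEMU's CPU state). */
typedef struct {
    uint64_t r[64];
} SketchRegs;

/* Store a float32 bit pattern zero-extended into a 64-bit register. */
static uint64_t sketch_f32_to_reg(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* Reinterpret the low 32 bits of a register as a float32. */
static float sketch_reg_to_f32(uint64_t v)
{
    uint32_t bits = (uint32_t)v;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* fsingle_add1: perform the entire addition here; result goes to dst. */
static void sketch_fsingle_add1(SketchRegs *s, int dst, int a, int b)
{
    s->r[dst] = sketch_f32_to_reg(sketch_reg_to_f32(s->r[a]) +
                                  sketch_reg_to_f32(s->r[b]));
}

/* fsingle_addsub2 and fsingle_pack1 would generate no code at all. */

/* fsingle_pack2: move the finished result from tmp to dst. */
static void sketch_fsingle_pack2(SketchRegs *s, int dst, int tmp)
{
    s->r[dst] = s->r[tmp];
}
```

Two additions issued back to back, with both pack2 moves afterwards, still yield independent, correct results.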

r~

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-15 15:47                       ` Richard Henderson
@ 2015-08-15 18:16                         ` Chen Gang
  2015-08-16  1:41                           ` Chen Gang
  2015-08-17 17:31                           ` Richard Henderson
  0 siblings, 2 replies; 27+ messages in thread
From: Chen Gang @ 2015-08-15 18:16 UTC
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel


On 8/15/15 23:47, Richard Henderson wrote:
> On Aug 15, 2015 2:56 AM, Chen Gang <xili_gchen_5257@hotmail.com>
>> Oh, we are unlucky, after continue gcc testsuite, add/sub floating point 
>> insns also can be mixed together! The related C code, -save-temps, and 
>> objdump files are in attachments (is it gcc's issue? I guess not). 
>>
>> So, I guess, we have to 'crack' all floating point insns, precisely, or 
>> we can not pass gcc testsuite. 
>>
> 
> If you go back to my first message to you on the subject, you'll find that my suggestion was to not split the operation at all, using move for pack1.  Which would nicely handle any such interleaving.
> 

OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, a simple move
is not enough.  :-(

But what you said is really valuable to me!! We can treat the flag as
caller-saved context, so that the caller can use the callee freely (in
fact, I guess the real hardware treats it as caller context, too).

 - We have to define the flag format based on the existing format in the
   related docs and tilegx.md (bits 0-20 and 25-31 are reserved).

 - We can use only bits 21-24 to mark an addsub, mul, or typecast
   result. If bits 21-24 are all zero, it is a typecast result. For
   fsingle, bits 32-63 hold the input integer; for fdouble, srca holds
   the input integer.

 - For an addsub or mul result, we use bits 32-63 as an index to a
   resource handle (like the 'fd' returned by open). fsingle_addsub2,
   fsingle_mul1, fdouble_mul_flags, and fdouble_addsub allocate the
   resource, and pack1 frees it.

But if the caller "makes mistakes", our implementation cannot avoid
leaking the related resources (the real hardware can, since it makes the
caller save all related state and pass it back in when needed).
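The bit layout proposed in the bullets above can be sketched with explicit shifts and masks: a type code in bits 21-24 (zero meaning "typecast") and a 32-bit payload in bits 32-63, leaving bits 0-20 and 25-31 untouched. The sketch_* names and the nonzero type codes are hypothetical; the payload would be the input integer for a typecast or, in this proposal, a resource index for addsub/mul.

```c
#include <stdint.h>

/* Hypothetical field positions taken from the proposal above. */
#define SKETCH_TYPE_SHIFT 21
#define SKETCH_TYPE_MASK  UINT64_C(0xf)   /* 4 bits: 21-24 */
#define SKETCH_DATA_SHIFT 32              /* payload: bits 32-63 */

/* Type 0 must mean "typecast" so an all-zero field is recognized. */
enum { SKETCH_TYPE_TYPECAST = 0, SKETCH_TYPE_ADDSUB = 1, SKETCH_TYPE_MUL = 2 };

/* Replace the type and payload fields, keeping bits 0-20 and 25-31. */
static uint64_t sketch_set_ctx(uint64_t flags, unsigned type, uint32_t data)
{
    flags &= ~(SKETCH_TYPE_MASK << SKETCH_TYPE_SHIFT);  /* clear 21-24 */
    flags &= UINT64_C(0xffffffff);                       /* clear 32-63 */
    flags |= ((uint64_t)type & SKETCH_TYPE_MASK) << SKETCH_TYPE_SHIFT;
    flags |= (uint64_t)data << SKETCH_DATA_SHIFT;
    return flags;
}

static unsigned sketch_ctx_type(uint64_t flags)
{
    return (unsigned)((flags >> SKETCH_TYPE_SHIFT) & SKETCH_TYPE_MASK);
}

static uint32_t sketch_ctx_data(uint64_t flags)
{
    return (uint32_t)(flags >> SKETCH_DATA_SHIFT);
}
```

Using shifts rather than C bit-fields keeps the layout independent of the compiler's bit-field allocation order.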


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-15 18:16                         ` Chen Gang
@ 2015-08-16  1:41                           ` Chen Gang
  2015-08-16  3:59                             ` Chen Gang
  2015-08-17 17:31                           ` Richard Henderson
  1 sibling, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-16  1:41 UTC
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 8/16/15 02:16, Chen Gang wrote:
> 
> On 8/15/15 23:47, Richard Henderson wrote:
>> On Aug 15, 2015 2:56 AM, Chen Gang <xili_gchen_5257@hotmail.com>
>>> Oh, we are unlucky, after continue gcc testsuite, add/sub floating point 
>>> insns also can be mixed together! The related C code, -save-temps, and 
>>> objdump files are in attachments (is it gcc's issue? I guess not). 
>>>
>>> So, I guess, we have to 'crack' all floating point insns, precisely, or 
>>> we can not pass gcc testsuite. 
>>>
>>
>> If you go back to my first message to you on the subject, you'll find that my suggestion was to not split the operation at all, using move for pack1.  Which would nicely handle any such interleaving.
>>
> 
> OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, we can not only
> simply move.  :-(
> 
> But what you said is really quite valuable to me!! we can treat the flag
> as a caller saved context, then can let the caller can use callee freely
> (in fact, I guess, the real hardware treats it as caller context, too).
> 
>  - we have to define the flag format based on the existing format in the
>    related docs and tilegx.md (reserve 0-20 and 25-31 bits).
> 
>  - We can only use 21-24 for mark addsub, mul, or typecast result. If
>    21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>    bits is the input integer; for fdouble: srca is the input integer.
> 
>  - For addsub and mul result, we use 32-63 bits for an index of resource
>    handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>    fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.
> 
> But if caller "make mistakes", our implementation can not avoid related
> resource leak (but the real hardware can, it also lets caller save all
> related resources; when it needs them, it can let caller pass them to).
> 

If we assume that optimization of the floating point insns cannot cross
basic block boundaries (I guess so), we can reset all related resources
at the start of each basic block.


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-16  1:41                           ` Chen Gang
@ 2015-08-16  3:59                             ` Chen Gang
  0 siblings, 0 replies; 27+ messages in thread
From: Chen Gang @ 2015-08-16  3:59 UTC
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 8/16/15 09:41, Chen Gang wrote:
> On 8/16/15 02:16, Chen Gang wrote:
>>
>> On 8/15/15 23:47, Richard Henderson wrote:
>>> On Aug 15, 2015 2:56 AM, Chen Gang <xili_gchen_5257@hotmail.com>
>>>> Oh, we are unlucky, after continue gcc testsuite, add/sub floating point 
>>>> insns also can be mixed together! The related C code, -save-temps, and 
>>>> objdump files are in attachments (is it gcc's issue? I guess not). 
>>>>
>>>> So, I guess, we have to 'crack' all floating point insns, precisely, or 
>>>> we can not pass gcc testsuite. 
>>>>
>>>
>>> If you go back to my first message to you on the subject, you'll find that my suggestion was to not split the operation at all, using move for pack1.  Which would nicely handle any such interleaving.
>>>
>>
>> OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, we can not only
>> simply move.  :-(
>>
>> But what you said is really quite valuable to me!! we can treat the flag
>> as a caller saved context, then can let the caller can use callee freely
>> (in fact, I guess, the real hardware treats it as caller context, too).
>>
>>  - we have to define the flag format based on the existing format in the
>>    related docs and tilegx.md (reserve 0-20 and 25-31 bits).
>>
>>  - We can only use 21-24 for mark addsub, mul, or typecast result. If
>>    21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>>    bits is the input integer; for fdouble: srca is the input integer.
>>
>>  - For addsub and mul result, we use 32-63 bits for an index of resource
>>    handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>>    fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.
>>
>> But if caller "make mistakes", our implementation can not avoid related
>> resource leak (but the real hardware can, it also lets caller save all
>> related resources; when it needs them, it can let caller pass them to).
>>
> 
> If we assume that the optimization for the floating point insns can not
> cross the basic blocks (I guess so), we can reset all related resources
> when start a basic block.
> 

Oh, sorry: even qemu itself may split a basic block into two when the
block is too big. And we also have to assume that fdouble_pack1 may be
called on the same value multiple times.

So for resource management, we can do this:

 - For fsingle, the result can be saved in bits 32-63 of the caller
   context (it is a float32, which is 32 bits wide).

 - For fdouble, we can allocate a 'big' buffer for it (e.g. 8KB); when
   the count of saved values exceeds 1K, wrap the index around to 0.
   Of course, the value saved 1K entries earlier should already be
   useless by then.

I guess that in this way we can emulate the tilegx floating point
insns!! :-)
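A minimal sketch of the wrap-around fdouble buffer described in the list above, assuming 1K entries of 8 bytes (the 8KB example) and using plain double in place of QEMU's float64. SketchRing and its functions are hypothetical. It makes the weakness explicit: once 1K newer results have been saved, an old index silently refers to a different value, so correctness rests entirely on the assumption that old results are no longer needed.

```c
#include <stdint.h>

#define SKETCH_RING_SIZE 1024   /* 1K entries * 8 bytes = 8KB */

/* Hypothetical result store kept outside the instruction inputs. */
typedef struct {
    double val[SKETCH_RING_SIZE];
    uint32_t next;              /* next slot to (over)write */
} SketchRing;

/* Save a result and return the index to embed in the flag word. */
static uint32_t sketch_ring_put(SketchRing *ring, double v)
{
    uint32_t idx = ring->next;
    ring->val[idx] = v;
    ring->next = (idx + 1) % SKETCH_RING_SIZE;  /* wrap around to 0 */
    return idx;
}

/* Look a result up again; stale indices are not detected. */
static double sketch_ring_get(const SketchRing *ring, uint32_t idx)
{
    return ring->val[idx % SKETCH_RING_SIZE];
}
```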


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-15 18:16                         ` Chen Gang
  2015-08-16  1:41                           ` Chen Gang
@ 2015-08-17 17:31                           ` Richard Henderson
  2015-08-17 21:09                             ` Chen Gang
  2015-10-25 15:38                             ` Chen Gang
  1 sibling, 2 replies; 27+ messages in thread
From: Richard Henderson @ 2015-08-17 17:31 UTC
  To: Chen Gang
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 08/15/2015 11:16 AM, Chen Gang wrote:
> OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, we can not only
> simply move.  :-(

Oh yes, I see that now.  Unfortunate.

> But what you said is really quite valuable to me!! we can treat the flag
> as a caller saved context, then can let the caller can use callee freely
> (in fact, I guess, the real hardware treats it as caller context, too).
> 
>  - we have to define the flag format based on the existing format in the
>    related docs and tilegx.md (reserve 0-20 and 25-31 bits).
> 
>  - We can only use 21-24 for mark addsub, mul, or typecast result. If
>    21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>    bits is the input integer; for fdouble: srca is the input integer.

Plausible.

> 
>  - For addsub and mul result, we use 32-63 bits for an index of resource
>    handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>    fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.

No, that's a bad idea.  No state external to the inputs to the insns.

It really would be nice if we had the same documentation that was used
to implement the gcc backend.  Otherwise we have to rely on guesswork.

For single-precision it appears that the format is

  63                                      31          24   10  9     0
  [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]

We are able to deduce the bias for the exponent based on the input gcc gives us
for floatunssisf: 0x9e == 2**31 when the mantissa is normalized.
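As a worked check of that deduction, assume the 32-bit mantissa field m (bits 63:32) has its implicit bit at bit 31 when normalized and that the exponent e uses the IEEE single-precision bias of 127. Both are inferences from gcc's input, not documented facts. Then 0x9e = 158 = 127 + 31, and gcc's floatunssisf input (m = u, e = 0x9e) decodes back to the integer itself:

```latex
v = \frac{m}{2^{31}} \cdot 2^{\,e - 127},
\qquad
e = \mathtt{0x9e} = 158 = 127 + 31
\;\Rightarrow\;
v = \frac{u}{2^{31}} \cdot 2^{31} = u .
```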

So:

  fsingle_add1, fsingle_sub1: Perform the operation.  Split the result
  such that all of the fields above are filled in.

  fsingle_mul1: Perform the operation.  Split the result such that all
  of the fields above except for cmp-flags are filled in.

  fsingle_addsub2: Nop.
  fsingle_mul2: Move srca to dest.

  fsingle_pack1: Normalize and repack the above.  In the add/sub/mul case,
  no normalization will be required, so no change to the result occurs.

  In the floatunssisf2 case, the input implicit bit may not be set, and
  guard bits may be set, so real rounding and normalization must occur,
  adjusting the exponent constructed by gcc in building the flags.
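A minimal C sketch of the "normalize and repack" step described for fsingle_pack1, under the same guessed layout (value = m / 2^31 * 2^(exp - 127)). It takes the already extracted sign, exponent, and mantissa rather than the packed flag word, because the exact bit positions are still guesswork. It handles only finite, normal results and rounds to nearest-even. sketch_fsingle_pack is a hypothetical name, not QEMU code; for example, mantissa 3 with exponent 0x9e produces 0x40400000, the bit pattern of 3.0f.

```c
#include <stdint.h>

/*
 * Hypothetical helper.  Assumes value = mant / 2^31 * 2^(exp - 127),
 * with the implicit bit at bit 31 of mant once normalized.  No denormals,
 * overflow to infinity, or NaN handling.
 */
static uint32_t sketch_fsingle_pack(uint32_t sign, int exp, uint32_t mant)
{
    if (mant == 0) {
        return sign << 31;                 /* signed zero */
    }
    /* Normalize: shift left until the implicit bit reaches bit 31. */
    while ((mant & 0x80000000u) == 0) {
        mant <<= 1;
        exp--;
    }
    /* Keep the top 24 bits; the low 8 bits act as guard/round/sticky. */
    uint32_t sig = mant >> 8;
    uint32_t rest = mant & 0xffu;
    /* Round to nearest, ties to even. */
    if (rest > 0x80u || (rest == 0x80u && (sig & 1u))) {
        sig++;
        if (sig == 0x1000000u) {           /* carry out of 24 bits */
            sig >>= 1;
            exp++;
        }
    }
    /* Drop the implicit bit and assemble sign | exponent | fraction. */
    return (sign << 31) | ((uint32_t)exp << 23) | (sig & 0x7fffffu);
}
```

This is the floatunssisf2 path: an unnormalized integer mantissa plus gcc's fixed 0x9e exponent, with real normalization and rounding applied.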

For double precision things are more complicated, precisely because there are
no dedicated fdouble_mul[1-4] instructions; instead gcc uses a normal
128-bit integer multiplication on the mantissa.

For double-precision it appears that the format is

         63               57                           4            0
  unpack [ overflow bits? | mantissa with implicit bit | guard bits ]

         63   31          24   20  19    8    0
  flags  [ ?? | cmp flags | ?? | s | exp | ?? ]

Similarly we can compute the bias for exp as 0x21b == 2**53.
Or is it 20 bits of exponent and 0x21b00 == 2**53?

So:

  fdouble_unpack_max, fdouble_unpack_min: Perform the operation as described,
  extracting the mantissa of the min/max absolute value.

  fdouble_add_flags, fdouble_sub_flags: Extract the signs and exponent of the
  sources, and compute the sign and exponent of the result.  Set a bit,
  presumably one of [24:21] that tell fdouble_addsub whether to perform
  addition or subtraction.  Set the comparison flags.

  fdouble_mul_flags: Extract the signs and exponent of the sources, and compute
  the sign and exponent of the result.  Note that the result of the 128-bit
  multiplication is guaranteed to be non-normalized: the two 57-bit inputs
  produce a 114-bit intermediate result, which means that bits [63:51] are
  guaranteed to be zero on entry to the pack stages, which in turn means that
  some bias will need to be applied to the intermediate exponent.

  fdouble_addsub: Add or subtract the mantissas based on a bit in flags.

  fdouble_pack1: Move flags (srcb) to result (dest).
  fdouble_pack2: Take the 128-bit mantissa of srca+srcb, the flags of dest,
  and normalize and pack the result.
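A quick check of the width argument for fdouble_mul_flags, modelling gcc's 128-bit mantissa multiply with unsigned __int128 (a GCC/Clang extension, used here only for illustration; sketch_mul_hi is a hypothetical name). Two 57-bit operands give a product below 2^114, so the high 64-bit word is below 2^50: bits [63:50] of that word are always zero, which includes the [63:51] range mentioned above. For normalized inputs (bit 56 set in both), the high word is at least 2^48.

```c
#include <stdint.h>

/* GCC/Clang extension type standing in for the 128-bit product. */
typedef unsigned __int128 sketch_u128;

/* High 64 bits of the full 128-bit product a * b. */
static uint64_t sketch_mul_hi(uint64_t a, uint64_t b)
{
    sketch_u128 p = (sketch_u128)a * b;
    return (uint64_t)(p >> 64);
}
```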


r~

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-17 17:31                           ` Richard Henderson
@ 2015-08-17 21:09                             ` Chen Gang
  2015-08-17 21:43                               ` Richard Henderson
  2015-10-25 15:38                             ` Chen Gang
  1 sibling, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-17 21:09 UTC
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 8/18/15 01:31, Richard Henderson wrote:
> On 08/15/2015 11:16 AM, Chen Gang wrote:
> 
>> But what you said is really quite valuable to me!! we can treat the flag
>> as a caller saved context, then can let the caller can use callee freely
>> (in fact, I guess, the real hardware treats it as caller context, too).
>>
>>  - we have to define the flag format based on the existing format in the
>>    related docs and tilegx.md (reserve 0-20 and 25-31 bits).
>>
>>  - We can only use 21-24 for mark addsub, mul, or typecast result. If
>>    21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>>    bits is the input integer; for fdouble: srca is the input integer.
> 
> Plausible.
> 
>>
>>  - For addsub and mul result, we use 32-63 bits for an index of resource
>>    handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>>    fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.
> 
> No, that's a bad idea.  No state external to the inputs to the insns.
> 

We can use bits 21-24 for the state that would otherwise be external to
the insns' inputs. My idea is below:

/*
 * Single precision floating point instructions' description.
 *
 *  - fsingle_add1, fsingle_sub1, and fsingle_pack1/2 can be used individually.
 *
 *  - when fsingle_pack1/2 is used individually, it is for type cast.
 *
 *  - the result saved 4K entries earlier is already useless to the caller.
 *
 * fsingle_add1        ; make context and calc result from rsrca and rsrcb.
 *                     ; save result in wrap-around array, add index to context.
 *                     ; move context to rdst.
 *
 * fsingle_sub1        ; make context and calc result from rsrca and rsrcb.
 *                     ; save result in wrap-around array, add index to context.
 *                     ; move context to rdst.
 *
 * fsingle_addsub2     ; skipped.
 *
 * fsingle_mul1        ; make context and calc result from rsrca and rsrcb.
 *                     ; save result in wrap-around array, add index to context.
 *                     ; move context to rdst.
 *
 * fsingle_mul2        ; move rsrca to rdst.
 *
 * fsingle_pack1       ; skipped.
 *
 * fsingle_pack2       ; get context from rsrca (rsrca is context).
 *                     ; if context for add/sub/mul
 *                     ;     get result from wrap-around array based on index.
 *                     ;     move result to rdst.
 *                     ; else
 *                     ;     get (u)int32_t integer from context,
 *                     ;     (u)int32_to_float32.
 */

/*
 * Double precision floating point instructions' description.
 *
 *  - fdouble_add_flags, fdouble_sub_flags, and fdouble_pack1/2 can be used
 *    individually.
 *
 *  - when fdouble_pack1/2 is used individually, it is for type cast.
 *
 *  - the result saved 4K entries earlier is already useless to the caller.
 *
 * fdouble_unpack_max: ; skipped.
 *
 * fdouble_unpack_min: ; skipped.
 *
 * fdouble_add_flags:  ; make context and calc result from rsrca and rsrcb.
 *                     ; save result in wrap-around array, add index to context.
 *                     ; move context to rdst.
 *
 * fdouble_sub_flags:  ; make context and calc result from rsrca and rsrcb.
 *                     ; save result in wrap-around array, add index to context.
 *                     ; move context to rdst.
 *
 * fdouble_addsub:     ; skipped.
 *
 * fdouble_mul_flags:  ; make context and calc result from rsrca and rsrcb.
 *                     ; save result in wrap-around array, add index to context.
 *                     ; move context to rdst.
 *
 * fdouble_pack1:      ; get context from rsrcb.
 *                     ; if context for add/sub/mul
 *                     ;     get result from wrap-around array based on index.
 *                     ;     move result to rdst.
 *                     ; else
 *                     ;     get (u)int32_t integer from rsrca
 *                     ;     (u)int32_to_float64.
 *
 * fdouble_pack2:      ; skipped.
 */

#define TILEGX_F_COUNT 0x1000  /* Maximum number of saved results (each array) */

#define TILEGX_F_DUINT 0x21b00 /* exp is for uint32_t to double */
#define TILEGX_F_DINT  0xa1b00 /* exp is for int32_t to double */
#define TILEGX_F_SUINT 0x9e    /* exp is for uint32_t to single */
#define TILEGX_F_SINT  0x29e   /* exp is for int32_t to single */

#define TILEGX_F_TCAST 0       /* Result type is for typecast, MUST BE 0 */
#define TILEGX_F_TCALC 1       /* Result type is for add/sub/mul */

#pragma pack(push, 1)
typedef struct TileGXFPCtx {

    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t exp : 20;         /* Exponent, for TILEGX_F_(D/S)(U)INT */

    /* Context type, defined and used by callee */
    uint64_t type : 5;         /* For TILEGX_F_T(CAST/CALC) */

    /* Come from TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;    /* The two are unordered */
    uint64_t lt : 1;           /* 1st is less than 2nd */
    uint64_t le : 1;           /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;           /* 1st is greater than 2nd */
    uint64_t ge : 1;           /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;           /* The two operands are equal */
    uint64_t neq : 1;          /* The two operands are not equal */

    /* Result data according to the context type */
    uint64_t data : 32;        /* The explanation is below */
#if 0
    /* This is the explanation for 'data' above */
    union {
        uint32_t idx;          /* Index for the add/sub/mul result */
        uint32_t aint;         /* Absolute input integer for fsingle typecast */
        /*
         * There is no input integer for fdouble typecast in context, it is in
         * rsrca parameter of fdouble_pack1 instruction.
         */
    };
#endif
} TileGXFPCtx;
#pragma pack(pop)

typedef struct FPUTLGState {
    float_status fp_status;         /* floating point status */
    int pos32;                      /* Current position for fsingle result */
    int pos64;                      /* Current position for fdouble result */
    float32 val32s[TILEGX_F_COUNT]; /* wrap-around result array, fsingle */
    float64 val64s[TILEGX_F_COUNT]; /* wrap-around result array, fdouble */
} FPUTLGState;
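The comparison-flag part of TileGXFPCtx (the Table 7-2 bits that fsingle_add1 is documented to set in its destination) could be computed roughly as below. This sketch assumes the compiler allocates the bit-fields above from the least significant bit, as GCC does on little-endian targets, which puts unordered at bit 25 through neq at bit 31. It also assumes neq is true for unordered operands, matching C's != on NaN. The bit positions and the sketch_cmp_flags name are hypothetical, not taken from the ISA document.

```c
#include <math.h>
#include <stdint.h>

/* Assumed positions, following the TileGXFPCtx bit-field order above. */
enum {
    SKETCH_CMP_UNORDERED = 25,
    SKETCH_CMP_LT = 26,
    SKETCH_CMP_LE = 27,
    SKETCH_CMP_GT = 28,
    SKETCH_CMP_GE = 29,
    SKETCH_CMP_EQ = 30,
    SKETCH_CMP_NEQ = 31,
};

static uint64_t sketch_cmp_flags(double a, double b)
{
    uint64_t f = 0;

    if (isnan(a) || isnan(b)) {
        f |= UINT64_C(1) << SKETCH_CMP_UNORDERED;
    }
    /* Ordered relations are all false when either operand is NaN. */
    if (a < b) {
        f |= UINT64_C(1) << SKETCH_CMP_LT;
    }
    if (a <= b) {
        f |= UINT64_C(1) << SKETCH_CMP_LE;
    }
    if (a > b) {
        f |= UINT64_C(1) << SKETCH_CMP_GT;
    }
    if (a >= b) {
        f |= UINT64_C(1) << SKETCH_CMP_GE;
    }
    if (a == b) {
        f |= UINT64_C(1) << SKETCH_CMP_EQ;
    }
    if (a != b) {               /* also true for unordered (assumed) */
        f |= UINT64_C(1) << SKETCH_CMP_NEQ;
    }
    return f;
}
```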


>
> It really would be nice if we had the same documentation that was used
> to implement the gcc backend.  Otherwise we have to rely on guesswork.
> 
> For single-precision it appears that the format is
> 
>   63                                      31          24   10  9     0
>   [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]
> 
> We are able to deduce the bias for the exponent based on the input gcc gives us
> for floatunssisf: 0x9e == 2**31 when the mantissa is normalized.
> 
> So:
> 
>   fsingle_add1, fsingle_sub1: Perform the operation.  Split the result
>   such that all of the fields above are filled in.
> 
>   fsingle_mul1: Perform the operation.  Split the result such that all
>   of the fields above except for cmp-flags are filled in.
> 
>   fsingle_addsub2: Nop.
>   fsingle_mul2: Move srca to dest.
> 
>   fsingle_pack1: Normalize and repack the above.  In the add/sub/mul case,
>   no normalization will be required, so no change to the result occurs.
> 
>   In the floatunssisf2 case, the input implicit bit may not be set, and
>   guard bits may be set, so real rounding and normalization must occur,
>   adjusting the exponent constructed by gcc in building the flags.
> 
> For double-precision things are more complicated.  Precisely because there is
> no dedicated fdouble_mul[1-4] instructions, but instead gcc is to use a normal
> 128-bit integer multiplication on the mantissa.
> 
> For double-precision it appears that the format is
> 
>          63               57                           4            0
>   unpack [ overflow bits? | mantissa with implicit bit | guard bits ]
> 
>          63   31          24   20  19    8    0
>   flags  [ ?? | cmp flags | ?? | s | exp | ?? ]
> 
> Similarly we can compute the bias for exp as 0x21b == 2**53.
> Or is it 20 bits of exponent and 0x21b00 == 2**53?
> 
> So:
> 
>   fdouble_unpack_max, fdouble_unpack_min: Perform the operation as described,
>   extracting the mantissa of the min/max absolute value.
> 
>   fdouble_add_flags, fdouble_sub_flags: Extract the signs and exponent of the
>   sources, and compute the sign and exponent of the result.  Set a bit,
>   presumably one of [24:21] that tell fdouble_addsub whether to perform
>   addition or subtraction.  Set the comparison flags.
> 
>   fdouble_mul_flags: Extract the signs and exponent of the sources, and compute
>   the sign and exponent of the result.  Note that the result of the 128-bit
>   multiplication is guaranteed to be non-normalized : the 2 57-bit inputs will
>   produce a 114-bit intermediate result.  Which means that bits [63:51] are
>   guaranteed to be zero on entry to the pack stages.  Which means that some
>   bias will need to be applied to the intermediate exponent.
> 
>   fdouble_addsub: Add or subtract the mantissas based on a bit in flags.
> 
>   fdouble_pack1: Move flags (srcb) to result (dest).
>   fdouble_pack2: Take the 128-bit mantissa of srca+srcb, the flags of dest,
>   and normalize and pack the result.
> 

OK, thanks, what you said above sounds reasonable. It is more precise
than my current implementation (but it is also a little more complex).

If my current implementation cannot pass the gcc testsuite (I guess it
will pass), I shall try to implement what you said above next.



Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-17 21:09                             ` Chen Gang
@ 2015-08-17 21:43                               ` Richard Henderson
  2015-08-18 14:27                                 ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2015-08-17 21:43 UTC
  To: Chen Gang
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 08/17/2015 02:09 PM, Chen Gang wrote:
> On 8/18/15 01:31, Richard Henderson wrote:
>> On 08/15/2015 11:16 AM, Chen Gang wrote:
>>
>>> But what you said is really quite valuable to me!! we can treat the flag
>>> as a caller saved context, then can let the caller can use callee freely
>>> (in fact, I guess, the real hardware treats it as caller context, too).
>>>
>>>   - we have to define the flag format based on the existing format in the
>>>     related docs and tilegx.md (reserve 0-20 and 25-31 bits).
>>>
>>>   - We can only use 21-24 for mark addsub, mul, or typecast result. If
>>>     21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>>>     bits is the input integer; for fdouble: srca is the input integer.
>>
>> Plausible.
>>
>>>
>>>   - For addsub and mul result, we use 32-63 bits for an index of resource
>>>     handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>>>     fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.
>>
>> No, that's a bad idea.  No state external to the inputs to the insns.

...

>      float32 val32s[TILEGX_F_COUNT]; /* results roudup array for fsingle */
>      float64 val64s[TILEGX_F_COUNT]; /* results roudup array for fdouble */

I repeat: This is an extremely bad idea.
I will certainly not sign off on any patch that includes this.


r~

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-17 21:43                               ` Richard Henderson
@ 2015-08-18 14:27                                 ` Chen Gang
  2015-08-18 14:32                                   ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-18 14:27 UTC
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 8/18/15 05:43, Richard Henderson wrote:
> 
> I repeat: This is an extremely bad idea.
> I will certainly not sign off on any patch that includes this.
> 

OK. Admittedly it is not a professional implementation. But as far as I
can tell, it always gets the correct result:

 - It guesses only the 'flag' format instead of the whole data format,
   so it is easier (although it wastes resources and is unprofessional).

 - The 'flag' has 'room' for the callee to save its information, so we
   can use that room to distinguish the different 'flag' types (i.e. we
   can treat 'flag' as the callee's context).

 - The caller must keep the 'flag' consistent across all floating point
   insns. So if the caller finds a new usage pattern (which is almost
   impossible), our implementation still adapts to it automatically.

If it really always gets the correct result (otherwise I will have to
implement your suggested approach right now):

 - I can use it for the gcc testsuite (it has already fixed 100+ gcc
   testsuite failures; I guess all the floating point tests now pass).

 - Within this month, I shall try to finish the gcc testsuite with qemu
   in my free time (fewer than 700 issues left), and also make some
   kernel mm patches as a learning exercise (which may be helpful for
   softmmu linux user).

 - Next month (after finishing the gcc testsuite with tilegx qemu), I
   shall implement floating point the more precise way you suggested,
   instead of the current temporary implementation.

By the way, I have delayed a lot of my free-time work on the Linux
kernel and gcc:

 - Originally, I wanted to cross-compile the kernel for other archs in
   my free time this month, but so far I have done nothing on it (next,
   I shall try some mm patches instead; I hope I can succeed).

 - Originally, I wanted to fix one gcc bug in bugzilla, but so far I
   have done nothing on it (it has really been delayed far too long).


Welcome any ideas, suggestions and completions.


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-18 14:27                                 ` Chen Gang
@ 2015-08-18 14:32                                   ` Peter Maydell
  2015-08-18 21:29                                     ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2015-08-18 14:32 UTC
  To: Chen Gang
  Cc: qemu-devel, walt, Chris Metcalf, Andreas Färber, Richard Henderson

On 18 August 2015 at 15:27, Chen Gang <xili_gchen_5257@hotmail.com> wrote:
> Welcome any ideas, suggestions and completions.

You should stop working on adding new features and instructions,
and concentrate on getting a coherent set of patches for some
subset of the instruction set reviewed and into QEMU.

thanks
-- PMM

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-18 14:32                                   ` Peter Maydell
@ 2015-08-18 21:29                                     ` Chen Gang
  2015-08-18 22:15                                       ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-08-18 21:29 UTC
  To: Peter Maydell
  Cc: qemu-devel, walt, Chris Metcalf, Andreas Färber, Richard Henderson

On 8/18/15 22:32, Peter Maydell wrote:
> On 18 August 2015 at 15:27, Chen Gang <xili_gchen_5257@hotmail.com> wrote:
>> Welcome any ideas, suggestions and completions.
> 
> You should stop working on adding new features and instructions,
> and concentrate on getting a coherent set of patches for some
> subset of the instruction set reviewed and into QEMU.
> 

OK, thanks. That sounds good, but I guess it is not workable:

 - I have already sent a set of patches, but they have not been
   integrated into qemu (or reviewed).

 - I have to continue even though they are not integrated (which means I
   have to keep adding new features and instructions for now).

 - For me, once tilegx qemu passes the gcc testsuite and the floating
   point insns are implemented the more precise way, I guess that is a
   reasonable point to send new patches to qemu.


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-18 21:29                                     ` Chen Gang
@ 2015-08-18 22:15                                       ` Peter Maydell
  2015-08-18 22:24                                         ` Chen Gang
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2015-08-18 22:15 UTC (permalink / raw)
  To: Chen Gang
  Cc: qemu-devel, walt, Chris Metcalf, Andreas Färber, Richard Henderson

On 18 August 2015 at 22:29, Chen Gang <xili_gchen_5257@hotmail.com> wrote:
> On 8/18/15 22:32, Peter Maydell wrote:
>> On 18 August 2015 at 15:27, Chen Gang <xili_gchen_5257@hotmail.com> wrote:
>>> Any ideas, suggestions and completions are welcome.
>>
>> You should stop working on adding new features and instructions,
>> and concentrate on getting a coherent set of patches for some
>> subset of the instruction set reviewed and into QEMU.
>>
>
> OK, thanks. That sounds good, but I am afraid it is not workable:
>
>  - I have already sent a set of patches, but they have not been
>    integrated into qemu (or not reviewed).

You need to concentrate on getting these in. That means:
 * check whether there are outstanding review comments
   (or trivial bugs you found yourself in the instructions
   covered by these parts) -- if so, then respin the patchset
   and resend it
 * 'ping' the patch series to remind people to review it

(Specifically, IIRC, RTH needs to review the codegen bits of
the integer patches.)

>  - I have to continue even though they are not integrated (which means
>    I have to add new features and instructions for now).

If you do this then you are drawing the attention and time
of reviewers away from the patches which are nearly ready
to go into QEMU and towards the new stuff you post. This
means that the older patchsets are less likely to move forward.

>  - For me, once tilegx qemu passes the gcc testsuite and the floating
>    point insns are implemented precisely, I guess that is a reasonable
>    point to send new patches to qemu.

This will result in a huge patchset which is very hard to
review (and which is likely to get requests from me to
split it up and send a smaller subset which is reviewable).

thanks
-- PMM

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-18 22:15                                       ` Peter Maydell
@ 2015-08-18 22:24                                         ` Chen Gang
  0 siblings, 0 replies; 27+ messages in thread
From: Chen Gang @ 2015-08-18 22:24 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-devel, walt, Chris Metcalf, Andreas Färber, Richard Henderson


OK, thanks. What you said sounds reasonable to me. I shall try to send
patches to qemu first.

:-)

On 8/19/15 06:15, Peter Maydell wrote:
> On 18 August 2015 at 22:29, Chen Gang <xili_gchen_5257@hotmail.com> wrote:
>> On 8/18/15 22:32, Peter Maydell wrote:
>>> On 18 August 2015 at 15:27, Chen Gang <xili_gchen_5257@hotmail.com> wrote:
>>>> Any ideas, suggestions and completions are welcome.
>>>
>>> You should stop working on adding new features and instructions,
>>> and concentrate on getting a coherent set of patches for some
>>> subset of the instruction set reviewed and into QEMU.
>>>
>>
>> OK, thanks. That sounds good, but I am afraid it is not workable:
>>
>>  - I have already sent a set of patches, but they have not been
>>    integrated into qemu (or not reviewed).
> 
> You need to concentrate on getting these in. That means:
>  * check whether there are outstanding review comments
>    (or trivial bugs you found yourself in the instructions
>    covered by these parts) -- if so, then respin the patchset
>    and resend it
>  * 'ping' the patch series to remind people to review it
> 
> (Specifically, IIRC, RTH needs to review the codegen bits of
> the integer patches.)
> 
>>  - I have to continue even though they are not integrated (which means
>>    I have to add new features and instructions for now).
> 
> If you do this then you are drawing the attention and time
> of reviewers away from the patches which are nearly ready
> to go into QEMU and towards the new stuff you post. This
> means that the older patchsets are less likely to move forward.
> 
>>  - For me, once tilegx qemu passes the gcc testsuite and the floating
>>    point insns are implemented precisely, I guess that is a reasonable
>>    point to send new patches to qemu.
> 
> This will result in a huge patchset which is very hard to
> review (and which is likely to get requests from me to
> split it up and send a smaller subset which is reviewable).
> 
> thanks
> -- PMM
> 

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-08-17 17:31                           ` Richard Henderson
  2015-08-17 21:09                             ` Chen Gang
@ 2015-10-25 15:38                             ` Chen Gang
  2015-10-26 14:14                               ` Chen Gang
  1 sibling, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-10-25 15:38 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel

On 8/18/15 01:31, Richard Henderson wrote:
> 
> For single-precision it appears that the format is
> 
>   63                                      31          24   10  9     0
>   [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]
> 
> We are able to deduce the bias for the exponent based on the input gcc gives us
> for floatunssisf: 0x9e == 2**31 when the mantissa is normalized.
>

For me, 0x9e == 2**30:

 - For int32_t, 31 bits hold the value and the highest bit is the sign.
   The mantissa can 'move' (shift) 30 bits to express a 31-bit value,
   and the 31st bit never 'moves'.

 - 0x9e - 0x1e(30) = 0x80, which is the smallest 8-bit number with the
   highest bit set; in our case, 0x80 means "do not 'move' the mantissa
   bits".

 - According to the IEEE standard, the exp of the float type is 8 bits,
   so 0x80 is really the smallest value of exp.

/*
 * Single exp analyzing: 0x9e - 0x1e(30) = 0x80
 *
 *   7   6   5   4   3   2   1   0
 *
 *   1   0   0   1   1   1   1   0
 *
 *   0   0   0   1   1   1   1   0    => 0x1e(30)
 *
 *   1   0   0   0   0   0   0   0    => 0x80
 */
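The bit tables in the comment above can be regenerated mechanically, which makes hand-written ones easy to double-check. A minimal C sketch; `format_bits` is a hypothetical helper, not part of QEMU or gcc:

```c
#include <stdio.h>

/* Hypothetical helper: write the low `width` bits of `v` into `buf`,
 * most significant bit first, separated by three spaces, in the same
 * style as the bit tables in the comments above. */
static void format_bits(unsigned v, int width, char *buf, size_t len)
{
    size_t pos = 0;

    if (len == 0) {
        return;
    }
    buf[0] = '\0';
    for (int i = width - 1; i >= 0 && pos + 4 < len; i--) {
        /* Each bit is followed by three spaces, except the last one. */
        pos += (size_t)snprintf(buf + pos, len - pos, i > 0 ? "%u   " : "%u",
                                (v >> i) & 1u);
    }
}
```

For example, `format_bits(0x9e, 8, buf, sizeof buf)` yields the first row of the table, and `format_bits(0x80, 8, buf, sizeof buf)` the last one.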

[...]
 
> 
> For double-precision things are more complicated.  Precisely because there is
> no dedicated fdouble_mul[1-4] instructions, but instead gcc is to use a normal
> 128-bit integer multiplication on the mantissa.
> 
> For double-precision it appears that the format is
> 
>          63               57                           4            0
>   unpack [ overflow bits? | mantissa with implicit bit | guard bits ]
> 
>          63   31          24   20  19    8    0
>   flags  [ ?? | cmp flags | ?? | s | exp | ?? ]
> 
> Similarly we can compute the bias for exp as 0x21b == 2**53.
> Or is it 20 bits of exponent and 0x21b00 == 2**53?
> 

For me, exp is (0x21b << 1):

 - According to the IEEE standard for double, its exp is 11 bits, and
   the smallest exp is 0x400.

 - So for 0x21b00, we only need to consider bits 0 to 17 (the highest
   bit must be 1).

 - Since exp is 11 bits, the real exp value in 0x21b00 is (0x21b << 1).
   So the mantissa is 'moved' 0x36(54) bits to get an integer, which
   means the fdouble mantissa has 55 bits internally.

/*
 * Double exp analyzing: (0x21b << 1) - 0x36(54) = 0x400
 *
 *   17  16  15  14  13  12  11  10   9   8   7    6   5   4   3   2   1   0
 *
 *    1   0   0   0   0   1   1   0   1   1   0    0   0   0   0   0   0   0
 *
 *    0   0   0   0   0   1   1   0   1   1   0    => 0x36(54)
 *
 *    1   0   0   0   0   0   0   0   0   0   0    => 0x400
 *
 */
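The same slicing can be checked in a few lines of C. A minimal sketch, assuming (as argued above) that the 11-bit exp occupies bits 7..17 of the flags word; `dflags_exp` is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical helper: extract the 11-bit exp field, assumed to sit in
 * bits 7..17 of the fdouble flags word (see the analysis above). */
static unsigned dflags_exp(uint64_t flags)
{
    return (unsigned)((flags >> 7) & 0x7ff);
}
```

With this, `dflags_exp(0x21b00)` is 0x436, i.e. (0x21b << 1), and 0x436 - 0x36 == 0x400 as in the table.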

So, as I understand it, the format as a C header is:

#pragma pack(push, 1)

/*
 * Single format, it is 64-bit.
 */
typedef struct TileGXFPSFmt {

    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t exp : 8;             /* exp, 0x9e: 30 + TILEGX_F_EXP_FZERO */
    uint64_t unknown0 : 1;        /* unknown */
    uint64_t sign : 1;            /* Sign bit for the total value */
    uint64_t unknown1 : 15;       /* unknown */

    /* From the TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;       /* The two are unordered */
    uint64_t lt : 1;              /* 1st is less than 2nd */
    uint64_t le : 1;              /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;              /* 1st is greater than 2nd */
    uint64_t ge : 1;              /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;              /* The two operands are equal */
    uint64_t neq : 1;             /* The two operands are not equal */

    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t mantissa : 31;       /* mantissa */
    uint64_t unknown2 : 1;        /* unknown */
} TileGXFPSFmt;

/*
 * Double format, flags, 64-bit.
 */
typedef struct TileGXFPDFmtF {

    uint64_t unknown0 : 7;        /* unknown */
    uint64_t exp : 11;            /* exp, 0x21b << 1: 54 + TILEGX_F_EXP_DZERO */
    uint64_t unknown1 : 2;        /* unknown */
    uint64_t sign : 1;            /* Sign bit for the total value */
    uint64_t unknown2 : 4;        /* unknown */

    /* From the TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;       /* The two are unordered */
    uint64_t lt : 1;              /* 1st is less than 2nd */
    uint64_t le : 1;              /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;              /* 1st is greater than 2nd */
    uint64_t ge : 1;              /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;              /* The two operands are equal */
    uint64_t neq : 1;             /* The two operands are not equal */

    uint64_t unknown3 : 32;       /* unknown */
} TileGXFPDFmtF;

/*
 * Double format, value, 64-bit.
 */
typedef struct TileGXFPDFmtV {
    uint64_t unknown0 : 4;        /* unknown */
    uint64_t mantissa : 55;       /* mantissa */
    uint64_t unknown1 : 5;        /* unknown */
} TileGXFPDFmtV;

#pragma pack(pop)
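Since C leaves both the allocation order of bit-fields and support for `uint64_t` bit-fields implementation-defined (GCC accepts them), it may be worth pinning the layout down with a compile-time size check and a test on the raw 64-bit value. A sketch, assuming GCC on a little-endian host, where bit-fields are allocated starting from the least significant bit; `sfmt_raw` is a hypothetical helper:

```c
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
/* Same field list as TileGXFPSFmt above, with the bit positions that
 * GCC assigns on a little-endian host. */
typedef struct TileGXFPSFmt {
    uint64_t exp : 8;             /* bits 0..7 */
    uint64_t unknown0 : 1;        /* bit 8 */
    uint64_t sign : 1;            /* bit 9 */
    uint64_t unknown1 : 15;       /* bits 10..24 */
    uint64_t unordered : 1;       /* bit 25 */
    uint64_t lt : 1;              /* bit 26 */
    uint64_t le : 1;              /* bit 27 */
    uint64_t gt : 1;              /* bit 28 */
    uint64_t ge : 1;              /* bit 29 */
    uint64_t eq : 1;              /* bit 30 */
    uint64_t neq : 1;             /* bit 31 */
    uint64_t mantissa : 31;       /* bits 32..62 */
    uint64_t unknown2 : 1;        /* bit 63 */
} TileGXFPSFmt;
#pragma pack(pop)

/* Catches a field list that no longer fits in one 64-bit word. */
_Static_assert(sizeof(TileGXFPSFmt) == 8, "TileGXFPSFmt must be 64 bits");

/* Hypothetical helper: view the struct as the raw 64-bit register value. */
static uint64_t sfmt_raw(const TileGXFPSFmt *s)
{
    uint64_t raw;

    memcpy(&raw, s, sizeof raw);
    return raw;
}
```

Setting `exp = 0x9e`, `sign = 1`, `eq = 1` and `mantissa = 5` should then give `0x9e | 1ULL << 9 | 1ULL << 30 | 5ULL << 32`; a compiler that allocates bit-fields in a different order would fail such a check instead of silently producing wrong register values.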


Ideas, suggestions, and completions from anyone are welcome.

Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
  2015-10-25 15:38                             ` Chen Gang
@ 2015-10-26 14:14                               ` Chen Gang
       [not found]                                 ` <5630EF69.90906@hotmail.com>
  0 siblings, 1 reply; 27+ messages in thread
From: Chen Gang @ 2015-10-26 14:14 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel


I guess that for signed numbers the highest bit is not used, but for
unsigned numbers it is (so signed and unsigned numbers can share the
same format).

On 10/25/15 23:38, Chen Gang wrote:
> 
> /*
>  * Single format, it is 64-bit.
>  */
> typedef struct TileGXFPSFmt {
> 
>     /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
>     uint64_t exp : 8;             /* exp, 0x9e: 30 + TILEGX_F_EXP_FZERO */
>     uint64_t unknown0 : 1;        /* unknown */
>     uint64_t sign : 1;            /* Sign bit for the total value */
>     uint64_t unknown1 : 15;       /* unknown */
> 
>     /* From the TILE-Gx ISA document, Table 7-2 for floating point */
>     uint64_t unordered : 1;       /* The two are unordered */
>     uint64_t lt : 1;              /* 1st is less than 2nd */
>     uint64_t le : 1;              /* 1st is less than or equal to 2nd */
>     uint64_t gt : 1;              /* 1st is greater than 2nd */
>     uint64_t ge : 1;              /* 1st is greater than or equal to 2nd */
>     uint64_t eq : 1;              /* The two operands are equal */
>     uint64_t neq : 1;             /* The two operands are not equal */
> 
>     /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
>     uint64_t mantissa : 31;       /* mantissa */
>     uint64_t unknown2 : 1;        /* unknown */

It is not unknown2: it represents 0x80000000 when sign is 0, and is
meaningless when sign is 1.


> } TileGXFPSFmt;
>

[...]
 
> 
> /*
>  * Double format, value, 64-bit.
>  */
> typedef struct TileGXFPDFmtV {
>     uint64_t unknown0 : 4;        /* unknown */
>     uint64_t mantissa : 55;       /* mantissa */
>     uint64_t unknown1 : 5;        /* unknown */

I guess unknown1 is 4 bits, and the remaining bit represents (1ULL << 55)
when sign is 0, and is meaningless when sign is 1.


> } TileGXFPDFmtV;
> 
> #pragma pack(pop)
> 
> 


Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed

* Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
       [not found]                                 ` <5630EF69.90906@hotmail.com>
@ 2015-10-28 15:53                                   ` Chen Gang
  0 siblings, 0 replies; 27+ messages in thread
From: Chen Gang @ 2015-10-28 15:53 UTC (permalink / raw)
  To: rth; +Cc: Peter Maydell, walt, Chris Metcalf, Andreas Färber, qemu-devel


Oh, sorry. After reading another document (an introduction to sw_64 arch
floating point), I now know that 0x7f is the fsingle exp '0' (no move)
and 0x3ff is the fdouble exp '0' (no move).
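This matches the host's IEEE-754 formats, whose exponent biases are 127 (0x7f) for single and 1023 (0x3ff) for double. A small C sketch, assuming the host float and double are IEEE-754 binary32/binary64 (true on common hosts), reads the raw exponent field of a value; the helper names are hypothetical. It shows that 2**31 as a float has exp 0x9e (31 + 0x7f), and 2**55 as a double has exp 0x436, i.e. (0x21b << 1), or 55 + 0x3ff:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw 8-bit biased exponent of a binary32 value. */
static unsigned ieee_float_exp(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing issues */
    return (bits >> 23) & 0xff;
}

/* Hypothetical helper: raw 11-bit biased exponent of a binary64 value. */
static unsigned ieee_double_exp(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return (unsigned)((bits >> 52) & 0x7ff);
}
```

For example, `ieee_float_exp(0x1p31f)` gives 0x9e and `ieee_double_exp(0x1p55)` gives 0x436 (C99 hex float literals).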

And for fdouble, we can still calculate the real value based on the qemu
soft fpu, which simplifies many details. So, as I understand it, the
complete related information is below:

/*
 * From IEEE standard, exp of float is 8-bits, exp of double is 11-bits.
 */
#define TILEGX_F_EXP_FZERO  0x7f  /* Zero exp for single 8-bits */
#define TILEGX_F_EXP_DZERO  0x3ff /* Zero exp for double 11-bits */

/*
 * For fdouble addsub bit
 */
#define TILEGX_F_ADDSUB_ADD 0     /* Perform absolute add operation */
#define TILEGX_F_ADDSUB_SUB 1     /* Perform absolute sub operation */

#pragma pack(push, 1)

/*
 * Single format, it is 64-bit.
 *
 * Single exp analyzing: 0x9e - 0x1f(31) = 0x7f
 *
 *   7   6   5   4   3   2   1   0
 *
 *   1   0   0   1   1   1   1   0
 *
 *   0   0   0   1   1   1   1   1    => 0x1f(31)
 *
 *   0   1   1   1   1   1   1   1    => 0x7f
 */
typedef struct TileGXFPSFmt {

    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t exp : 8;             /* exp, 0x9e: 31 + TILEGX_F_EXP_FZERO */
    uint64_t unknown0 : 1;        /* unknown */
    uint64_t sign : 1;            /* Sign bit for the total value */
    uint64_t unknown1 : 15;       /* unknown */

    /* From the TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;       /* The two are unordered */
    uint64_t lt : 1;              /* 1st is less than 2nd */
    uint64_t le : 1;              /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;              /* 1st is greater than 2nd */
    uint64_t ge : 1;              /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;              /* The two operands are equal */
    uint64_t neq : 1;             /* The two operands are not equal */

    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t mantissa : 32;       /* mantissa */
} TileGXFPSFmt;

/*
 * FSingle instructions implementation:
 *
 * fsingle_add1         ; calc srca and srcb.
 *                      ; convert float_32 to TileGXFPSFmt result.
 *                      ; move TileGXFPSFmt result to dest.
 *
 * fsingle_sub1         ; calc srca and srcb.
 *                      ; convert float_32 to TileGXFPSFmt result.
 *                      ; move TileGXFPSFmt result to dest.
 *
 * fsingle_addsub2      ; nop.
 *
 * fsingle_mul1         ; calc srca and srcb.
 *                      ; convert float_32 value to TileGXFPSFmt result.
 *                      ; move TileGXFPSFmt result to dest.
 *
 * fsingle_mul2         ; move srca to dest.
 *
 * fsingle_pack1        ; nop
 *
 * fsingle_pack2        ; treat srca as TileGXFPSFmt result.
 *                      ; convert TileGXFPSFmt result to float_32 value.
 *                      ; move float_32 value to dest.
 */

/*
 * Double format. flags: 64 bits, value: 64 bits.
 *
 * Double exp analyzing: (0x21b << 1) - 0x37(55) = 0x3ff
 *
 *   17  16  15  14  13  12  11  10   9   8   7    6   5   4   3   2   1   0
 *
 *    1   0   0   0   0   1   1   0   1   1   0    0   0   0   0   0   0   0
 *
 *    0   0   0   0   0   1   1   0   1   1   1    => 0x37(55)
 *
 *    0   1   1   1   1   1   1   1   1   1   1    => 0x3ff
 *
 */
typedef struct TileGXFPDFmtF {

    uint64_t unknown0 : 7;        /* unknown */
    uint64_t exp : 11;            /* exp, 0x21b << 1: 55 + TILEGX_F_EXP_DZERO */
    uint64_t unknown1 : 2;        /* unknown */
    uint64_t sign : 1;            /* Sign bit for the total value */

    uint64_t addsub : 1;          /* add or sub bit */
    uint64_t unknown2 : 3;        /* unknown */

    /* From the TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;       /* The two are unordered */
    uint64_t lt : 1;              /* 1st is less than 2nd */
    uint64_t le : 1;              /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;              /* 1st is greater than 2nd */
    uint64_t ge : 1;              /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;              /* The two operands are equal */
    uint64_t neq : 1;             /* The two operands are not equal */

    uint64_t unknown3 : 32;       /* unknown */
} TileGXFPDFmtF;

typedef struct TileGXFPDFmtV {
    uint64_t mantissa : 60;       /* mantissa */
    uint64_t unknown1 : 4;        /* unknown */
} TileGXFPDFmtV;

/*
 * FDouble instructions implementation:
 *
 * fdouble_unpack_min   ; srca and srcb are float_64 value.
 *                      ; get the min absolute value's mantissa.
 *                      ; move mantissa to dest.
 *
 * fdouble_unpack_max   ; srca and srcb are float_64 value.
 *                      ; get the max absolute value's mantissa.
 *                      ; move "mantissa << (exp_max - exp_min)" to dest.
 *
 * fdouble_add_flags    ; srca and srcb are float_64 value.
 *                      ; calc exp (exp_min), sign, and comp bits for flags.
 *                      ; set addsub bit to flags and move flags to dest.
 *
 * fdouble_sub_flags    ; srca and srcb are float_64 value.
 *                      ; calc exp (exp_min), sign, and comp bits for flags.
 *                      ; set addsub bit to flags and move flags to dest.
 *
 * fdouble_addsub       ; dest, srca (max, min mantissa), and srcb (flags).
 *                      ; "dest +/- srca" depend on the add/sub bit of flags.
 *                      ; move result mantissa to dest.
 *
 * fdouble_mul_flags    ; srca and srcb are float_64 value.
 *                      ; calc sign (xor), exp (exp_min + exp_max), and comp bits.
 *                      ; mix sign, exp, and comp bits as flags to dest.
 *
 * fdouble_pack1        ; move srcb (flags) to dest.
 *
 * fdouble_pack2        ; srca, srcb (high, low mantissa), and dest (flags)
 *                      ; normalize and pack result from srca, srcb, and dest.
 *                      ; move result to dest.
 */

#pragma pack(pop)
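For what it's worth, here is a minimal host-side sketch of the two fsingle conversion steps described in the comments above ("convert float_32 to TileGXFPSFmt result" and fsingle_pack2's "convert TileGXFPSFmt result to float_32 value"). It rests on assumptions that no TILE-Gx document confirms: the bit positions of the final TileGXFPSFmt layout (exp in bits 0..7, sign in bit 9, mantissa in bits 32..63); that the 32-bit mantissa carries the implicit leading 1 in its top bit, which is what exp 0x9e == 31 + TILEGX_F_EXP_FZERO implies for gcc's integer conversions; and plain shifts and masks instead of bit-fields, whose allocation order is compiler-dependent. It truncates rather than rounds (a real QEMU helper would presumably let softfloat handle rounding), and the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Bit positions assumed from the final TileGXFPSFmt layout above. */
#define SFMT_EXP_MASK   0xffULL   /* exp: bits 0..7 */
#define SFMT_SIGN_SHIFT 9         /* sign: bit 9 */
#define SFMT_MANT_SHIFT 32        /* mantissa: bits 32..63 */

/* Hypothetical: unpack a normal (non-zero, finite) host float. The 24-bit
 * significand, with its implicit 1 restored, is left-aligned in the 32-bit
 * mantissa, so exp keeps its IEEE meaning (bias 0x7f). */
static uint64_t sfmt_from_float32(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    uint64_t sign = bits >> 31;
    uint64_t biased_exp = (bits >> 23) & 0xff;
    uint64_t mant = (uint64_t)((bits & 0x7fffffu) | 0x800000u) << 8;
    return biased_exp | (sign << SFMT_SIGN_SHIFT) | (mant << SFMT_MANT_SHIFT);
}

/* Hypothetical: normalize and pack back into a host float, in the spirit
 * of fsingle_pack2. Truncates instead of rounding; NaN, Inf, denormals and
 * exponent overflow are not handled. */
static float sfmt_to_float32(uint64_t v)
{
    uint32_t mant = (uint32_t)(v >> SFMT_MANT_SHIFT);
    int biased_exp = (int)(v & SFMT_EXP_MASK);
    uint32_t sign = (uint32_t)(v >> SFMT_SIGN_SHIFT) & 1u;
    uint32_t bits;
    float f;

    if (mant == 0) {
        bits = sign << 31;                    /* signed zero */
    } else {
        while (!(mant & 0x80000000u)) {       /* move the leading 1 to bit 31 */
            mant <<= 1;
            biased_exp--;
        }
        /* Drop the 8 low guard bits and the implicit leading 1. */
        bits = (sign << 31) | ((uint32_t)biased_exp << 23)
               | ((mant >> 8) & 0x7fffffu);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Keeping the IEEE biased exponent unchanged is what would let gcc's integer conversion (exp 0x9e with the raw integer in the mantissa) and real float values share one encoding: normalization simply trades mantissa shifts for exponent decrements, so `sfmt_to_float32(0x9e | 5ULL << 32)` comes out as 5.0f.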


Thanks.
-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed


Thread overview: 27+ messages
2015-08-01  9:47 [Qemu-devel] [Consult] tilegx: About floating point instructions Chen Gang
2015-08-03 16:40 ` Richard Henderson
2015-08-03 20:47   ` Chen Gang
2015-08-04 13:56     ` Chen Gang
2015-08-04 15:04       ` Richard Henderson
2015-08-05 14:16         ` Chen Gang
2015-08-08 17:23           ` Chen Gang
2015-08-09  1:10             ` Chen Gang
2015-08-09  1:14               ` Chen Gang
2015-08-11 13:18                 ` Chen Gang
2015-08-13 14:59                   ` Chen Gang
2015-08-15  9:56                     ` Chen Gang
2015-08-15 15:47                       ` Richard Henderson
2015-08-15 18:16                         ` Chen Gang
2015-08-16  1:41                           ` Chen Gang
2015-08-16  3:59                             ` Chen Gang
2015-08-17 17:31                           ` Richard Henderson
2015-08-17 21:09                             ` Chen Gang
2015-08-17 21:43                               ` Richard Henderson
2015-08-18 14:27                                 ` Chen Gang
2015-08-18 14:32                                   ` Peter Maydell
2015-08-18 21:29                                     ` Chen Gang
2015-08-18 22:15                                       ` Peter Maydell
2015-08-18 22:24                                         ` Chen Gang
2015-10-25 15:38                             ` Chen Gang
2015-10-26 14:14                               ` Chen Gang
     [not found]                                 ` <5630EF69.90906@hotmail.com>
2015-10-28 15:53                                   ` Chen Gang
