Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available

From: "Alex Bennée" <alex.bennee@linaro.org>
To: qemu-devel@nongnu.org
Cc: "Alessandro Di Federico" <ale@rev.ng>,
	"Taylor Simpson" <tsimpson@quicinc.com>,
	"nizzo@rev.ng" <nizzo@rev.ng>,
	"Niccolò Izzo" <izzoniccolo@gmail.com>,
	"Aleksandar Markovic" <aleksandar.m.mail@gmail.com>
Subject: Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available
Date: Wed, 13 Nov 2019 10:31:16 +0000
Message-ID: <87d0dw83uz.fsf@linaro.org>
In-Reply-To: <BYAPR02MB48865884056A88B660B620FCDE770@BYAPR02MB4886.namprd02.prod.outlook.com>

Taylor Simpson <tsimpson@quicinc.com> writes:

> I had discussions with several people at the KVM Forum, and I’ve been thinking about how to divide up the code for community review.  Here is my proposal for the steps.
>
>   1.  linux-user changes + linux-user/hexagon + skeleton of target/hexagon
> This is the minimum amount to build and run a very simple program.  I
>   have an assembly program that prints “Hello” and exits.  It is
>   constructed to use very few instructions that can be added brute
>   force in the Hexagon back end.

I'm hoping most of the linux-user changes are in the hexagon runloop?
There has been quite a bit of work splitting up and cleaning up the
#ifdef mess in linux-user over the last few years.

>   2.  Add the code that is imported from the Hexagon simulator and the qemu helper generator
> This will allow the scalar ISA to be executed.  This will grow the set
> of programs that could execute, but there will still be limitations.
> In particular, there can be no packets, which means the C library won’t
> work.  We have to build with -nostdlib.

You could run -nostdlib system TCG tests (hello and memory) but that
would require modelling some sort of hardware and assumes you have a
simple serial port or semihosting solution. That said a bunch of the
MIPS tests are linux-user and -nostdlib so that isn't a major problem in
getting some of the tests running.

When you say code imported from the Hexagon simulator, I was under the
impression you were generating code from the instruction description.
Otherwise you'll need to be very clear about your licensing grants.

>   3.  Add support for packet semantics
> At this point, we will be able to execute full programs linked with
> the C library.  This will include the check-tcg tests.

I think the interesting question is whether the roll-back semantics of
Hexagon are something we might need for other emulated architectures or
are a solution specific to Hexagon (I'm guessing the latter).
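
To make the question concrete, here is a toy sketch of the staging idea:
results produced inside a packet go to a shadow copy and only reach the
architectural registers at commit, so abandoning the packet before commit
leaves no visible state. All names (ToyCPU, packet_write, and so on) are
made up for illustration and are not taken from either Hexagon tree; the
real implementation may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_REGS 32

/* Hypothetical CPU state: architectural registers plus staged results. */
typedef struct {
    uint32_t gpr[TOY_NUM_REGS];        /* visible register file */
    uint32_t new_value[TOY_NUM_REGS];  /* results staged by the packet */
    bool written[TOY_NUM_REGS];        /* which registers were staged */
} ToyCPU;

/* Start a packet: nothing is staged yet. */
static inline void packet_begin(ToyCPU *cpu)
{
    memset(cpu->written, 0, sizeof(cpu->written));
}

/* Stage a result; the visible register file is not touched. */
static inline void packet_write(ToyCPU *cpu, int reg, uint32_t value)
{
    cpu->new_value[reg] = value;
    cpu->written[reg] = true;
}

/* End of packet: make all staged results visible at once. */
static inline void packet_commit(ToyCPU *cpu)
{
    for (int i = 0; i < TOY_NUM_REGS; i++) {
        if (cpu->written[i]) {
            cpu->gpr[i] = cpu->new_value[i];
        }
    }
    packet_begin(cpu);
}

/* Roll back: drop staged results, e.g. when an instruction faults. */
static inline void packet_abort(ToyCPU *cpu)
{
    packet_begin(cpu);
}

/* A register swap in one packet: both reads see the old values. */
static inline void packet_swap_selfcheck(void)
{
    ToyCPU cpu;
    memset(&cpu, 0, sizeof(cpu));
    cpu.gpr[0] = 1;
    cpu.gpr[1] = 2;

    packet_begin(&cpu);
    packet_write(&cpu, 1, cpu.gpr[0]);
    packet_write(&cpu, 0, cpu.gpr[1]);
    packet_commit(&cpu);

    assert(cpu.gpr[0] == 2 && cpu.gpr[1] == 1);
}
```

If other targets needed something like this, the staging and commit
steps are the parts that would have to become common code.
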

>   4.  Add support for the wide vector extensions
>   5.  Add the helper overrides for performance optimization
> Some of these will be written by hand, and we’ll work with rev.ng to
>   integrate their flex/bison generator.

One thing to nail down is whether we include the generated code in the
source tree with a tool to regenerate it (much like we do for
linux-headers) or add the dependency and regenerate each time from
scratch. I don't see including flex/bison as a dependency being a major
issue (in fact we have it in our docker images so I guess something uses
it). However, it might be trickier depending on libclang, which was also
being discussed.

>
> I would love some feedback on this proposal.  Hopefully, that is enough detail so that people can comment.  If anything isn’t clear, please ask questions.
>
>
> Thanks,
> Taylor
>
>
> From: Qemu-devel <qemu-devel-bounces+tsimpson=quicinc.com@nongnu.org> On Behalf Of Taylor Simpson
> Sent: Tuesday, November 5, 2019 10:33 AM
> To: Aleksandar Markovic <aleksandar.m.mail@gmail.com>
> Cc: Alessandro Di Federico <ale@rev.ng>; nizzo@rev.ng; qemu-devel@nongnu.org; Niccolò Izzo <izzoniccolo@gmail.com>
> Subject: RE: QEMU for Qualcomm Hexagon - KVM Forum talk and code available
>
> Hi Aleksandar,
>
> Thank you – We’re glad you enjoyed the talk.
>
> One point of clarification on SIMD in Hexagon.  What we refer to as the “scalar” core does have some SIMD operations.  Register pairs are 8 bytes, and there are several SIMD instructions.  The example we showed in the talk included a VADDH instruction.  It treats the register pair as 4 half-words and does a vector add.  Then there are the Hexagon Vector eXtensions (HVX) instructions that operate on 128-byte vectors.  There is a wide variety of instructions in this set.  As you mentioned, some of them are pure SIMD and others are very complex.
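
As an illustration of the VADDH description above, a plain C model of a
lane-wise halfword add on a 64-bit register pair might look like the
sketch below. It assumes wrap-around (non-saturating) arithmetic and uses
a hypothetical function name, vaddh_ref; it is not code from the Hexagon
tree.

```c
#include <assert.h>
#include <stdint.h>

/* Add four 16-bit lanes independently; carries do not cross lanes. */
static inline uint64_t vaddh_ref(uint64_t rss, uint64_t rtt)
{
    uint64_t rdd = 0;

    for (int i = 0; i < 4; i++) {
        uint16_t a = (uint16_t)(rss >> (i * 16));
        uint16_t b = (uint16_t)(rtt >> (i * 16));
        uint16_t sum = (uint16_t)(a + b);   /* wraps modulo 2^16 */

        rdd |= (uint64_t)sum << (i * 16);
    }
    return rdd;
}

/* Lane 0 overflows (0xffff + 1) and must not carry into lane 1. */
static inline void vaddh_selfcheck(void)
{
    assert(vaddh_ref(0x000100020003ffffULL, 0x0001000100010001ULL)
           == 0x0002000300040000ULL);
}
```
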
>
> For the helper generator, the vast majority of these are implemented with helpers.  There are only 2 vector instructions in the scalar core that have a TCG override, and all of the HVX instructions are implemented with helpers.  If you are interested in a deeper dive, see below.
>
> Alessandro and Niccolo can comment on the flex/bison implementation.
>
> Thanks,
> Taylor
>
>
> Now for the deeper dive in case anyone is interested.  Look at the genptr.c file in target/hexagon.
>
> The first vector instruction with an override is A6_vminub_RdP.  It does a byte-wise comparison of two register pairs, sets a predicate register indicating whether the byte in the left or right operand is greater, and writes the byte-wise minimum to the destination pair.  Here is the TCG code.
> #define fWRAP_A6_vminub_RdP(GENHLPR, SHORTCODE) \
> { \
>     TCGv BYTE = tcg_temp_new(); \
>     TCGv left = tcg_temp_new(); \
>     TCGv right = tcg_temp_new(); \
>     TCGv tmp = tcg_temp_new(); \
>     int i; \
>     tcg_gen_movi_tl(PeV, 0); \
>     tcg_gen_movi_i64(RddV, 0); \
>     for (i = 0; i < 8; i++) { \
>         fGETUBYTE(i, RttV); \
>         tcg_gen_mov_tl(left, BYTE); \
>         fGETUBYTE(i, RssV); \
>         tcg_gen_mov_tl(right, BYTE); \
>         tcg_gen_setcond_tl(TCG_COND_GT, tmp, left, right); \
>         fSETBIT(i, PeV, tmp); \
>         fMIN(tmp, left, right); \
>         fSETBYTE(i, RddV, tmp); \
>     } \
>     tcg_temp_free(BYTE); \
>     tcg_temp_free(left); \
>     tcg_temp_free(right); \
>     tcg_temp_free(tmp); \
> }
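
For comparison, the behaviour the macro above appears to generate can be
written as a plain C reference model. This is only a reading of the TCG
code quoted here, with hypothetical names (vminub_ref, VminubResult), and
it assumes fGETUBYTE extracts byte i zero-extended; it is not code from
the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: destination pair plus predicate bits. */
typedef struct {
    uint64_t rdd;  /* byte-wise minimum of the two operands */
    uint8_t pe;    /* bit i set when byte i of rtt > byte i of rss */
} VminubResult;

static inline VminubResult vminub_ref(uint64_t rtt, uint64_t rss)
{
    VminubResult r = { 0, 0 };

    for (int i = 0; i < 8; i++) {
        uint8_t left = (uint8_t)(rtt >> (i * 8));
        uint8_t right = (uint8_t)(rss >> (i * 8));
        uint8_t min = left < right ? left : right;

        if (left > right) {
            r.pe |= (uint8_t)(1u << i);
        }
        r.rdd |= (uint64_t)min << (i * 8);
    }
    return r;
}

/* The low four bytes of rtt are larger, so the low four bits are set. */
static inline void vminub_selfcheck(void)
{
    VminubResult r = vminub_ref(0x0102030405060708ULL,
                                0x0807060504030201ULL);

    assert(r.rdd == 0x0102030404030201ULL);
    assert(r.pe == 0x0f);
}
```

A model like this could be handy for checking an override against its
helper in a test.
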
>
> The second instruction is S2_vsplatrb.  It takes the byte from the operand and replicates it 4 times into the destination register.  Here is the TCG code.
> #define fWRAP_S2_vsplatrb(GENHLPR, SHORTCODE) \
> { \
>     TCGv tmp = tcg_temp_new(); \
>     int i; \
>     tcg_gen_movi_tl(RdV, 0); \
>     tcg_gen_andi_tl(tmp, RsV, 0xff); \
>     for (i = 0; i < 4; i++) { \
>         tcg_gen_shli_tl(RdV, RdV, 8); \
>         tcg_gen_or_tl(RdV, RdV, tmp); \
>     } \
>     tcg_temp_free(tmp); \
> }
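
The same kind of plain C model for the splat, again as a sketch with a
hypothetical name (vsplatrb_ref) that simply mirrors the shift-and-or
loop in the macro above:

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low byte of rs into all four bytes of the result. */
static inline uint32_t vsplatrb_ref(uint32_t rs)
{
    uint32_t byte = rs & 0xff;
    uint32_t rd = 0;

    for (int i = 0; i < 4; i++) {
        rd = (rd << 8) | byte;
    }
    return rd;
}

/* Only the low byte (0xab) of the source should survive. */
static inline void vsplatrb_selfcheck(void)
{
    assert(vsplatrb_ref(0x123456abu) == 0xababababu);
}
```
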
>
>
> From: Aleksandar Markovic <aleksandar.m.mail@gmail.com>
> Sent: Monday, November 4, 2019 6:05 PM
> To: Taylor Simpson <tsimpson@quicinc.com>
> Cc: qemu-devel@nongnu.org; Alessandro Di Federico <ale@rev.ng>; nizzo@rev.ng; Niccolò Izzo <izzoniccolo@gmail.com>
> Subject: Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available
>
>
> On Friday, October 25, 2019, Taylor Simpson <tsimpson@quicinc.com> wrote:
> We would like to inform you that we will be giving a talk at the KVM Forum next week on QEMU for Qualcomm Hexagon.  Alessandro Di Federico, Niccolo Izzo, and I have been working independently on implementations of the Hexagon target.  We plan to merge the implementations, have a community review, and ultimately have Hexagon be an official target in QEMU.  Our code is available at the links below.
> https://github.com/revng/qemu-hexagon
> https://github.com/quic/qemu
> If anyone has any feedback on the code as it stands today or guidance on how best to prepare it for review, please let us know.
>
>
> Hi, Taylor, Niccolo (and Alessandro too).
>
> I didn't have a chance to look at either the code or the docs, but I did attend your presentation at KVM Forum, and I found it superb and attractive, one of the best at the conference, if not the very best.
>
> I just have a couple of general questions:
>
> - Regarding the code you plan to upstream, are all SIMD instructions implemented via the TCG API, or are some of them still implemented using helpers?
>
> - Most SIMD instructions can be viewed simply as several parallel elementary operations. However, for a given SIMD instruction set, usually not all of them fit this pattern. For example, "horizontal add" (adding data elements from the same SIMD register), various "pack/unpack/interleave/merge" operations, and more general "shuffle/permute" operations as well (I am not sure which of these are included in the Hexagon SIMD set, but there must be some). How did you deal with them?
>
> - What were the most challenging Hexagon SIMD instructions you came across while developing your solution?
>
> Sincerely,
> Aleksandar
>
>
>
>
> Thanks,
> Taylor

--
Alex Bennée