From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([209.51.188.92]:44077) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gyiKM-0000Oh-3Y for qemu-devel@nongnu.org; Tue, 26 Feb 2019 14:23:44 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gyiKL-0004Qv-60 for qemu-devel@nongnu.org; Tue, 26 Feb 2019 14:23:42 -0500 References: <20190226113915.20150-1-david@redhat.com> <20190226113915.20150-7-david@redhat.com> From: David Hildenbrand Message-ID: <7bfddcbc-5ace-5f28-8042-8b576f31d573@redhat.com> Date: Tue, 26 Feb 2019 20:23:18 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Richard Henderson , qemu-devel@nongnu.org Cc: qemu-s390x@nongnu.org, Cornelia Huck , Thomas Huth , Richard Henderson On 26.02.19 20:12, Richard Henderson wrote: > On 2/26/19 3:38 AM, David Hildenbrand wrote: >> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o) >> +{ >> + const uint16_t i2 = get_field(s->fields, i2); >> + TCGv_i32 ones = tcg_const_i32(-1u); >> + TCGv_i32 zeroes = tcg_const_i32(0); >> + int i; >> + >> + for (i = 0; i < 16; i++) { >> + if (extract32(i2, 15 - i, 1)) { >> + write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8); >> + } else { >> + write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8); >> + } >> + } >> + tcg_temp_free_i32(ones); >> + tcg_temp_free_i32(zeroes); >> + return DISAS_NEXT; >> +} > > While this works, it's not in the spirit of > >> Programming Note: VECTOR GENERATE BYTE >> MASK is the preferred method for setting a vector >> register to all zeroes or ones. Good point, I skipped that note so far. > > Better, I think, with Many instructions to implement, so little time to fine tune stuff so far. However I have tests for VGBM, so I can easily get it working. Will play with it! > > uint64_t generate_byte_mask(uint8_t mask) > { > uint64_t r = 0; > int i; > for (i = 0; i < 8; i++) { > if ((mask >> i) & 1) { > r |= 0xffull << (i * 8); > } > } > return r; > } > > if (i2 == (i2 & 0xff) * 0x0101) { > /* masks for both halves of the vector are the same. > trust tcg to produce a good constant loading. */ > tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16, > generate_byte_mask(i2 & 0xff)); > } else { > TCGv_i64 t = tcg_temp_new_i64(); > tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8)); > write_vec_element_i64(t, v1, 0, MO_64); > tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff)); > write_vec_element_i64(t, v1, 1, MO_64); > tcg_temp_free_i64(); > } > > Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be > decomposed further, which will eventually bottom out at > > vpxor %xmm0,%xmm0,%xmm0 // all zeros > vpcmpeq %xmm0,%xmm0,%xmm0 // all ones > > and even more interesting combinations for tcg/aarch64. > > At this point I want to highlight how helpful your reviews are. Amazing! :) > > r~ > -- Thanks, David / dhildenb