From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,INCLUDES_CR_TRAILER, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B5D1C433EF for ; Fri, 10 Sep 2021 13:37:46 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5CAA8611BF for ; Fri, 10 Sep 2021 13:37:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5CAA8611BF Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=nongnu.org Received: from localhost ([::1]:59884 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mOgiu-0006Kc-EU for qemu-devel@archiver.kernel.org; Fri, 10 Sep 2021 09:37:44 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:54356) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mOghW-0004hH-7N for qemu-devel@nongnu.org; Fri, 10 Sep 2021 09:36:18 -0400 Received: from mail-wr1-x436.google.com ([2a00:1450:4864:20::436]:45635) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mOghS-0002xn-Ms for qemu-devel@nongnu.org; Fri, 10 Sep 2021 09:36:17 -0400 Received: by mail-wr1-x436.google.com with SMTP id n5so2660661wro.12 for ; Fri, 10 Sep 2021 06:36:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull-eu.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=C+dqxZ1jmgnV95OhuqA0NZRVNTqOwR5CueYrQ9kqk9w=; b=vfhEYTOTGEVBB4J0g9QKaFbxntLW/xTQ/SNgFi6uk48CE5nsbsG/P9tz9oFKKTmehy vELvDsqii9sSv0Fyw0UEpcAJcI3ALggIwYC8gCDFywQ3BWbQATIXjUnYFkPbatPArld0 jiuyLRXeYpA1p9/SqzrvKPPtOLuTbvbAT/6+SHAT6O/7z9bIPl7V215SPNaF2eCXUfpS qEjfAOzKOK471PDEafKp0GriCmoDnaZ1c2n/Ku5TXJK0FCt5SdqOINZVkAVgH10W9HL4 deC+A7oPm861kk4s20MOIXTXOc6EjbqaOoTHoaF6hAG6ZqyG7k8Jz1YIw/2ff210CvBd DVVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=C+dqxZ1jmgnV95OhuqA0NZRVNTqOwR5CueYrQ9kqk9w=; b=rdi9Ygwt0ACN/y3lmLuyPFCnhSBycyWnLU5dhJcqEsN6avxgdE18snFWpRisnUVw9U zPSccUiWYqcABOUUDhSvSbOn7EIQAqMPd02Qll2520nJhkyzIAhbOFcocgCotm6pJ9c6 aw5Rg3kycVFUCs5sb+Rx2Hd9U5+2D7vK/NMcrpgs2eWJYRRewDriVka4B7ZF2xy/YKvS IRGNWxhC1AdDYLXXbI06dqcvp9wxBg/s98bEsuVZmX+7tuwWhmLyoRpSrmjqEGTacxkS 3mKsN5QVwflF4zn53CkMrsyqZxbjJOR0U/BBYsO5bf6i69qrGE6MMnVf6gE8TJJfQ8TG GTKw== X-Gm-Message-State: AOAM532kgwJDuICdrPeTUk9IWovCRoif/tTTJ3JX8HMYY//BZiInoOxB dVp5Y/Vxt6v65GSijTQr9qZYH2RizDojBpr6X2FsZg== X-Google-Smtp-Source: ABdhPJwyRfw/0x/ohJvvRfoK9u8oy48CoF1KVBXunwCKE0qcE9HKOvM0abhD6oqE0xkcaMNt/dMlZRoHzaPBpCTRYRg= X-Received: by 2002:a5d:6750:: with SMTP id l16mr9745103wrw.174.1631280971493; Fri, 10 Sep 2021 06:36:11 -0700 (PDT) MIME-Version: 1.0 References: <20210904203516.2570119-1-philipp.tomsich@vrull.eu> <20210904203516.2570119-4-philipp.tomsich@vrull.eu> <3e608998-3270-cf41-66b5-32158db99de0@linaro.org> In-Reply-To: From: Philipp Tomsich Date: Fri, 10 Sep 2021 16:36:00 +0300 Message-ID: Subject: Re: [PATCH v10 03/16] target/riscv: clwz must ignore high bits (use shift-left & changed logic) To: Richard Henderson Content-Type: multipart/alternative; boundary="000000000000d5c74405cba43216" Received-SPF: pass client-ip=2a00:1450:4864:20::436; envelope-from=philipp.tomsich@vrull.eu; helo=mail-wr1-x436.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kito Cheng , Alistair Francis , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" --000000000000d5c74405cba43216 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Richard, Did you have a chance to consider what to do with clzw? I would prefer to avoid the extra extension instructions and change the implementation (and would update the commit message to provide more context), but if you insist on setting 'ctx->w' I'll just have the extra extensions emitted than delay this series further=E2=80=A6 Thanks, Philipp. On Sun, 5 Sept 2021 at 12:01, Philipp Tomsich wrote: > On Sun 5. Sep 2021 at 11:11, Richard Henderson > wrote: > > > > On 9/4/21 10:35 PM, Philipp Tomsich wrote: > > > Assume clzw being executed on a register that is not sign-extended, > such > > > as for the following sequence that uses (1ULL << 63) | 392 as the > operand > > > to clzw: > > > bseti a2, zero, 63 > > > addi a2, a2, 392 > > > clzw a3, a2 > > > The correct result of clzw would be 23, but the current implementatio= n > > > returns -32 (as it performs a 64bit clz, which results in 0 leading > zero > > > bits, and then subtracts 32). > > > > > > Fix this by changing the implementation to: > > > 1. shift the original register up by 32 > > > 2. performs a target-length (64bit) clz > > > 3. return 32 if no bits are set > > > > > > Signed-off-by: Philipp Tomsich > > > --- > > > > > > Changes in v10: > > > - New patch, fixing correctnes for clzw called on a register with > undefined > > > (as in: not properly sign-extended) upper bits. > > > > But we have > > > > return gen_unary(ctx, a, EXT_ZERO, gen_clzw); > > > > should *not* be undefined. Ah, what's missing is > > > > ctx->w =3D true; > > > > within trans_clzw to cause the extend to take effect. > > > > There are a few other "w" functions that are missing that set, though > they use EXT_NONE so > > there is no visible bug, it would probably be best to set w anyway. > > > I had originally considered that (it would have to be ctx->w =3D true; > and EXT_SIGN), > but that has the side-effect of performing an extension on the result > of the clzw as > well (i.e. bot get_gpr and set_gpr are extended). > > However, this is not what clzw does: it only ignores the upper bits > and then produces > a result in the value-range [0..32]. An extension on the result of > either a clz or clzw > is never needed (we have been fighting that problem in GCC and had to > use patterns > for the combiner), so I don't want to introduce this inefficiency in qemu= . > > If you have EXT_SIGN (or _ZERO), we will end up with 2 additional > extensions (one > on the argument, one on the result) in addition to the 2 other tcg > operations that we > need (i.e. clzi/subi for the extending case, shli/clzi for the proposed > fix). > > So I believe that this commit is the best fix: > 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the > input _and_ > extends the output), as it only treats the input as 32bit, but the > output is xlen. > 2. It doesn't introduce any unnecessary extensions, but handles the > case in-place. > > Philipp. > --000000000000d5c74405cba43216 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Richard,

Did you have = a chance to consider what to do with clzw?
I would prefer to avoi= d the extra extension instructions and change the implementation (and would= update the commit message to provide more context), but if you insist on s= etting 'ctx->w' I'll just have the extra extensions emitted = than delay this series further=E2=80=A6

Thanks= ,
Philipp.

On Sun, 5 Sept 2021 at 12:01, Philipp Tomsich <philipp.tomsich@vrull.eu> wrot= e:
On Sun 5. Sep= 2021 at 11:11, Richard Henderson
<richa= rd.henderson@linaro.org> wrote:
>
> On 9/4/21 10:35 PM, Philipp Tomsich wrote:
> > Assume clzw being executed on a register that is not sign-extende= d, such
> > as for the following sequence that uses (1ULL << 63) | 392 = as the operand
> > to clzw:
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0bseti=C2=A0 =C2=A0a2, zero, 63
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0addi=C2=A0 =C2=A0 a2, a2, 392
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0clzw=C2=A0 =C2=A0 a3, a2
> > The correct result of clzw would be 23, but the current implement= ation
> > returns -32 (as it performs a 64bit clz, which results in 0 leadi= ng zero
> > bits, and then subtracts 32).
> >
> > Fix this by changing the implementation to:
> >=C2=A0 =C2=A01. shift the original register up by 32
> >=C2=A0 =C2=A02. performs a target-length (64bit) clz
> >=C2=A0 =C2=A03. return 32 if no bits are set
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> >
> > Changes in v10:
> > - New patch, fixing correctnes for clzw called on a register with= undefined
> >=C2=A0 =C2=A0 (as in: not properly sign-extended) upper bits.
>
> But we have
>
>=C2=A0 =C2=A0 =C2=A0 return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
>
> should *not* be undefined.=C2=A0 Ah, what's missing is
>
>=C2=A0 =C2=A0 =C2=A0 ctx->w =3D true;
>
> within trans_clzw to cause the extend to take effect.
>
> There are a few other "w" functions that are missing that se= t, though they use EXT_NONE so
> there is no visible bug, it would probably be best to set w anyway.

I had originally considered that (it would have to be ctx->w =3D true; and EXT_SIGN),
but that has the side-effect of performing an extension on the result
of the clzw as
well (i.e. bot get_gpr and set_gpr are extended).

However, this is not what clzw does: it only ignores the upper bits
and then produces
a result in the value-range [0..32]. An extension on the result of
either a clz or clzw
is never needed (we have been fighting that problem in GCC and had to
use patterns
for the combiner), so I don't want to introduce this inefficiency in qe= mu.

If you have EXT_SIGN (or _ZERO), we will end up with 2 additional
extensions (one
on the argument, one on the result) in addition to the 2 other tcg
operations that we
need (i.e. clzi/subi for the extending case, shli/clzi for the proposed fix= ).

So I believe that this commit is the best fix:
1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the<= br> input _and_
extends the output), as it only treats the input as 32bit, but the
output is xlen.
2. It doesn't introduce any unnecessary extensions, but handles the
case in-place.

Philipp.
--000000000000d5c74405cba43216--