Date: Thu, 12 Jan 2023 16:28:21 +0100
From: Alessandro Di Federico
To: Richard Henderson
Cc: qemu-devel@nongnu.org, Taylor Simpson, Alex Bennée
Subject: Re: [RFC] Reducing NEED_CPU_H usage
Message-ID: <20230112162821.21ae8d7a@orange>
References: <20221228171617.059750c3@orange>
Followup-To: Philippe Mathieu-Daudé

On Tue, 10 Jan 2023 11:56:50 -0800
Richard Henderson wrote:

> However, at some point we do want to keep some target addresses in
> the proper size.  For instance within the softmmu tlb, where
> CPUTLBEntry is either 16 or 32 bytes, depending.

So that would be an optimization if `HOST_LONG_BITS == 32 &&
TARGET_LONG_BITS == 32`, correct?

I've heard about dropping 32 bit hosts multiple times here and there.
Maybe we could start with dropping oversized guests, which AFAIU are
the real offenders for most (all?) of these situations.

> > ## `abi_ulong`
> >
> > Similar to `target_ulong`, but with alignment info.
>
> Pardon?  There's no alignment info in abi_ulong.

From include/exec/user/abitypes.h:

    typedef uint32_t abi_ulong __attribute__((aligned(ABI_LONG_ALIGNMENT)));
    typedef target_ulong abi_ulong __attribute__((aligned(ABI_LONG_ALIGNMENT)));

I thought that was the difference. Thanks for the clarification.

> This one requires some work within tcg/ to handle two target address
> sizes simultaneously.
> It should not be technically difficult to
> solve, but it does involve adding a few TCG opcodes and adjusting all
> tcg backends.

I'm a bit confused by this, do backends for some reason have
expectations about the address size?
Wouldn't this be enough?

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 019fab00ccb..175162b8fef 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -29,0 +29,0 @@
@@ -2827,17 +2843,17 @@ static inline MemOp tcg_canonicalize_memop(MemOp op, bool is64, bool st)
     return op;
 }
 
-static void gen_ldst_i32(TCGOpcode opc, TCGv_i32 val, TCGv addr,
+static void gen_ldst_i32(TCGOpcode opc, TCGv_i32 val, TCGv_dyn addr,
                          MemOp memop, TCGArg idx)
 {
     MemOpIdx oi = make_memop_idx(memop, idx);
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(opc, val, addr, oi);
-#else
-    if (TCG_TARGET_REG_BITS == 32) {
-        tcg_gen_op4i_i32(opc, val, TCGV_LOW(addr), TCGV_HIGH(addr), oi);
-    } else {
-        tcg_gen_op3(opc, tcgv_i32_arg(val), tcgv_i64_arg(addr), oi);
-    }
-#endif
+    if (addr.size == 32) {
+        tcg_gen_op3i_i32(opc, val, addr.i32, oi);
+    } else {
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_gen_op4i_i32(opc, val, TCGV_LOW(addr.i64), TCGV_HIGH(addr.i64), oi);
+        } else {
+            tcg_gen_op3(opc, tcgv_i32_arg(val), tcgv_i64_arg(addr.i64), oi);
+        }
+    }
 }

> This forgets that both TCGv_i32 and TCGv_i64 are represented by
> TCGTemp,

https://i.imgflip.com/777wax.jpg

> which contains 'TCGType type' to discriminate.  This is not
> exposed to target/, but it's there.
>
> Anyway, there's no need for this.

So, if I got it right, we could just make TCGv become a new opaque type,
propagate it down until the spot where we actually need to know its
size and then just have some `TCGTemp *tcgv_temp(TCGv v)` function to
inspect the actual size?

Makes sense!

> Before CPUNegativeOffsetState, we had all of those variables within
> CPUState, and even comments that they should remain at the end of the
> struct.
> But those comments were ignored and one day icount_decr was
> too far away from env for easy addressing on arm host.  Luckily there
> was an assert that fired, but it didn't get caught right away.

How come it wasn't caught immediately?
We could have something like:

    QEMU_BUILD_BUG_MSG(offsetof(ArchCPU, env) - offsetof(ArchCPU, neg.tlb)
                       >= DESIRED_THRESHOLD,
                       "neg.tlb is too far from env");

> > # Current status
> >
> > We currently have a branch where we can build (but not link nor
> > run) a `x86_64-linux-user` configuration where `NEED_CPU_H` is
> > defined only for translation units in `target/` and `linux-user/`.
>
> This effort should be exclusive to system mode -- don't consider
> *-user at all.  Each linux-user target bakes in not just target
> architecture parameters, which are complicated enough, but also C
> ABI, and kernel API.  Moreover, the most common use case for
> linux-user is a statically linked binary that operates within a
> chroot.

Our current goal is to get the following compilation units to build
without NEED_CPU_H:

    trace/control-target.c
    gdbstub/gdbstub.c
    cpu.c
    disas.c
    page-vary.c
    tcg/optimize.c
    tcg/region.c
    tcg/tcg.c
    tcg/tcg-common.c
    tcg/tcg-op.c
    tcg/tcg-op-gvec.c
    tcg/tcg-op-vec.c
    fpu/softfloat.c
    accel/accel-common.c
    accel/tcg/tcg-all.c
    accel/tcg/cpu-exec-common.c
    accel/tcg/cpu-exec.c
    accel/tcg/tb-maint.c
    accel/tcg/tcg-runtime-gvec.c
    accel/tcg/tcg-runtime.c
    accel/tcg/translate-all.c
    accel/tcg/translator.c
    accel/tcg/user-exec.c
    accel/tcg/user-exec-stub.c
    accel/tcg/plugin-gen.c
    plugins/loader.c
    plugins/core.c
    plugins/api.c

They are a subset of `arch_srcs` from `meson.build`.

Making them target agnostic for *-user too should be easy and could
save some build time. But yeah, we'll now focus on system-mode.

We'll now try to sort things out from the easiest to the most complex
and start sending out patches.

Thanks a lot for your valuable insights!

-- 
Alessandro Di Federico
rev.ng Labs
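
For reference, the build-time layout check suggested above can be tried out as a standalone C11 sketch. Everything prefixed Demo/DEMO_ is hypothetical: the structs are placeholders and not QEMU's real CPUState, CPUNegativeOffsetState or CPUArchState, and the 4096-byte limit is an assumed value, not a documented constraint of any TCG backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state that lives just before env. */
typedef struct {
    uint8_t tlb[256];       /* placeholder for the softmmu TLB descriptors */
    uint32_t icount_decr;   /* placeholder for the icount decrementer */
} DemoNegativeOffsetState;

/* Hypothetical stand-in for the per-target architectural state. */
typedef struct {
    uint64_t regs[16];
} DemoArchState;

/* Hypothetical stand-in for ArchCPU: neg must sit right before env. */
typedef struct {
    uint8_t parent_obj[64];         /* placeholder for the common CPU state */
    DemoNegativeOffsetState neg;
    DemoArchState env;
} DemoArchCPU;

/* Assumed limit on how far below env the generated code may reach. */
#define DEMO_MAX_NEG_DISTANCE 4096

/* Distance in bytes from the start of neg up to env. */
#define DEMO_NEG_DISTANCE \
    (offsetof(DemoArchCPU, env) - offsetof(DemoArchCPU, neg))

/*
 * The build fails here, rather than at run time, if a field added in
 * between pushes neg beyond the assumed limit.
 */
_Static_assert(DEMO_NEG_DISTANCE <= DEMO_MAX_NEG_DISTANCE,
               "neg state is too far from env");

/* Expose the distance so it can also be inspected at run time. */
size_t demo_neg_distance(void)
{
    return DEMO_NEG_DISTANCE;
}
```

Adding a large member between neg and env in DemoArchCPU, or growing the tlb placeholder past the limit, makes the translation unit fail to compile, which is the point of doing the check statically.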