Date: Tue, 06 Sep 2022 07:22:42 +0100
Message-ID: <87sfl5uiml.wl-maz@kernel.org>
From: Marc Zyngier
To: Leo Yan
Cc: Thomas Gleixner, linux-kernel@vger.kernel.org, Ard Biesheuvel, Bertrand Marquis, Rahul Singh, Julien Grall, Mathieu Poirier
Subject: Re: [PATCH] irqchip/gic-v3: Don't reserve persistent memory for Xen domain
In-Reply-To: <20220906024040.503764-1-leo.yan@linaro.org>
References: <20220906024040.503764-1-leo.yan@linaro.org>

On Tue, 06 Sep 2022 03:40:40 +0100,
Leo Yan wrote:
>
> For GICv3 with its redistributor, the driver needs to reserve the
> persistent memory for LPI configuration and pending tables, so the
> reserved pages will not be overwritten by secondary kernel launched by
> kexec, the hardware can continue to use the pages for maintenance
> LPI states.
>
> When kernel runs in Xen domain, Xen uses FDT with encapsulating ACPI
> table in device tree.
> Therefore, the EFI stub is not invoked and
> the memreserve table is not installed, this leads to the memory
> cannot be reserved as persistent region and kernel reports oops:
>
> [ 0.403737] ------------[ cut here ]------------
> [ 0.403738] WARNING: CPU: 30 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:3074 its_cpu_init+0x814/0xae0
> [ 0.403745] Modules linked in:
> [ 0.403748] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G W 5.15.23-ampere-lts-standard #1
> [ 0.403752] pstate: 600001c5 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.403755] pc : its_cpu_init+0x814/0xae0
> [ 0.403758] lr : its_cpu_init+0x810/0xae0
> [ 0.403761] sp : ffff800009c03ce0
> [ 0.403762] x29: ffff800009c03ce0 x28: 000000000000001e x27: ffff880711f43000
> [ 0.403767] x26: ffff80000a3c0070 x25: fffffc1ffe0a4400 x24: ffff80000a3c0000
> [ 0.403770] x23: ffff8000095bc998 x22: ffff8000090a6000 x21: ffff800009850cb0
> [ 0.403774] x20: ffff800009701a10 x19: ffff800009701000 x18: ffffffffffffffff
> [ 0.403777] x17: 3030303035303031 x16: 3030313030303078 x15: 303a30206e6f6967
> [ 0.403780] x14: 6572206530312072 x13: 3030303030353030 x12: 3130303130303030
> [ 0.403784] x11: 78303a30206e6f69 x10: 6765722065303120 x9 : ffff80000870e710
> [ 0.403788] x8 : 6964657220646e75 x7 : 0000000000000003 x6 : 0000000000000000
> [ 0.403791] x5 : 0000000000000000 x4 : fffffc0000000000 x3 : 0000000000000010
> [ 0.403794] x2 : 000000000000ffff x1 : 0000000000010000 x0 : 00000000ffffffed
> [ 0.403798] Call trace:
> [ 0.403799]  its_cpu_init+0x814/0xae0
> [ 0.403802]  gic_starting_cpu+0x48/0x90
> [ 0.403805]  cpuhp_invoke_callback+0x16c/0x5b0
> [ 0.403808]  cpuhp_invoke_callback_range+0x78/0xf0
> [ 0.403811]  notify_cpu_starting+0xbc/0xdc
> [ 0.403814]  secondary_start_kernel+0xe0/0x170
> [ 0.403817]  __secondary_switched+0x94/0x98
> [ 0.403821] ---[ end trace f68728a0d3053b70 ]---
>
> GICv3 interrupt controller is emulated by Xen hypervisor, this means the
> LPI configuration table and pending table allocated by Linux kernel
> are only emulated by software but not accessed by hardware, so it has no risk
> to introduce race condition between the secondary kernel launched by
> kexec and the physical interrupt controller. And when the secondary
> kernel is booting, it uses totally separate memory region from the
> primary kernel, the secondary kernel can allocate its own LPI
> configuration table and pending table and register them into Xen
> hypervisor afterwards.
>
> If look into the GIC implementation, LPI serves for message-based
> interrupts (MSI), it comes from ITS or directly from MSI, and at the end
> forward LPI to redistributor. This means the physical LPIs are received
> in Xen hypervisor (in EL2) and sets List Register for virtual CPU
> interface (consumed in EL1). Furthermore, to support the emulated LPIs,
> the first question is how to connect virtual GICv3 with MSI, and then
> it also requires Xen to emulate the ITS and redistributor; so far, Xen
> hypervisor doesn't really emulate these hardware mechanism thus the
> allocated LPI tables in Linux are not used by Xen hypervisor.
>
> For above reasons, this patch simply skips to reserve persistent memory
> for Xen domain so can mute the useless oops.
>
> Cc: Ard Biesheuvel
> Cc: Marc Zyngier
> Cc: Bertrand Marquis
> Cc: Rahul Singh
> Cc: Julien Grall
> Cc: Mathieu Poirier
> Signed-off-by: Leo Yan
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 5ff09de6c48f..9ba9984401de 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -34,6 +34,8 @@
>  #include
>  #include
>
> +#include
> +
>  #include
>  #include
>
> @@ -2220,6 +2222,21 @@ static bool gic_check_reserved_range(phys_addr_t addr, unsigned long size)
>
>  static int gic_reserve_range(phys_addr_t addr, unsigned long size)
>  {
> +	/*
> +	 * When kernel runs in Xen domain, it misses to invoke the EFI stub,
> +	 * thus the memreserve table is not installed; in this case, the
> +	 * memory cannot be reserved as persistent region.
> +	 *
> +	 * On the other hand, the GICv3 controller is emulated by Xen
> +	 * hypervisor, given a redistributor its LPI pending table and
> +	 * configuration table are emulated by software but not manipulated
> +	 * by hardware. Therefore, it's not necessary to reserve them, for
> +	 * kexec/kdump the secondary kernel can allocate new pages for these
> +	 * two tables.
> +	 */
> +	if (xen_domain())
> +		return 0;
> +
>  	if (efi_enabled(EFI_CONFIG_TABLES))
>  		return efi_mem_reserve_persistent(addr, size);
>

No, never. The driver follows the architecture to the letter, and I'm
not going to pollute it because Xen is broken.

	M.

-- 
Without deviation from the norm, progress is not possible.