From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9387C43460 for ; Tue, 18 May 2021 10:16:26 +0000 (UTC) Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4A242611AD for ; Tue, 18 May 2021 10:16:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A242611AD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding :Content-Type:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=w55Qa4lrzFhh83qkinzHQ9RCCFBo1Y1g+yeeoG4IWS0=; b=OWxPOmXtjnxTKbYBs6T81WONR uHU2z1SUXqxUCMjYIVFDNCb2ASXvWSQlINTU16cODKJENT252VwtwEIbIjtL38taCqDZ7wzjkCQQB BU/x0rPsOdrLa0eDDFx79EebXvE3U5B54VmbQUPOeU40HS6ywNROb1RO/eNwaNi7kGpYpfketEAOU DF8iADXbIlwpspsQRQz3/vOVFgiB4DbTc8VHrylnWTKx6imXxteP9tbBugh9CQg67bwfchLDrpF7x +/Vcux4VAD7c4NTPtKQp23ZZLNjhghTI8Vt207OAGq4peWHhB/3OE0Y8W0pI3FEFTqRWIgaxSysrZ UOjYm6i+A==; Received: from localhost ([::1] helo=desiato.infradead.org) by desiato.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1liwk6-000JSY-CD; Tue, 18 May 2021 10:14:26 +0000 Received: from bombadil.infradead.org ([2607:7c80:54:e::133]) by desiato.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1liwVS-000G6U-Qx for linux-arm-kernel@desiato.infradead.org; Tue, 18 May 2021 09:59:19 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=IyW/rnBH+LHbqcKEGQ6bkV3fsstKfOqvGIGhjiKMYms=; b=Y+8yV55gQVKraFv3tEJlacBbzC Viv5U7CPSQiMFmyJlFlIpE4XzXAU21iqnaHDdc6oTyv+EcXzj+bfkAkOjB7Pw91SdwFc5k7fyrOK1 cIbMd1HLCAVq/QpZNO4Q6VEY6saB12tVW75yqIIgNzddeEhYothR1QNu51/CPEgROOjwaTgBd3dZu K2KqUB4JkxlNNqvD02KfCQw7kq9EiFulVQXk2RC8cR2y1RmG/xZxRX+ww1wP7xlZd73vAjZKvZQ6O TlJ8d9vWBmJwA+hKvIMwy2ATNgLtfgBBnDRbw0lcDIgBWFIu0xbuePfAwdCFTmJaevSxdfHRYg+Cx MpY3+b6g==; Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1liwVP-00EXH7-CQ for linux-arm-kernel@lists.infradead.org; Tue, 18 May 2021 09:59:17 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 541471FB; Tue, 18 May 2021 02:59:14 -0700 (PDT) Received: from C02TD0UTHF1T.local (unknown [10.57.6.226]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 228923F719; Tue, 18 May 2021 02:59:10 -0700 (PDT) Date: Tue, 18 May 2021 10:59:04 +0100 From: Mark Rutland To: Marc Zyngier Cc: John Stultz , Catalin Marinas , Kees Cook , Will Deacon , Sami Tolvanen , linux-arm-kernel , Linux Kernel Mailing List , Bjorn Andersson , YongQin Liu , Amit Pundir , Michael Walle , Mark Brown , Masahiro Yamada Subject: Re: REGRESSION: kernel BUG at arch/arm64/kernel/alternative.c:157! Message-ID: <20210518095904.GB82842@C02TD0UTHF1T.local> References: <878s4cv20v.wl-maz@kernel.org> <20210518093211.GA82842@C02TD0UTHF1T.local> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20210518093211.GA82842@C02TD0UTHF1T.local> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210518_025915_547154_2D70E69C X-CRM114-Status: GOOD ( 33.40 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Adding Mark Brown and Mazahiro Yamada. It looks like there's a dependency issue where assembly files don't get rebuilt when a generated header they depend upon is rebuilt, and from commit: 0c6c2d3615efb7c2 ("arm64: Generate cpucaps.h") ... we can have stale objects with old cpucap values. More detail below. On Tue, May 18, 2021 at 10:32:11AM +0100, Mark Rutland wrote: > On Tue, May 18, 2021 at 10:12:32AM +0100, Marc Zyngier wrote: > > On Mon, 17 May 2021 22:52:59 +0100, > > John Stultz wrote: > > > > > > With v5.13-rc2, I've been seeing an odd boot regression with the > > > DragonBoard 845c: > > > > > > Unfortunately, trying to bisect it down (v5.13-rc1 works ok) is giving > > > me inconsistent results so far. It feels a bit like maybe some config > > > option gets enabled moving forward, and then sticks around when we go > > > back. I'll take another swing at bisecting it later today, but I have > > > to move on to some other work right now, so I figured I'd share (with > > > folks who better know the recent __apply_alternatives changes) in case > > > folks have a better idea: > > > > > > [ 0.254384] CPU features: detected: RAS Extension Support > > > [ 0.259928] CPU: All CPU(s) started at EL1 > > > [ 0.264127] alternatives: patching kernel code > > > [ 0.268635] ------------[ cut here ]------------ > > > [ 0.273303] kernel BUG at arch/arm64/kernel/alternative.c:157! > > > [ 0.279192] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > > > [ 0.284736] Modules linked in: > > > [ 0.287833] CPU: 0 PID: 14 Comm: migration/0 Not tainted > > > 5.13.0-rc2-mainline #4501 > > > [ 0.295472] Hardware name: Thundercomm Dragonboard 845c (DT) > > > [ 0.301182] Stopper: multi_cpu_stop+0x0/0x1a0 <- > > > stop_machine_cpuslocked+0x128/0x160 > > > [ 0.309020] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO BTYPE=--) > > > [ 0.315086] pc : __apply_alternatives+0x1f0/0x270 > > > [ 0.319847] lr : __apply_alternatives+0xf4/0x270 > > > [ 0.324515] sp : ffffffc01020bca0 > > > [ 0.327874] x29: ffffffc01020bca0 x28: 00000000000000a0 x27: ffffffd7f5c11124 > > > [ 0.335086] x26: ffffffd7f5c11128 x25: 00000000001b0020 x24: ffffffd7f700ab90 > > > [ 0.342297] x23: 0000000000000000 x22: ffffffc01020bd20 x21: ffffffd7f7bea374 > > > [ 0.349508] x20: ffffffc01020bd30 x19: ffffffd7f72194fc x18: ffffffffffffffff > > > [ 0.356718] x17: ffffffd7f7bdce40 x16: 000000005c8e1b43 x15: ffffffd7f76d9d10 > > > [ 0.363929] x14: ffffffc09020b967 x13: ffffffc01020b975 x12: ffffffd7f76d9e30 > > > [ 0.371140] x11: 0000000005f5e0ff x10: ffffffc01020b8c0 x9 : 00000000ffffffd0 > > > [ 0.378350] x8 : 6b20676e69686374 x7 : ffffffd7f79b9238 x6 : c0000000ffff7fff > > > [ 0.385560] x5 : 0000000000000000 x4 : ffffffd7f5c22898 x3 : 0000000000000010 > > > [ 0.392771] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 000000000000003f > > > [ 0.399982] Call trace: > > > [ 0.402461] __apply_alternatives+0x1f0/0x270 > > > [ 0.406873] __apply_alternatives_multi_stop+0xc0/0xe0 > > > [ 0.412062] multi_cpu_stop+0xb8/0x1a0 > > > [ 0.415851] cpu_stopper_thread+0xac/0x120 > > > [ 0.419997] smpboot_thread_fn+0x200/0x238 > > > [ 0.424146] kthread+0x14c/0x158 > > > [ 0.427423] ret_from_fork+0x10/0x1c > > > [ 0.431045] Code: 39402e61 39402a62 6b01005f 54fff500 (d4210000) > > > [ 0.437199] ---[ end trace 523e13d9d60a992d ]--- > > > [ 0.441868] note: migration/0[14] exited with preempt_count 2 > > > [ 0.447739] migration/0 (14) used greatest stack depth: 12448 bytes left > > > > [/me digs in my IRC logs] > > > > This looks a lot like an issue that was reported my Michael Walle a > > few days ago on IRC, leading to a crash that looked like this: > > > > [ 0.325238] alternatives: patching kernel code > > [ 0.329735] ------------[ cut here ]------------ > > [ 0.334394] kernel BUG at arch/arm64/kernel/alternative.c:157! > > [ 0.340300] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > > [ 0.345836] Modules linked in: > > [ 0.348916] CPU: 0 PID: 14 Comm: migration/0 Not tainted 5.13.0-rc1-next-20210511+ #536 > > [ 0.356998] Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT) > > [ 0.365339] Stopper: multi_cpu_stop+0x0/0x1a8 <- stop_cpus.constprop.9+0x78/0xc8 > > [ 0.372820] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO BTYPE=--) > > [ 0.378882] pc : __apply_alternatives.isra.1+0x1c4/0x270 > > [ 0.384246] lr : __apply_alternatives.isra.1+0x110/0x270 > > [ 0.389606] sp : ffff800012db3ca0 > > [ 0.392946] x29: ffff800012db3ca0 x28: 0000000000000000 x27: ffff800010011924 > > [ 0.400155] x26: ffff800010011928 x25: 00000000001b0020 x24: ffff8000115ad350 > > [ 0.407364] x23: ffff800012db3d28 x22: 0000000000000000 x21: ffff800011fb24cd > > [ 0.414571] x20: ffff800012db3d30 x19: ffff800011840b38 x18: 0000000000000010 > > [ 0.421779] x17: 0000000044a56c23 x16: 0000000000000002 x15: ffffffffffffffff > > [ 0.428986] x14: ffff800011d50a48 x13: ffff800092db3987 x12: ffff800011de6a70 > > [ 0.436193] x11: 0000000000000003 x10: ffff800011dcea30 x9 : ffff8000105d8928 > > [ 0.443401] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001 > > [ 0.450608] x5 : 0000000000000000 x4 : ffff800010024398 x3 : 0000000000000010 > > [ 0.457815] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 000000000000003f > > [ 0.465022] Call trace: > > [ 0.467483] __apply_alternatives.isra.1+0x1c4/0x270 > > [ 0.472493] __apply_alternatives_multi_stop+0xcc/0xe0 > > [ 0.477679] multi_cpu_stop+0xac/0x1a8 > > [ 0.481460] cpu_stopper_thread+0xa4/0x138 > > [ 0.485592] smpboot_thread_fn+0x12c/0x268 > > [ 0.489725] kthread+0x164/0x168 > > [ 0.492980] ret_from_fork+0x10/0x30 > > [ 0.496588] Code: 39402e61 39402a62 6b01005f 54fff6a0 (d4210000) > > [ 0.502742] ---[ end trace 24ef7d65759ab825 ]--- > > [ 0.507398] note: migration/0[14] exited with preempt_count 2 > > [ 0.513290] ------------[ cut here ]------------ > > > > Michael subsequently reported that: > > > > > > mhh nevermind, I can't reproduce it anymore. Maybe I should have > > recompiled with a clean build dir at first > > > > > > My gut feeling is that we can end-up with some build leftovers when > > going between -rc1 and -rc2, hence the screw-up when the capabilities > > get reordered. Dependency issues? > > I've just reproduced this, and I'm dissecting what I have. It looks like > something goes wrong when moving from v5,13-rc1 to v5.13-rc2. I hacked some debug in and got: [ 0.362573] alternatives: patching kernel code [ 0.363856] alternatives: Bad alt region at 0xffffd98efe90f4c8 [ 0.365510] alternatives: alt->cpufeature = 63 [ 0.366829] alternatives: alt->orig_len = 4 [ 0.368155] alternatives: alt->alt_len = 0 [ 0.369377] alternatives: ptr = 0xffffd98efd211168, el0_sync_invalid+0x100/0x1a4 The alt->cpufeature value is bad, since ARM64_NCAPS is 61 since commit: 0c6c2d3615efb7c2 ("arm64: Generate cpucaps.h") ... so it looks like we're missing a dependency on the generated header, and are not rebuilding arch/arm64/kernel/entry.S. If I forcefully rebuild entry.S, I instead get: [ 0.303521] alternatives: patching kernel code [ 0.304689] alternatives: Bad alt region at 0xffffd4e7567110ac [ 0.306177] alternatives: alt->cpufeature = 63 [ 0.307302] alternatives: alt->orig_len = 16 [ 0.308440] alternatives: alt->alt_len = 0 [ 0.309540] alternatives: ptr = 0xffffd4e755073808, __bp_harden_hyp_vecs+0x808/0x1810 ... and so on, though I'm eventually left with: [ 0.356180] alternatives: Bad alt region at 0xffffc4db559116f4 [ 0.357621] alternatives: alt->cpufeature = 63 [ 0.358739] alternatives: alt->orig_len = 16 [ 0.359899] alternatives: alt->alt_len = 0 [ 0.361009] alternatives: ptr = 0xffffc4db550a20d8, el1_error_invalid+0x27a4/0xcb94 ... and that doesn't seem to exist: [mark@lakrids:~/src/linux]% ./scripts/faddr2line vmlinux el1_error_invalid+0x27a4/0xcb94 skipping el1_error_invalid address at 0xffff800010014028 due to size mismatch (0xcb94 != 0xbc) skipping el1_error_invalid address at 0xffff800010ea20d8 due to size mismatch (0xcb94 != 0x6cc) skipping el1_error_invalid address at 0xffff800010eb08a8 due to size mismatch (0xcb94 != 0x6fc) no match for el1_error_invalid+0x27a4/0xcb94 ... so there might be another issue here (kallsyms?). Thanks, Mark. > > Reproduction steps below. Note `usekorg` is my script to run a specific > build of the kernel.org crosstool binaries. > > $ git clean -fdx > $ git checkout v5.13-rc1 > $ usekorg 10.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- defconfig > $ usekorg 10.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- Image -j50 > $ git checkout v5.13-rc2 > $ usekorg 10.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- Image -j50 > > ... then when I run the resulting Image in a KVM guest on ThunderX2, I > get a splat at boot: > > [ 0.437023] ------------[ cut here ]------------ > [ 0.438314] kernel BUG at arch/arm64/kernel/alternative.c:157! > [ 0.439970] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > [ 0.441428] Modules linked in: > [ 0.442293] CPU: 0 PID: 12 Comm: migration/0 Not tainted 5.13.0-rc2 #2 > [ 0.444123] Hardware name: linux,dummy-virt (DT) > [ 0.445351] Stopper: multi_cpu_stop+0x0/0x17c <- stop_machine_cpuslocked+0x11c/0x16c > [ 0.447542] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO BTYPE=--) > [ 0.449155] pc : __apply_alternatives+0x210/0x250 > [ 0.450473] lr : __apply_alternatives+0xf8/0x250 > [ 0.451800] sp : ffff8000100abcb0 > [ 0.452718] x29: ffff8000100abcb0 x28: ffffa8930f011128 x27: ffffa8931070f4c8 > [ 0.454836] x26: ffffa8930f01112c x25: 00000000001b0020 x24: ffffa89310470650 > [ 0.456854] x23: 0000000000000000 x22: ffffa89310f5ea84 x21: ffff8000100abd30 > [ 0.458957] x20: ffff8000100abd40 x19: ffffa8931070f4cc x18: 0000000000000030 > [ 0.460944] x17: ffff16ea7efee940 x16: 0000000000000068 x15: ffff16e94016a050 > [ 0.463142] x14: ffffffffffffffff x13: ffffa89310ca29d8 x12: 0000000000000135 > [ 0.465157] x11: 0000000000000067 x10: ffffa89310cfa9d8 x9 : 00000000fffff000 > [ 0.467292] x8 : ffffa89310ca29d8 x7 : ffffa89310cfa9d8 x6 : 0000000000000000 > [ 0.469273] x5 : 0000000000000000 x4 : 000000000000003f x3 : ffffffffffffffc0 > [ 0.471471] x2 : 0000000000000023 x1 : 0000000000000004 x0 : 0000000000000000 > [ 0.473532] Call trace: > [ 0.474216] __apply_alternatives+0x210/0x250 > [ 0.475426] __apply_alternatives_multi_stop+0xc0/0xd4 > [ 0.476994] multi_cpu_stop+0xa8/0x17c > [ 0.478052] cpu_stopper_thread+0x9c/0x130 > [ 0.479235] smpboot_thread_fn+0x254/0x280 > [ 0.480434] kthread+0x158/0x160 > [ 0.481395] ret_from_fork+0x10/0x30 > [ 0.482449] Code: 8b040264 d63f0080 f94006a1 17ffffc9 (d4210000) > [ 0.484144] ---[ end trace f570e0f98f46a6c3 ]--- > > Thanks, > Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel