* Re: next/master bisection: baseline.login on rk3288-rock2-square [not found] <601b773a.1c69fb81.9f381.a32a@mx.google.com> @ 2021-02-04 8:43 ` Guillaume Tucker 2021-02-04 9:07 ` Ard Biesheuvel 0 siblings, 1 reply; 28+ messages in thread From: Guillaume Tucker @ 2021-02-04 8:43 UTC (permalink / raw) To: Russell King, Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, linux-kernel, Russell King, linux-arm-kernel Hi Ard, Please see the bisection report below about a boot failure on rk3288 with next-20210203. It was also bisected on imx6q-var-dt6customboard with next-20210202. Reports aren't automatically sent to the public while we're trialing new bisection features on kernelci.org but this one looks valid. The kernel is most likely crashing very early on, so there's nothing in the logs. Please let us know if you need some help with debugging or trying a fix on these platforms. Best wishes, Guillaume On 04/02/2021 04:25, KernelCI bot wrote: > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * > * This automated bisection report was sent to you on the basis * > * that you may be involved with the breaking commit it has * > * found. No manual investigation has been done to verify it, * > * and the root cause of the problem may be somewhere else. * > * * > * If you do send a fix, please include this trailer: * > * Reported-by: "kernelci.org bot" <bot@kernelci.org> * > * * > * Hope this helps! * > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * > > next/master bisection: baseline.login on rk3288-rock2-square > > Summary: > Start: 58b6c0e507b7 Add linux-next specific files for 20210203 > Plain log: https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt > HTML log: https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html > Result: 5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > > Checks: > revert: PASS > verify: PASS > > Parameters: > Tree: next > URL: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > Branch: master > Target: rk3288-rock2-square > CPU arch: arm > Lab: lab-collabora > Compiler: clang-11 > Config: multi_v7_defconfig > Test case: baseline.login > > Breaking commit found: > > ------------------------------------------------------------------------------- > commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2 > Author: Ard Biesheuvel <ardb@kernel.org> > Date: Sun Jan 24 18:03:45 2021 +0100 > > ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > > Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance > for v7 cores") replaced the by-set/way cache maintenance in the decompressor > with by-VA cache maintenance, which is more appropriate for the task at > hand, especially under virtualization on hosts with non-architected system > caches that are not affected by by-set/way maintenance at all. > > On such systems, that commit inadvertently removed the cache clean and > invalidate of all of the guest's memory that is performed by KVM on behalf > of the guest after its MMU is disabled (but only if any by-set/way cache > maintenance instructions were issued first). This resulted in various > erroneous behaviors observed by Russell, all involving the mini-stack > used by the core kernel's v7 boot code, and which resides in BSS. It > seems intractable to figure out exactly what goes wrong in each of these > cases, but some small experiments did suggest that the lack of a cache > clean and invalidate *after* disabling the MMU and caches is what > triggers the errors, presumably because cachelines are being allocated > or reallocated while the first cache clean and invalidate is in progress. > > To ensure that no cache lines cover any of the data that is accessed by > the booting kernel with the MMU off, include the uncompressed kernel's > BSS region in the cache clean operation. > > Also, to ensure that no cachelines are allocated while the cache is being > cleaned, perform the cache clean operation *after* disabling the MMU and > caches when running on v7 or later, by making a tail call to the clean > routine from the cache_off routine. This requires passing the VA range > to cache_off(), which means some care needs to be taken to preserve > R0 and R1 across the call to cache_off(). > > Since this makes the first cache clean redundant, call it with the > range reduced to zero. This only affects v7, as all other versions > ignore R0/R1 entirely. > > Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-ardb@kernel.org > > Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores") > Reported-by: Russell King <rmk+kernel@armlinux.org.uk> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S > index d9cce7238a36..5f231b6f0d1a 100644 > --- a/arch/arm/boot/compressed/head.S > +++ b/arch/arm/boot/compressed/head.S > @@ -609,11 +609,24 @@ not_relocated: mov r0, #0 > mov r3, r7 > bl decompress_kernel > > + @ > + @ Perform a cache clean before disabling the MMU entirely. > + @ In cases where the MMU needs to be disabled first (v7+), > + @ the clean is performed again by cache_off(), using by-VA > + @ operations on the range [R0, R1], making this prior call to > + @ cache_clean_flush() redundant. In other cases, the clean is > + @ performed by set/way and R0/R1 are ignored. > + @ > + mov r0, #0 > + mov r1, #0 > + bl cache_clean_flush > + > get_inflated_image_size r1, r2, r3 > + ldr r2, =_kernel_bss_size > + add r1, r1, r2 > > - mov r0, r4 @ start of inflated image > - add r1, r1, r0 @ end of inflated image > - bl cache_clean_flush > + mov r0, r4 @ start of decompressed kernel > + add r1, r1, r0 @ end of kernel BSS > bl cache_off > > #ifdef CONFIG_ARM_VIRT_EXT > @@ -1124,12 +1137,14 @@ proc_types: > * reading the control register, but ARMv4 does. > * > * On exit, > - * r0, r1, r2, r3, r9, r12 corrupted > + * r0, r1, r2, r3, r9, r10, r11, r12 corrupted > * This routine must preserve: > * r4, r7, r8 > */ > .align 5 > cache_off: mov r3, #12 @ cache_off function > + mov r10, r0 > + mov r11, r1 > b call_cache_fn > > __armv4_mpu_cache_off: > @@ -1176,7 +1191,9 @@ __armv7_mmu_cache_off: > mcr p15, 0, r0, c7, c5, 6 @ invalidate BTC > mcr p15, 0, r0, c7, c10, 4 @ DSB > mcr p15, 0, r0, c7, c5, 4 @ ISB > - mov pc, lr > + > + mov r0, r10 > + b __armv7_mmu_cache_flush > > /* > * Clean and flush the cache to maintain consistency. > ------------------------------------------------------------------------------- > > > Git bisection log: > > ------------------------------------------------------------------------------- > git bisect start > # good: [62c31574cdb770c78f67e7aa6e0b0244ad122901] Merge tag 'imx-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes > git bisect good 62c31574cdb770c78f67e7aa6e0b0244ad122901 > # bad: [58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6] Add linux-next specific files for 20210203 > git bisect bad 58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6 > # bad: [18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2] Merge remote-tracking branch 'net-next/master' > git bisect bad 18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2 > # bad: [58d92989a8d24b6aaaabee52624d891b5103e04a] Merge remote-tracking branch 'parisc-hd/for-next' > git bisect bad 58d92989a8d24b6aaaabee52624d891b5103e04a > # bad: [b0b5c935b4dcf824ef30f6ddf719b49f729c2795] Merge remote-tracking branch 'sound-current/for-linus' > git bisect bad b0b5c935b4dcf824ef30f6ddf719b49f729c2795 > # good: [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct page for holes in memory layout > git bisect good d3921cb8be29ce5668c64e23ffdaeec5f8c69399 > # good: [c64396cc36c6e60704ab06c1fb1c4a46179c9120] Merge tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > git bisect good c64396cc36c6e60704ab06c1fb1c4a46179c9120 > # good: [2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a] Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block > git bisect good 2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a > # good: [88bb507a74ea7d75fa49edd421eaa710a7d80598] Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media > git bisect good 88bb507a74ea7d75fa49edd421eaa710a7d80598 > # good: [2e02677e961fd4b96d8cf106b5979e6a3cdb7362] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost > git bisect good 2e02677e961fd4b96d8cf106b5979e6a3cdb7362 > # bad: [d3aa3465622d6d96645611b331312b773806d1a7] Merge remote-tracking branch 'arm64-fixes/for-next/fixes' > git bisect bad d3aa3465622d6d96645611b331312b773806d1a7 > # good: [245a7d47066ac0a266004110bd4d57d0d1329823] scripts: switch some more scripts explicitly to Python 3 > git bisect good 245a7d47066ac0a266004110bd4d57d0d1329823 > # bad: [199a427c3a3da01c5db4784a75b37251e7befa64] ARM: ensure the signal page contains defined contents > git bisect bad 199a427c3a3da01c5db4784a75b37251e7befa64 > # good: [538eea5362a1179dfa7770dd2b6607dc30cc50c6] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor > git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6 > # bad: [d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956] ARM: decompressor: tidy up register usage > git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956 > # bad: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2 > # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > ------------------------------------------------------------------------------- > > > -=-=-=-=-=-=-=-=-=-=-=- > Groups.io Links: You receive all messages sent to this group. > View/Reply Online (#6431): https://groups.io/g/kernelci-results/message/6431 > Mute This Topic: https://groups.io/mt/80373377/924702 > Group Owner: kernelci-results+owner@groups.io > Unsubscribe: https://groups.io/g/kernelci-results/unsub [guillaume.tucker@collabora.com] > -=-=-=-=-=-=-=-=-=-=-=- > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 8:43 ` next/master bisection: baseline.login on rk3288-rock2-square Guillaume Tucker @ 2021-02-04 9:07 ` Ard Biesheuvel 2021-02-04 10:06 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 9:07 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Linux Kernel Mailing List, Russell King, Russell King, Linux ARM On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > > Hi Ard, > > Please see the bisection report below about a boot failure on > rk3288 with next-20210203. It was also bisected on > imx6q-var-dt6customboard with next-20210202. > > Reports aren't automatically sent to the public while we're > trialing new bisection features on kernelci.org but this one > looks valid. > > The kernel is most likely crashing very early on, so there's > nothing in the logs. Please let us know if you need some help > with debugging or trying a fix on these platforms. > Thanks for the report. Mind trying the following fix? --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -617,8 +617,10 @@ not_relocated: mov r0, #0 @ cache_clean_flush() redundant. In other cases, the clean is @ performed by set/way and R0/R1 are ignored. @ - mov r0, #0 - mov r1, #0 + get_inflated_image_size r1, r2, r3 + + mov r0, r4 @ start of decompressed kernel + add r1, r1, r0 @ end of kernel BSS bl cache_clean_flush get_inflated_image_size r1, r2, r3 > On 04/02/2021 04:25, KernelCI bot wrote: > > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * > > * This automated bisection report was sent to you on the basis * > > * that you may be involved with the breaking commit it has * > > * found. No manual investigation has been done to verify it, * > > * and the root cause of the problem may be somewhere else. * > > * * > > * If you do send a fix, please include this trailer: * > > * Reported-by: "kernelci.org bot" <bot@kernelci.org> * > > * * > > * Hope this helps! * > > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * > > > > next/master bisection: baseline.login on rk3288-rock2-square > > > > Summary: > > Start: 58b6c0e507b7 Add linux-next specific files for 20210203 > > Plain log: https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt > > HTML log: https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html > > Result: 5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > > > > Checks: > > revert: PASS > > verify: PASS > > > > Parameters: > > Tree: next > > URL: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > Branch: master > > Target: rk3288-rock2-square > > CPU arch: arm > > Lab: lab-collabora > > Compiler: clang-11 > > Config: multi_v7_defconfig > > Test case: baseline.login > > > > Breaking commit found: > > > > ------------------------------------------------------------------------------- > > commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2 > > Author: Ard Biesheuvel <ardb@kernel.org> > > Date: Sun Jan 24 18:03:45 2021 +0100 > > > > ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > > > > Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance > > for v7 cores") replaced the by-set/way cache maintenance in the decompressor > > with by-VA cache maintenance, which is more appropriate for the task at > > hand, especially under virtualization on hosts with non-architected system > > caches that are not affected by by-set/way maintenance at all. > > > > On such systems, that commit inadvertently removed the cache clean and > > invalidate of all of the guest's memory that is performed by KVM on behalf > > of the guest after its MMU is disabled (but only if any by-set/way cache > > maintenance instructions were issued first). This resulted in various > > erroneous behaviors observed by Russell, all involving the mini-stack > > used by the core kernel's v7 boot code, and which resides in BSS. It > > seems intractable to figure out exactly what goes wrong in each of these > > cases, but some small experiments did suggest that the lack of a cache > > clean and invalidate *after* disabling the MMU and caches is what > > triggers the errors, presumably because cachelines are being allocated > > or reallocated while the first cache clean and invalidate is in progress. > > > > To ensure that no cache lines cover any of the data that is accessed by > > the booting kernel with the MMU off, include the uncompressed kernel's > > BSS region in the cache clean operation. > > > > Also, to ensure that no cachelines are allocated while the cache is being > > cleaned, perform the cache clean operation *after* disabling the MMU and > > caches when running on v7 or later, by making a tail call to the clean > > routine from the cache_off routine. This requires passing the VA range > > to cache_off(), which means some care needs to be taken to preserve > > R0 and R1 across the call to cache_off(). > > > > Since this makes the first cache clean redundant, call it with the > > range reduced to zero. This only affects v7, as all other versions > > ignore R0/R1 entirely. > > > > Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-ardb@kernel.org > > > > Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores") > > Reported-by: Russell King <rmk+kernel@armlinux.org.uk> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > > > > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S > > index d9cce7238a36..5f231b6f0d1a 100644 > > --- a/arch/arm/boot/compressed/head.S > > +++ b/arch/arm/boot/compressed/head.S > > @@ -609,11 +609,24 @@ not_relocated: mov r0, #0 > > mov r3, r7 > > bl decompress_kernel > > > > + @ > > + @ Perform a cache clean before disabling the MMU entirely. > > + @ In cases where the MMU needs to be disabled first (v7+), > > + @ the clean is performed again by cache_off(), using by-VA > > + @ operations on the range [R0, R1], making this prior call to > > + @ cache_clean_flush() redundant. In other cases, the clean is > > + @ performed by set/way and R0/R1 are ignored. > > + @ > > + mov r0, #0 > > + mov r1, #0 > > + bl cache_clean_flush > > + > > get_inflated_image_size r1, r2, r3 > > + ldr r2, =_kernel_bss_size > > + add r1, r1, r2 > > > > - mov r0, r4 @ start of inflated image > > - add r1, r1, r0 @ end of inflated image > > - bl cache_clean_flush > > + mov r0, r4 @ start of decompressed kernel > > + add r1, r1, r0 @ end of kernel BSS > > bl cache_off > > > > #ifdef CONFIG_ARM_VIRT_EXT > > @@ -1124,12 +1137,14 @@ proc_types: > > * reading the control register, but ARMv4 does. > > * > > * On exit, > > - * r0, r1, r2, r3, r9, r12 corrupted > > + * r0, r1, r2, r3, r9, r10, r11, r12 corrupted > > * This routine must preserve: > > * r4, r7, r8 > > */ > > .align 5 > > cache_off: mov r3, #12 @ cache_off function > > + mov r10, r0 > > + mov r11, r1 > > b call_cache_fn > > > > __armv4_mpu_cache_off: > > @@ -1176,7 +1191,9 @@ __armv7_mmu_cache_off: > > mcr p15, 0, r0, c7, c5, 6 @ invalidate BTC > > mcr p15, 0, r0, c7, c10, 4 @ DSB > > mcr p15, 0, r0, c7, c5, 4 @ ISB > > - mov pc, lr > > + > > + mov r0, r10 > > + b __armv7_mmu_cache_flush > > > > /* > > * Clean and flush the cache to maintain consistency. > > ------------------------------------------------------------------------------- > > > > > > Git bisection log: > > > > ------------------------------------------------------------------------------- > > git bisect start > > # good: [62c31574cdb770c78f67e7aa6e0b0244ad122901] Merge tag 'imx-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes > > git bisect good 62c31574cdb770c78f67e7aa6e0b0244ad122901 > > # bad: [58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6] Add linux-next specific files for 20210203 > > git bisect bad 58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6 > > # bad: [18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2] Merge remote-tracking branch 'net-next/master' > > git bisect bad 18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2 > > # bad: [58d92989a8d24b6aaaabee52624d891b5103e04a] Merge remote-tracking branch 'parisc-hd/for-next' > > git bisect bad 58d92989a8d24b6aaaabee52624d891b5103e04a > > # bad: [b0b5c935b4dcf824ef30f6ddf719b49f729c2795] Merge remote-tracking branch 'sound-current/for-linus' > > git bisect bad b0b5c935b4dcf824ef30f6ddf719b49f729c2795 > > # good: [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct page for holes in memory layout > > git bisect good d3921cb8be29ce5668c64e23ffdaeec5f8c69399 > > # good: [c64396cc36c6e60704ab06c1fb1c4a46179c9120] Merge tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > > git bisect good c64396cc36c6e60704ab06c1fb1c4a46179c9120 > > # good: [2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a] Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block > > git bisect good 2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a > > # good: [88bb507a74ea7d75fa49edd421eaa710a7d80598] Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media > > git bisect good 88bb507a74ea7d75fa49edd421eaa710a7d80598 > > # good: [2e02677e961fd4b96d8cf106b5979e6a3cdb7362] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost > > git bisect good 2e02677e961fd4b96d8cf106b5979e6a3cdb7362 > > # bad: [d3aa3465622d6d96645611b331312b773806d1a7] Merge remote-tracking branch 'arm64-fixes/for-next/fixes' > > git bisect bad d3aa3465622d6d96645611b331312b773806d1a7 > > # good: [245a7d47066ac0a266004110bd4d57d0d1329823] scripts: switch some more scripts explicitly to Python 3 > > git bisect good 245a7d47066ac0a266004110bd4d57d0d1329823 > > # bad: [199a427c3a3da01c5db4784a75b37251e7befa64] ARM: ensure the signal page contains defined contents > > git bisect bad 199a427c3a3da01c5db4784a75b37251e7befa64 > > # good: [538eea5362a1179dfa7770dd2b6607dc30cc50c6] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor > > git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6 > > # bad: [d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956] ARM: decompressor: tidy up register usage > > git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956 > > # bad: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > > git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2 > > # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7 > > ------------------------------------------------------------------------------- > > > > > > -=-=-=-=-=-=-=-=-=-=-=- > > Groups.io Links: You receive all messages sent to this group. > > View/Reply Online (#6431): https://groups.io/g/kernelci-results/message/6431 > > Mute This Topic: https://groups.io/mt/80373377/924702 > > Group Owner: kernelci-results+owner@groups.io > > Unsubscribe: https://groups.io/g/kernelci-results/unsub [guillaume.tucker@collabora.com] > > -=-=-=-=-=-=-=-=-=-=-=- > > > > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 9:07 ` Ard Biesheuvel @ 2021-02-04 10:06 ` Russell King - ARM Linux admin 2021-02-04 10:27 ` Ard Biesheuvel 0 siblings, 1 reply; 28+ messages in thread From: Russell King - ARM Linux admin @ 2021-02-04 10:06 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Linux ARM On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: > On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: > > > > Hi Ard, > > > > Please see the bisection report below about a boot failure on > > rk3288 with next-20210203. It was also bisected on > > imx6q-var-dt6customboard with next-20210202. > > > > Reports aren't automatically sent to the public while we're > > trialing new bisection features on kernelci.org but this one > > looks valid. > > > > The kernel is most likely crashing very early on, so there's > > nothing in the logs. Please let us know if you need some help > > with debugging or trying a fix on these platforms. > > > > Thanks for the report. Ard, I want to send my fixes branch today which includes your regression fix that caused this regression. As this is proving difficult to fix, I can only drop your fix from my fixes branch - and given that this seems to be problematical, I'm tempted to revert the original change at this point which should fix both of these regressions - and then we have another go at getting rid of the set/way instructions during the next cycle. Thoughts? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 10:06 ` Russell King - ARM Linux admin @ 2021-02-04 10:27 ` Ard Biesheuvel 2021-02-04 10:33 ` Guillaume Tucker 2021-02-04 10:47 ` Russell King - ARM Linux admin 0 siblings, 2 replies; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 10:27 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Linux ARM On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: > > On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker > > <guillaume.tucker@collabora.com> wrote: > > > > > > Hi Ard, > > > > > > Please see the bisection report below about a boot failure on > > > rk3288 with next-20210203. It was also bisected on > > > imx6q-var-dt6customboard with next-20210202. > > > > > > Reports aren't automatically sent to the public while we're > > > trialing new bisection features on kernelci.org but this one > > > looks valid. > > > > > > The kernel is most likely crashing very early on, so there's > > > nothing in the logs. Please let us know if you need some help > > > with debugging or trying a fix on these platforms. > > > > > > > Thanks for the report. > > Ard, > > I want to send my fixes branch today which includes your regression > fix that caused this regression. > > As this is proving difficult to fix, I can only drop your fix from > my fixes branch - and given that this seems to be problematical, I'm > tempted to revert the original change at this point which should fix > both of these regressions - and then we have another go at getting rid > of the set/way instructions during the next cycle. > > Thoughts? > Hi Russell, If Guillaume is willing to do the experiment, and it fixes the issue, it proves that rk3288 is relying on the flush before the MMU is disabled, and so in that case, the fix is trivial, and we can just apply it. If the experiment fails (which would mean rk3288 does not tolerate the cache maintenance being performed after cache off), it is going to be hairy, and so it will definitely take more time. So in the latter case (or if Guillaume does not get back to us), I think reverting my queued fix is the only sane option. But in that case, may I suggest that we queue the revert of the original by-VA change for v5.12 so it gets lots of coverage in -next, and allows us an opportunity to come up with a proper fix in the same timeframe, and backport the revert and the subsequent fix as a pair? Otherwise, we'll end up in the situation where v5.10.x until today has by-va, v5.10.x-y has set/way, and v5.10y+ has by-va again. (I don't think we care about anything before that, given that v5.4 predates any of this) But in the end, I'm happy to go along with whatever works best for you. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 10:27 ` Ard Biesheuvel @ 2021-02-04 10:33 ` Guillaume Tucker 2021-02-04 11:32 ` Guillaume Tucker 2021-02-04 10:47 ` Russell King - ARM Linux admin 1 sibling, 1 reply; 28+ messages in thread From: Guillaume Tucker @ 2021-02-04 10:33 UTC (permalink / raw) To: Ard Biesheuvel, Russell King - ARM Linux admin Cc: Geert Uytterhoeven, kernelci-results, Nicolas Pitre, Linus Walleij, Linux Kernel Mailing List, Linux ARM On 04/02/2021 10:27, Ard Biesheuvel wrote: > On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: >> >> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: >>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker >>> <guillaume.tucker@collabora.com> wrote: >>>> >>>> Hi Ard, >>>> >>>> Please see the bisection report below about a boot failure on >>>> rk3288 with next-20210203. It was also bisected on >>>> imx6q-var-dt6customboard with next-20210202. >>>> >>>> Reports aren't automatically sent to the public while we're >>>> trialing new bisection features on kernelci.org but this one >>>> looks valid. >>>> >>>> The kernel is most likely crashing very early on, so there's >>>> nothing in the logs. Please let us know if you need some help >>>> with debugging or trying a fix on these platforms. >>>> >>> >>> Thanks for the report. >> >> Ard, >> >> I want to send my fixes branch today which includes your regression >> fix that caused this regression. >> >> As this is proving difficult to fix, I can only drop your fix from >> my fixes branch - and given that this seems to be problematical, I'm >> tempted to revert the original change at this point which should fix >> both of these regressions - and then we have another go at getting rid >> of the set/way instructions during the next cycle. >> >> Thoughts? >> > > Hi Russell, > > If Guillaume is willing to do the experiment, and it fixes the issue, Yes, I'm running some tests with that fix now and should have some results shortly. > it proves that rk3288 is relying on the flush before the MMU is > disabled, and so in that case, the fix is trivial, and we can just > apply it. > > If the experiment fails (which would mean rk3288 does not tolerate the > cache maintenance being performed after cache off), it is going to be > hairy, and so it will definitely take more time. > > So in the latter case (or if Guillaume does not get back to us), I > think reverting my queued fix is the only sane option. But in that > case, may I suggest that we queue the revert of the original by-VA > change for v5.12 so it gets lots of coverage in -next, and allows us > an opportunity to come up with a proper fix in the same timeframe, and > backport the revert and the subsequent fix as a pair? Otherwise, we'll > end up in the situation where v5.10.x until today has by-va, v5.10.x-y > has set/way, and v5.10y+ has by-va again. (I don't think we care about > anything before that, given that v5.4 predates any of this) > > But in the end, I'm happy to go along with whatever works best for you. Thanks, Guillaume _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 10:33 ` Guillaume Tucker @ 2021-02-04 11:32 ` Guillaume Tucker 2021-02-04 11:44 ` Russell King - ARM Linux admin 2021-02-04 15:42 ` Ard Biesheuvel 0 siblings, 2 replies; 28+ messages in thread From: Guillaume Tucker @ 2021-02-04 11:32 UTC (permalink / raw) To: Ard Biesheuvel, Russell King - ARM Linux admin Cc: Geert Uytterhoeven, kernelci-results, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Linux ARM On 04/02/2021 10:33, Guillaume Tucker wrote: > On 04/02/2021 10:27, Ard Biesheuvel wrote: >> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin >> <linux@armlinux.org.uk> wrote: >>> >>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: >>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker >>>> <guillaume.tucker@collabora.com> wrote: >>>>> >>>>> Hi Ard, >>>>> >>>>> Please see the bisection report below about a boot failure on >>>>> rk3288 with next-20210203. It was also bisected on >>>>> imx6q-var-dt6customboard with next-20210202. >>>>> >>>>> Reports aren't automatically sent to the public while we're >>>>> trialing new bisection features on kernelci.org but this one >>>>> looks valid. >>>>> >>>>> The kernel is most likely crashing very early on, so there's >>>>> nothing in the logs. Please let us know if you need some help >>>>> with debugging or trying a fix on these platforms. >>>>> >>>> >>>> Thanks for the report. >>> >>> Ard, >>> >>> I want to send my fixes branch today which includes your regression >>> fix that caused this regression. >>> >>> As this is proving difficult to fix, I can only drop your fix from >>> my fixes branch - and given that this seems to be problematical, I'm >>> tempted to revert the original change at this point which should fix >>> both of these regressions - and then we have another go at getting rid >>> of the set/way instructions during the next cycle. >>> >>> Thoughts? >>> >> >> Hi Russell, >> >> If Guillaume is willing to do the experiment, and it fixes the issue, > > Yes, I'm running some tests with that fix now and should have > some results shortly. Yes it does fix the issue: https://lava.collabora.co.uk/scheduler/job/3173819 with Ard's fix applied to this test branch: https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ +clang +Nick It's worth mentioning that the issue only happens with kernels built with Clang. As you can see there are several other arm platforms failing with clang-11 builds but booting fine with gcc-8: https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/ Here's a sample build log: https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log Essentially: make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage I believe it should be using the GNU assembler as LLVM_IAS=1 is not defined, but there may be something more subtle about it. Thanks, Guillaume >> it proves that rk3288 is relying on the flush before the MMU is >> disabled, and so in that case, the fix is trivial, and we can just >> apply it. >> >> If the experiment fails (which would mean rk3288 does not tolerate the >> cache maintenance being performed after cache off), it is going to be >> hairy, and so it will definitely take more time. >> >> So in the latter case (or if Guillaume does not get back to us), I >> think reverting my queued fix is the only sane option. But in that >> case, may I suggest that we queue the revert of the original by-VA >> change for v5.12 so it gets lots of coverage in -next, and allows us >> an opportunity to come up with a proper fix in the same timeframe, and >> backport the revert and the subsequent fix as a pair? Otherwise, we'll >> end up in the situation where v5.10.x until today has by-va, v5.10.x-y >> has set/way, and v5.10y+ has by-va again. (I don't think we care about >> anything before that, given that v5.4 predates any of this) >> >> But in the end, I'm happy to go along with whatever works best for you. > > Thanks, > Guillaume > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 11:32 ` Guillaume Tucker @ 2021-02-04 11:44 ` Russell King - ARM Linux admin 2021-02-04 12:09 ` Ard Biesheuvel 2021-02-04 15:42 ` Ard Biesheuvel 1 sibling, 1 reply; 28+ messages in thread From: Russell King - ARM Linux admin @ 2021-02-04 11:44 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Ard Biesheuvel, Linux ARM On Thu, Feb 04, 2021 at 11:32:05AM +0000, Guillaume Tucker wrote: > Yes it does fix the issue: > > https://lava.collabora.co.uk/scheduler/job/3173819 > > with Ard's fix applied to this test branch: > > https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ > > > +clang +Nick > > It's worth mentioning that the issue only happens with kernels > built with Clang. As you can see there are several other arm > platforms failing with clang-11 builds but booting fine with > gcc-8: My gut feeling is that it isn't Clang specific - it's likely down to the exact code/data placement, how things end up during decompression, and exactly what state the cache ends up in. That certainly was the case with the original regression. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 11:44 ` Russell King - ARM Linux admin @ 2021-02-04 12:09 ` Ard Biesheuvel 0 siblings, 0 replies; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 12:09 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Linux ARM On Thu, 4 Feb 2021 at 12:45, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Feb 04, 2021 at 11:32:05AM +0000, Guillaume Tucker wrote: > > Yes it does fix the issue: > > > > https://lava.collabora.co.uk/scheduler/job/3173819 > > > > with Ard's fix applied to this test branch: > > > > https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ > > > > > > +clang +Nick > > > > It's worth mentioning that the issue only happens with kernels > > built with Clang. As you can see there are several other arm > > platforms failing with clang-11 builds but booting fine with > > gcc-8: > > My gut feeling is that it isn't Clang specific - it's likely down to > the exact code/data placement, how things end up during decompression, > and exactly what state the cache ends up in. > > That certainly was the case with the original regression. > Agreed. So given that my queued fix turns this cache_clean cache_off into this cache_off cache_clean for v7 only, and considering that turning this into cache_clean cache_off cache_clean (as the diff tested by Guillaume does) fixes the reported issue, it seems like the safest option to me at this point. Reverting both patches, one of which has been in mainline since v5.7, seems unwise to me at this point in the cycle. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 11:32 ` Guillaume Tucker 2021-02-04 11:44 ` Russell King - ARM Linux admin @ 2021-02-04 15:42 ` Ard Biesheuvel 2021-02-04 15:53 ` Guillaume Tucker 1 sibling, 1 reply; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 15:42 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, clang-built-linux, Linux ARM On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > > On 04/02/2021 10:33, Guillaume Tucker wrote: > > On 04/02/2021 10:27, Ard Biesheuvel wrote: > >> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin > >> <linux@armlinux.org.uk> wrote: > >>> > >>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: > >>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker > >>>> <guillaume.tucker@collabora.com> wrote: > >>>>> > >>>>> Hi Ard, > >>>>> > >>>>> Please see the bisection report below about a boot failure on > >>>>> rk3288 with next-20210203. It was also bisected on > >>>>> imx6q-var-dt6customboard with next-20210202. > >>>>> > >>>>> Reports aren't automatically sent to the public while we're > >>>>> trialing new bisection features on kernelci.org but this one > >>>>> looks valid. > >>>>> > >>>>> The kernel is most likely crashing very early on, so there's > >>>>> nothing in the logs. Please let us know if you need some help > >>>>> with debugging or trying a fix on these platforms. > >>>>> > >>>> > >>>> Thanks for the report. > >>> > >>> Ard, > >>> > >>> I want to send my fixes branch today which includes your regression > >>> fix that caused this regression. > >>> > >>> As this is proving difficult to fix, I can only drop your fix from > >>> my fixes branch - and given that this seems to be problematical, I'm > >>> tempted to revert the original change at this point which should fix > >>> both of these regressions - and then we have another go at getting rid > >>> of the set/way instructions during the next cycle. > >>> > >>> Thoughts? > >>> > >> > >> Hi Russell, > >> > >> If Guillaume is willing to do the experiment, and it fixes the issue, > > > > Yes, I'm running some tests with that fix now and should have > > some results shortly. > > Yes it does fix the issue: > > https://lava.collabora.co.uk/scheduler/job/3173819 > > with Ard's fix applied to this test branch: > > https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ > > > +clang +Nick > > It's worth mentioning that the issue only happens with kernels > built with Clang. As you can see there are several other arm > platforms failing with clang-11 builds but booting fine with > gcc-8: > > https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/ > > Here's a sample build log: > > https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log > > Essentially: > > make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > I believe it should be using the GNU assembler as LLVM_IAS=1 is > not defined, but there may be something more subtle about it. > Do you have a link for a failing zImage built from multi_v7_defconfig? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 15:42 ` Ard Biesheuvel @ 2021-02-04 15:53 ` Guillaume Tucker 2021-02-04 16:01 ` Ard Biesheuvel 0 siblings, 1 reply; 28+ messages in thread From: Guillaume Tucker @ 2021-02-04 15:53 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, clang-built-linux, Linux ARM On 04/02/2021 15:42, Ard Biesheuvel wrote: > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: >> >> On 04/02/2021 10:33, Guillaume Tucker wrote: >>> On 04/02/2021 10:27, Ard Biesheuvel wrote: >>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin >>>> <linux@armlinux.org.uk> wrote: >>>>> >>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: >>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker >>>>>> <guillaume.tucker@collabora.com> wrote: >>>>>>> >>>>>>> Hi Ard, >>>>>>> >>>>>>> Please see the bisection report below about a boot failure on >>>>>>> rk3288 with next-20210203. It was also bisected on >>>>>>> imx6q-var-dt6customboard with next-20210202. >>>>>>> >>>>>>> Reports aren't automatically sent to the public while we're >>>>>>> trialing new bisection features on kernelci.org but this one >>>>>>> looks valid. >>>>>>> >>>>>>> The kernel is most likely crashing very early on, so there's >>>>>>> nothing in the logs. Please let us know if you need some help >>>>>>> with debugging or trying a fix on these platforms. >>>>>>> >>>>>> >>>>>> Thanks for the report. >>>>> >>>>> Ard, >>>>> >>>>> I want to send my fixes branch today which includes your regression >>>>> fix that caused this regression. >>>>> >>>>> As this is proving difficult to fix, I can only drop your fix from >>>>> my fixes branch - and given that this seems to be problematical, I'm >>>>> tempted to revert the original change at this point which should fix >>>>> both of these regressions - and then we have another go at getting rid >>>>> of the set/way instructions during the next cycle. >>>>> >>>>> Thoughts? >>>>> >>>> >>>> Hi Russell, >>>> >>>> If Guillaume is willing to do the experiment, and it fixes the issue, >>> >>> Yes, I'm running some tests with that fix now and should have >>> some results shortly. >> >> Yes it does fix the issue: >> >> https://lava.collabora.co.uk/scheduler/job/3173819 >> >> with Ard's fix applied to this test branch: >> >> https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ >> >> >> +clang +Nick >> >> It's worth mentioning that the issue only happens with kernels >> built with Clang. As you can see there are several other arm >> platforms failing with clang-11 builds but booting fine with >> gcc-8: >> >> https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/ >> >> Here's a sample build log: >> >> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log >> >> Essentially: >> >> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage >> >> I believe it should be using the GNU assembler as LLVM_IAS=1 is >> not defined, but there may be something more subtle about it. >> > > > Do you have a link for a failing zImage built from multi_v7_defconfig? Sure, this one was built from a plain next-20210203: http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage You can also find the dtbs, modules and other things in that same directory. For the record, here's the test job that used it: https://lava.collabora.co.uk/scheduler/job/3173792 Guillaume _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 15:53 ` Guillaume Tucker @ 2021-02-04 16:01 ` Ard Biesheuvel 2021-02-04 18:06 ` Nick Desaulniers 2021-02-04 21:09 ` Guillaume Tucker 0 siblings, 2 replies; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 16:01 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, clang-built-linux, Linux ARM On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > > On 04/02/2021 15:42, Ard Biesheuvel wrote: > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > > <guillaume.tucker@collabora.com> wrote: > >> > >> On 04/02/2021 10:33, Guillaume Tucker wrote: > >>> On 04/02/2021 10:27, Ard Biesheuvel wrote: > >>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin > >>>> <linux@armlinux.org.uk> wrote: > >>>>> > >>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: > >>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker > >>>>>> <guillaume.tucker@collabora.com> wrote: > >>>>>>> > >>>>>>> Hi Ard, > >>>>>>> > >>>>>>> Please see the bisection report below about a boot failure on > >>>>>>> rk3288 with next-20210203. It was also bisected on > >>>>>>> imx6q-var-dt6customboard with next-20210202. > >>>>>>> > >>>>>>> Reports aren't automatically sent to the public while we're > >>>>>>> trialing new bisection features on kernelci.org but this one > >>>>>>> looks valid. > >>>>>>> > >>>>>>> The kernel is most likely crashing very early on, so there's > >>>>>>> nothing in the logs. Please let us know if you need some help > >>>>>>> with debugging or trying a fix on these platforms. > >>>>>>> > >>>>>> > >>>>>> Thanks for the report. > >>>>> > >>>>> Ard, > >>>>> > >>>>> I want to send my fixes branch today which includes your regression > >>>>> fix that caused this regression. > >>>>> > >>>>> As this is proving difficult to fix, I can only drop your fix from > >>>>> my fixes branch - and given that this seems to be problematical, I'm > >>>>> tempted to revert the original change at this point which should fix > >>>>> both of these regressions - and then we have another go at getting rid > >>>>> of the set/way instructions during the next cycle. > >>>>> > >>>>> Thoughts? > >>>>> > >>>> > >>>> Hi Russell, > >>>> > >>>> If Guillaume is willing to do the experiment, and it fixes the issue, > >>> > >>> Yes, I'm running some tests with that fix now and should have > >>> some results shortly. > >> > >> Yes it does fix the issue: > >> > >> https://lava.collabora.co.uk/scheduler/job/3173819 > >> > >> with Ard's fix applied to this test branch: > >> > >> https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ > >> > >> > >> +clang +Nick > >> > >> It's worth mentioning that the issue only happens with kernels > >> built with Clang. As you can see there are several other arm > >> platforms failing with clang-11 builds but booting fine with > >> gcc-8: > >> > >> https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/ > >> > >> Here's a sample build log: > >> > >> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log > >> > >> Essentially: > >> > >> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > >> > >> I believe it should be using the GNU assembler as LLVM_IAS=1 is > >> not defined, but there may be something more subtle about it. > >> > > > > > > Do you have a link for a failing zImage built from multi_v7_defconfig? > > Sure, this one was built from a plain next-20210203: > > http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage > > You can also find the dtbs, modules and other things in that same > directory. > > For the record, here's the test job that used it: > > https://lava.collabora.co.uk/scheduler/job/3173792 > Thanks. That zImage boots fine locally. Unfortunately, I don't have rk3288 hardware to reproduce. Could you please point me to the list of all the other platforms that failed to boot this image? To be honest, I am slightly annoyed that a change that works fine with GCC but does not work with Clang version 11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158 (where exp means experimental, I suppose) is the reason for this discussion, especially because the change is in asm code. Is it possible to build with Clang but use the GNU linker? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 16:01 ` Ard Biesheuvel @ 2021-02-04 18:06 ` Nick Desaulniers 2021-02-04 18:12 ` Nathan Chancellor 2021-02-04 21:09 ` Guillaume Tucker 1 sibling, 1 reply; 28+ messages in thread From: Nick Desaulniers @ 2021-02-04 18:06 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Russell King - ARM Linux admin, Linux Kernel Mailing List, clang-built-linux, Linux ARM On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: > > > > On 04/02/2021 15:42, Ard Biesheuvel wrote: > > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > > > <guillaume.tucker@collabora.com> wrote: > > >> > > >> Essentially: > > >> > > >> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage This command should link with BFD (and assemble with GAS; it's only using clang as the compiler. > > To be honest, I am slightly annoyed that a change that works fine with > GCC but does not work with Clang version > > 11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158 > > (where exp means experimental, I suppose) is the reason for this > discussion, especially because the change is in asm code. Is it > possible to build with Clang but use the GNU linker? rk3288 might be the last 32b ARM platform ChromeOS uses. "veyron" -- Thanks, ~Nick Desaulniers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 18:06 ` Nick Desaulniers @ 2021-02-04 18:12 ` Nathan Chancellor 2021-02-04 18:23 ` Nick Desaulniers 0 siblings, 1 reply; 28+ messages in thread From: Nathan Chancellor @ 2021-02-04 18:12 UTC (permalink / raw) To: Nick Desaulniers Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Russell King - ARM Linux admin, clang-built-linux, Ard Biesheuvel, Linux ARM On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: > On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > > <guillaume.tucker@collabora.com> wrote: > > > > > > On 04/02/2021 15:42, Ard Biesheuvel wrote: > > > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > > > > <guillaume.tucker@collabora.com> wrote: > > > >> > > > >> Essentially: > > > >> > > > >> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > This command should link with BFD (and assemble with GAS; it's only > using clang as the compiler. I think you missed the 'LLVM=1' before CC="ccache clang". That should use all of the LLVM utilities minus the integrated assembler while wrapping clang with ccache. Cheers, Nathan _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 18:12 ` Nathan Chancellor @ 2021-02-04 18:23 ` Nick Desaulniers 2021-02-04 21:31 ` Guillaume Tucker 0 siblings, 1 reply; 28+ messages in thread From: Nick Desaulniers @ 2021-02-04 18:23 UTC (permalink / raw) To: Nathan Chancellor Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Russell King - ARM Linux admin, clang-built-linux, Ard Biesheuvel, Linux ARM On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote: > > On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: > > On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > > > <guillaume.tucker@collabora.com> wrote: > > > > > > > > On 04/02/2021 15:42, Ard Biesheuvel wrote: > > > > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > > > > > <guillaume.tucker@collabora.com> wrote: > > > > >> > > > > >> Essentially: > > > > >> > > > > >> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > > > This command should link with BFD (and assemble with GAS; it's only > > using clang as the compiler. > > I think you missed the 'LLVM=1' before CC="ccache clang". That should > use all of the LLVM utilities minus the integrated assembler while > wrapping clang with ccache. You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should permit fallback to BFD. -- Thanks, ~Nick Desaulniers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 18:23 ` Nick Desaulniers @ 2021-02-04 21:31 ` Guillaume Tucker 2021-02-04 21:50 ` Russell King - ARM Linux admin 2021-02-05 8:21 ` Ard Biesheuvel 0 siblings, 2 replies; 28+ messages in thread From: Guillaume Tucker @ 2021-02-04 21:31 UTC (permalink / raw) To: Nick Desaulniers, Nathan Chancellor, Ard Biesheuvel, Russell King - ARM Linux admin Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Linux Kernel Mailing List, clang-built-linux, Ard Biesheuvel, Linux ARM On 04/02/2021 18:23, Nick Desaulniers wrote: > On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote: >> >> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: >>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: >>>> >>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker >>>> <guillaume.tucker@collabora.com> wrote: >>>>> >>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote: >>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker >>>>>> <guillaume.tucker@collabora.com> wrote: >>>>>>> >>>>>>> Essentially: >>>>>>> >>>>>>> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage >>> >>> This command should link with BFD (and assemble with GAS; it's only >>> using clang as the compiler. >> >> I think you missed the 'LLVM=1' before CC="ccache clang". That should >> use all of the LLVM utilities minus the integrated assembler while >> wrapping clang with ccache. > > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should > permit fallback to BFD. That was close, except we're cross-compiling with GCC for arm. So I've now built a plain next-20210203 (without Ard's fix) using this command line: make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage I'm using a modified Docker image gtucker/kernelci-build-clang-11 with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf packages added to be able to use the GNU linker. BTW I guess we should enable this kind of hybrid build setup on kernelci.org as well. Full build log + kernel binaries can be found here: https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/ And this booted fine, which confirms it's really down to how ld.lld puts together the kernel image. Does it actually solve the debate whether this is an issue to fix in the assembly code or at link time? Full test job details for the record: https://lava.collabora.co.uk/scheduler/job/3176004 Hope that helps, Guillaume _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 21:31 ` Guillaume Tucker @ 2021-02-04 21:50 ` Russell King - ARM Linux admin 2021-02-05 8:21 ` Ard Biesheuvel 1 sibling, 0 replies; 28+ messages in thread From: Russell King - ARM Linux admin @ 2021-02-04 21:50 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Linux Kernel Mailing List, Nathan Chancellor, clang-built-linux, Ard Biesheuvel, Linux ARM On Thu, Feb 04, 2021 at 09:31:06PM +0000, Guillaume Tucker wrote: > On 04/02/2021 18:23, Nick Desaulniers wrote: > > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should > > permit fallback to BFD. > > That was close, except we're cross-compiling with GCC for arm. > So I've now built a plain next-20210203 (without Ard's fix) using > this command line: > > make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > I'm using a modified Docker image gtucker/kernelci-build-clang-11 > with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf > packages added to be able to use the GNU linker. BTW I guess we > should enable this kind of hybrid build setup on kernelci.org as > well. ... > And this booted fine, which confirms it's really down to how > ld.lld puts together the kernel image. Does it actually solve > the debate whether this is an issue to fix in the assembly code > or at link time? Well... as I mentioned previously, we don't really understand what is going on between the decompressor running with the caches on, turning the caches off, jumping into the decompressed kernel, and then getting to the v7 setup code. The results from various attempts at solving the problem which lead to Ard's original patch that caused your breakage were not making a whole lot of sense (I think I wrote that all up in a previous email thread, so won't repeat it here.) So, I was slightly nervous about merging Ard's fix - and your report suggested that there is indeed more going on here that we don't understand. When I was tracking down what was going on, I had this patch applied (I've had to recreate it, so may not be exactly what I had), with the DEBUG_LL stuff appropriately enabled. It may be worth applying this patch, enabling the DEBUG_LL stuff appropriately for one of your failing boards, and try booting it. You should get two strings of identical hex numbers that look something like: ffffffff480000000900040140003000000000004820071d40008090 If they're looking like instructions, for example: ee060f37e3a00080ee020f10ee020f30ee030f10e3a00903ee050f30 Then it's likely that you are seeing a very similar problem as I was without Ard's patch. If you do get instruction-like content, then you will likely find the sequence of instructions in the decompressor code. diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S index 28c9d32fa99a..19fa93ae282c 100644 --- a/arch/arm/mm/proc-v7.S +++ b/arch/arm/mm/proc-v7.S @@ -475,7 +475,39 @@ ENDPROC(cpu_pj4b_do_resume) ldr r12, [r0] add r12, r12, r0 @ the local stack stmia r12, {r1-r6, lr} @ v7_invalidate_l1 touches r0-r6 + ldr r0, [r12, #0] + bl printhex8 + ldr r0, [r12, #4] + bl printhex8 + ldr r0, [r12, #8] + bl printhex8 + ldr r0, [r12, #12] + bl printhex8 + ldr r0, [r12, #16] + bl printhex8 + ldr r0, [r12, #20] + bl printhex8 + ldr r0, [r12, #24] + bl printhex8 + mov r0, #'\n' + bl printch bl v7_invalidate_l1 + ldr r0, [r12, #0] + bl printhex8 + ldr r0, [r12, #4] + bl printhex8 + ldr r0, [r12, #8] + bl printhex8 + ldr r0, [r12, #12] + bl printhex8 + ldr r0, [r12, #16] + bl printhex8 + ldr r0, [r12, #20] + bl printhex8 + ldr r0, [r12, #24] + bl printhex8 + mov r0, #'\n' + bl printch ldmia r12, {r1-r6, lr} __v7_setup_cont: -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 21:31 ` Guillaume Tucker 2021-02-04 21:50 ` Russell King - ARM Linux admin @ 2021-02-05 8:21 ` Ard Biesheuvel 2021-02-05 12:05 ` Ard Biesheuvel 1 sibling, 1 reply; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-05 8:21 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, Nathan Chancellor, clang-built-linux, Linux ARM On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > > On 04/02/2021 18:23, Nick Desaulniers wrote: > > On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote: > >> > >> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: > >>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: > >>>> > >>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > >>>> <guillaume.tucker@collabora.com> wrote: > >>>>> > >>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote: > >>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > >>>>>> <guillaume.tucker@collabora.com> wrote: > >>>>>>> > >>>>>>> Essentially: > >>>>>>> > >>>>>>> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > >>> > >>> This command should link with BFD (and assemble with GAS; it's only > >>> using clang as the compiler. > >> > >> I think you missed the 'LLVM=1' before CC="ccache clang". That should > >> use all of the LLVM utilities minus the integrated assembler while > >> wrapping clang with ccache. > > > > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should > > permit fallback to BFD. > > That was close, except we're cross-compiling with GCC for arm. > So I've now built a plain next-20210203 (without Ard's fix) using > this command line: > > make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > I'm using a modified Docker image gtucker/kernelci-build-clang-11 > with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf > packages added to be able to use the GNU linker. BTW I guess we > should enable this kind of hybrid build setup on kernelci.org as > well. > > Full build log + kernel binaries can be found here: > > https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/ > > And this booted fine, which confirms it's really down to how > ld.lld puts together the kernel image. Does it actually solve > the debate whether this is an issue to fix in the assembly code > or at link time? > > Full test job details for the record: > > https://lava.collabora.co.uk/scheduler/job/3176004 > So the issue appears to be in the way the linker generates the _kernel_bss_size symbol, which obviously has an impact, given that the queued fix takes it into account in the cache_clean operation. On GNU ld, I see 479: 00065e14 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size whereas n LLVM ld.lld, I see 433: c1c86e98 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size and adding this value may cause the cache clean to operate on unmapped addresses, or cause the addition to wrap and not perform a cache clean at all. AFAICT, this also breaks the appended DTB case in LLVM, so this needs a separate fix in any case. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-05 8:21 ` Ard Biesheuvel @ 2021-02-05 12:05 ` Ard Biesheuvel 2021-02-06 13:10 ` Guillaume Tucker 0 siblings, 1 reply; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-05 12:05 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, Nathan Chancellor, clang-built-linux, Linux ARM On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: > > > > On 04/02/2021 18:23, Nick Desaulniers wrote: > > > On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote: > > >> > > >> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: > > >>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > >>>> > > >>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > > >>>> <guillaume.tucker@collabora.com> wrote: > > >>>>> > > >>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote: > > >>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > > >>>>>> <guillaume.tucker@collabora.com> wrote: > > >>>>>>> > > >>>>>>> Essentially: > > >>>>>>> > > >>>>>>> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > >>> > > >>> This command should link with BFD (and assemble with GAS; it's only > > >>> using clang as the compiler. > > >> > > >> I think you missed the 'LLVM=1' before CC="ccache clang". That should > > >> use all of the LLVM utilities minus the integrated assembler while > > >> wrapping clang with ccache. > > > > > > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should > > > permit fallback to BFD. > > > > That was close, except we're cross-compiling with GCC for arm. > > So I've now built a plain next-20210203 (without Ard's fix) using > > this command line: > > > > make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > > > > I'm using a modified Docker image gtucker/kernelci-build-clang-11 > > with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf > > packages added to be able to use the GNU linker. BTW I guess we > > should enable this kind of hybrid build setup on kernelci.org as > > well. > > > > Full build log + kernel binaries can be found here: > > > > https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/ > > > > And this booted fine, which confirms it's really down to how > > ld.lld puts together the kernel image. Does it actually solve > > the debate whether this is an issue to fix in the assembly code > > or at link time? > > > > Full test job details for the record: > > > > https://lava.collabora.co.uk/scheduler/job/3176004 > > > > > So the issue appears to be in the way the linker generates the > _kernel_bss_size symbol, which obviously has an impact, given that the > queued fix takes it into account in the cache_clean operation. > > On GNU ld, I see > > 479: 00065e14 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size > > whereas n LLVM ld.lld, I see > > 433: c1c86e98 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size > > and adding this value may cause the cache clean to operate on unmapped > addresses, or cause the addition to wrap and not perform a cache clean > at all. > > AFAICT, this also breaks the appended DTB case in LLVM, so this needs > a separate fix in any case. I pushed a combined branch of torvalds/master, rmk/fixes (still containing my 9052/1 fix) and this patch to my for-kernelci branch https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/ Guillaume, It seems there is no Clang-11 coverage there, right? Mind giving this branch a spin? If this fixes the regressions, we can get these queued up. Thanks, Ard. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-05 12:05 ` Ard Biesheuvel @ 2021-02-06 13:10 ` Guillaume Tucker 2021-02-06 13:12 ` Ard Biesheuvel 0 siblings, 1 reply; 28+ messages in thread From: Guillaume Tucker @ 2021-02-06 13:10 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, Nathan Chancellor, clang-built-linux, Linux ARM On 05/02/2021 12:05, Ard Biesheuvel wrote: > On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel <ardb@kernel.org> wrote: >> >> On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker >> <guillaume.tucker@collabora.com> wrote: >>> >>> On 04/02/2021 18:23, Nick Desaulniers wrote: >>>> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote: >>>>> >>>>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: >>>>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: >>>>>>> >>>>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker >>>>>>> <guillaume.tucker@collabora.com> wrote: >>>>>>>> >>>>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote: >>>>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker >>>>>>>>> <guillaume.tucker@collabora.com> wrote: >>>>>>>>>> >>>>>>>>>> Essentially: >>>>>>>>>> >>>>>>>>>> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage >>>>>> >>>>>> This command should link with BFD (and assemble with GAS; it's only >>>>>> using clang as the compiler. >>>>> >>>>> I think you missed the 'LLVM=1' before CC="ccache clang". That should >>>>> use all of the LLVM utilities minus the integrated assembler while >>>>> wrapping clang with ccache. >>>> >>>> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should >>>> permit fallback to BFD. >>> >>> That was close, except we're cross-compiling with GCC for arm. >>> So I've now built a plain next-20210203 (without Ard's fix) using >>> this command line: >>> >>> make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage >>> >>> I'm using a modified Docker image gtucker/kernelci-build-clang-11 >>> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf >>> packages added to be able to use the GNU linker. BTW I guess we >>> should enable this kind of hybrid build setup on kernelci.org as >>> well. >>> >>> Full build log + kernel binaries can be found here: >>> >>> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/ >>> >>> And this booted fine, which confirms it's really down to how >>> ld.lld puts together the kernel image. Does it actually solve >>> the debate whether this is an issue to fix in the assembly code >>> or at link time? >>> >>> Full test job details for the record: >>> >>> https://lava.collabora.co.uk/scheduler/job/3176004 >>> >> >> >> So the issue appears to be in the way the linker generates the >> _kernel_bss_size symbol, which obviously has an impact, given that the >> queued fix takes it into account in the cache_clean operation. >> >> On GNU ld, I see >> >> 479: 00065e14 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size >> >> whereas n LLVM ld.lld, I see >> >> 433: c1c86e98 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size >> >> and adding this value may cause the cache clean to operate on unmapped >> addresses, or cause the addition to wrap and not perform a cache clean >> at all. >> >> AFAICT, this also breaks the appended DTB case in LLVM, so this needs >> a separate fix in any case. > > I pushed a combined branch of torvalds/master, rmk/fixes (still > containing my 9052/1 fix) and this patch to my for-kernelci branch > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/ > > Guillaume, > > It seems there is no Clang-11 coverage there, right? Mind giving this > branch a spin? If this fixes the regressions, we can get these queued > up. That's right, Clang builds are only enabled on linux-next and mainline at the moment. We could enable it on any other branch where it makes sense. How about just the main defconfig for arm, arm64 and x86_64 on your ardb/for-kernelci branch? For now I've run another set of builds with clang-11 and got the following test results with your branch on staging: https://staging.kernelci.org/test/job/ardb/branch/for-kernelci/kernel/v5.11-rc6-146-g923ca344043a/plan/baseline/ which are all passing. I'll reply to the thread with your patch to confirm. Guillaume _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-06 13:10 ` Guillaume Tucker @ 2021-02-06 13:12 ` Ard Biesheuvel 0 siblings, 0 replies; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-06 13:12 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, Nathan Chancellor, clang-built-linux, Linux ARM On Sat, 6 Feb 2021 at 14:10, Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > > On 05/02/2021 12:05, Ard Biesheuvel wrote: > > On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel <ardb@kernel.org> wrote: > >> > >> On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker > >> <guillaume.tucker@collabora.com> wrote: > >>> > >>> On 04/02/2021 18:23, Nick Desaulniers wrote: > >>>> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote: > >>>>> > >>>>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote: > >>>>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote: > >>>>>>> > >>>>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > >>>>>>> <guillaume.tucker@collabora.com> wrote: > >>>>>>>> > >>>>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote: > >>>>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker > >>>>>>>>> <guillaume.tucker@collabora.com> wrote: > >>>>>>>>>> > >>>>>>>>>> Essentially: > >>>>>>>>>> > >>>>>>>>>> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > >>>>>> > >>>>>> This command should link with BFD (and assemble with GAS; it's only > >>>>>> using clang as the compiler. > >>>>> > >>>>> I think you missed the 'LLVM=1' before CC="ccache clang". That should > >>>>> use all of the LLVM utilities minus the integrated assembler while > >>>>> wrapping clang with ccache. > >>>> > >>>> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should > >>>> permit fallback to BFD. > >>> > >>> That was close, except we're cross-compiling with GCC for arm. > >>> So I've now built a plain next-20210203 (without Ard's fix) using > >>> this command line: > >>> > >>> make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage > >>> > >>> I'm using a modified Docker image gtucker/kernelci-build-clang-11 > >>> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf > >>> packages added to be able to use the GNU linker. BTW I guess we > >>> should enable this kind of hybrid build setup on kernelci.org as > >>> well. > >>> > >>> Full build log + kernel binaries can be found here: > >>> > >>> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/ > >>> > >>> And this booted fine, which confirms it's really down to how > >>> ld.lld puts together the kernel image. Does it actually solve > >>> the debate whether this is an issue to fix in the assembly code > >>> or at link time? > >>> > >>> Full test job details for the record: > >>> > >>> https://lava.collabora.co.uk/scheduler/job/3176004 > >>> > >> > >> > >> So the issue appears to be in the way the linker generates the > >> _kernel_bss_size symbol, which obviously has an impact, given that the > >> queued fix takes it into account in the cache_clean operation. > >> > >> On GNU ld, I see > >> > >> 479: 00065e14 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size > >> > >> whereas n LLVM ld.lld, I see > >> > >> 433: c1c86e98 0 NOTYPE GLOBAL DEFAULT ABS _kernel_bss_size > >> > >> and adding this value may cause the cache clean to operate on unmapped > >> addresses, or cause the addition to wrap and not perform a cache clean > >> at all. > >> > >> AFAICT, this also breaks the appended DTB case in LLVM, so this needs > >> a separate fix in any case. > > > > I pushed a combined branch of torvalds/master, rmk/fixes (still > > containing my 9052/1 fix) and this patch to my for-kernelci branch > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/ > > > > Guillaume, > > > > It seems there is no Clang-11 coverage there, right? Mind giving this > > branch a spin? If this fixes the regressions, we can get these queued > > up. > > That's right, Clang builds are only enabled on linux-next and > mainline at the moment. We could enable it on any other branch > where it makes sense. How about just the main defconfig for arm, > arm64 and x86_64 on your ardb/for-kernelci branch? > Yes, please. > For now I've run another set of builds with clang-11 and got the > following test results with your branch on staging: > > https://staging.kernelci.org/test/job/ardb/branch/for-kernelci/kernel/v5.11-rc6-146-g923ca344043a/plan/baseline/ > > which are all passing. > > I'll reply to the thread with your patch to confirm. > Excellent, thanks a lot. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 16:01 ` Ard Biesheuvel 2021-02-04 18:06 ` Nick Desaulniers @ 2021-02-04 21:09 ` Guillaume Tucker 1 sibling, 0 replies; 28+ messages in thread From: Guillaume Tucker @ 2021-02-04 21:09 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin, Linux Kernel Mailing List, clang-built-linux, Linux ARM On 04/02/2021 16:01, Ard Biesheuvel wrote: > On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: >> >> On 04/02/2021 15:42, Ard Biesheuvel wrote: >>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker >>> <guillaume.tucker@collabora.com> wrote: >>>> >>>> On 04/02/2021 10:33, Guillaume Tucker wrote: >>>>> On 04/02/2021 10:27, Ard Biesheuvel wrote: >>>>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin >>>>>> <linux@armlinux.org.uk> wrote: >>>>>>> >>>>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote: >>>>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker >>>>>>>> <guillaume.tucker@collabora.com> wrote: >>>>>>>>> >>>>>>>>> Hi Ard, >>>>>>>>> >>>>>>>>> Please see the bisection report below about a boot failure on >>>>>>>>> rk3288 with next-20210203. It was also bisected on >>>>>>>>> imx6q-var-dt6customboard with next-20210202. >>>>>>>>> >>>>>>>>> Reports aren't automatically sent to the public while we're >>>>>>>>> trialing new bisection features on kernelci.org but this one >>>>>>>>> looks valid. >>>>>>>>> >>>>>>>>> The kernel is most likely crashing very early on, so there's >>>>>>>>> nothing in the logs. Please let us know if you need some help >>>>>>>>> with debugging or trying a fix on these platforms. >>>>>>>>> >>>>>>>> >>>>>>>> Thanks for the report. >>>>>>> >>>>>>> Ard, >>>>>>> >>>>>>> I want to send my fixes branch today which includes your regression >>>>>>> fix that caused this regression. >>>>>>> >>>>>>> As this is proving difficult to fix, I can only drop your fix from >>>>>>> my fixes branch - and given that this seems to be problematical, I'm >>>>>>> tempted to revert the original change at this point which should fix >>>>>>> both of these regressions - and then we have another go at getting rid >>>>>>> of the set/way instructions during the next cycle. >>>>>>> >>>>>>> Thoughts? >>>>>>> >>>>>> >>>>>> Hi Russell, >>>>>> >>>>>> If Guillaume is willing to do the experiment, and it fixes the issue, >>>>> >>>>> Yes, I'm running some tests with that fix now and should have >>>>> some results shortly. >>>> >>>> Yes it does fix the issue: >>>> >>>> https://lava.collabora.co.uk/scheduler/job/3173819 >>>> >>>> with Ard's fix applied to this test branch: >>>> >>>> https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/ >>>> >>>> >>>> +clang +Nick >>>> >>>> It's worth mentioning that the issue only happens with kernels >>>> built with Clang. As you can see there are several other arm >>>> platforms failing with clang-11 builds but booting fine with >>>> gcc-8: >>>> >>>> https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/ >>>> >>>> Here's a sample build log: >>>> >>>> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log >>>> >>>> Essentially: >>>> >>>> make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage >>>> >>>> I believe it should be using the GNU assembler as LLVM_IAS=1 is >>>> not defined, but there may be something more subtle about it. >>>> >>> >>> >>> Do you have a link for a failing zImage built from multi_v7_defconfig? >> >> Sure, this one was built from a plain next-20210203: >> >> http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage >> >> You can also find the dtbs, modules and other things in that same >> directory. >> >> For the record, here's the test job that used it: >> >> https://lava.collabora.co.uk/scheduler/job/3173792 >> > > Thanks. > > That zImage boots fine locally. Unfortunately, I don't have rk3288 > hardware to reproduce. > > Could you please point me to the list of all the other platforms that > failed to boot this image? This is the list of platforms from kernelci.org I've gathered which appeared to be impacted: imx6q-sabrelite imx6q-var-dt6customboard imx6dl-riotboard imx6qp-wandboard-revd1 imx7ulp-evk odroid-xu3 rk3288-rock2-square rk3288-veyron-jaq stm32mp157c-dk2 sun4i-a10-olinuxino-lime sun5i-a13-olinuxino-micro sun7i-a20-cubieboard2 sun7i-a20-olinuxino-lime2 sun8i-a33-olinuxino sun8i-a83t-bananapi-m3 sun8i-h2-plus-libretech-all-h3-cc sun8i-h2-plus-orangepi-r1 sun8i-h2-plus-orangepi-zero sun8i-h3-libretech-all-h3-cc sun8i-h3-bananapi-m2-plus sun8i-h3-orangepi-pc sun8i-r40-bananapi-m2-ultra They were all booting next-20210203 with gcc-8 but not with clang-11. I've run checks on a good share of them with your patch applied and they're now booting with clang-11, just like the rk3288 and imx6q platforms that were used for the bisections. > To be honest, I am slightly annoyed that a change that works fine with > GCC but does not work with Clang version > > 11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158 > > (where exp means experimental, I suppose) is the reason for this Well it's the standard one from the LLVM Debian package repo: deb http://apt.llvm.org/buster/ llvm-toolchain-buster-11 main There's a slightly newer version, I doubt it would make any difference in this respect unless there's a particular fix in ld.lld: # apt policy clang-11 clang-11: Installed: 1:11.1.0~++20210130110826+3a8282376b6c-1~exp1~20210130221445.158 Candidate: 1:11.1.0~++20210204120158+1fdec59bffc1-1~exp1~20210203230823.159 > discussion, especially because the change is in asm code. Is it > possible to build with Clang but use the GNU linker? As mentioned by Nick, it is using everything from LLVM except the assembler - so not the GNU linker. I've now built a new Docker container with the latest LLVM package version (.159) as well as gcc-8-arm-linux-gnueabihf to try with the GNU linker and see if that makes any difference. More on that shortly... Guillaume _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 10:27 ` Ard Biesheuvel 2021-02-04 10:33 ` Guillaume Tucker @ 2021-02-04 10:47 ` Russell King - ARM Linux admin 2021-02-04 10:55 ` Ard Biesheuvel 1 sibling, 1 reply; 28+ messages in thread From: Russell King - ARM Linux admin @ 2021-02-04 10:47 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Linux ARM On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote: > Hi Russell, > > If Guillaume is willing to do the experiment, and it fixes the issue, > it proves that rk3288 is relying on the flush before the MMU is > disabled, and so in that case, the fix is trivial, and we can just > apply it. > > If the experiment fails (which would mean rk3288 does not tolerate the > cache maintenance being performed after cache off), it is going to be > hairy, and so it will definitely take more time. > > So in the latter case (or if Guillaume does not get back to us), I > think reverting my queued fix is the only sane option. But in that > case, may I suggest that we queue the revert of the original by-VA > change for v5.12 so it gets lots of coverage in -next, and allows us > an opportunity to come up with a proper fix in the same timeframe, and > backport the revert and the subsequent fix as a pair? Otherwise, we'll > end up in the situation where v5.10.x until today has by-va, v5.10.x-y > has set/way, and v5.10y+ has by-va again. (I don't think we care about > anything before that, given that v5.4 predates any of this) I'm suggesting dropping your fix (9052/1) and reverting "ARM: decompressor: switch to by-VA cache maintenance for v7 cores" which gets us to a point where _both_ regressions are fixed. I'm of the opinion that the by-VA patch was incorrect when it was merged (it caused a regression), and it's only a performance improvement. Our attempts so far to fix it are just causing other regressions. So, I think it is reasonable to revert both back to a known good point which has worked over a decade. If doing so causes regressions (which I think is unlikely), then that would be unfortunate but alas is a price that's worth paying to get back to a known good point - since then we're not stacking regression fixes on top of other regression fixes. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 10:47 ` Russell King - ARM Linux admin @ 2021-02-04 10:55 ` Ard Biesheuvel 2021-02-04 12:26 ` Marc Zyngier 0 siblings, 1 reply; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 10:55 UTC (permalink / raw) To: Russell King - ARM Linux admin, Marc Zyngier Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Linux ARM (cc Marc) On Thu, 4 Feb 2021 at 11:48, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote: > > Hi Russell, > > > > If Guillaume is willing to do the experiment, and it fixes the issue, > > it proves that rk3288 is relying on the flush before the MMU is > > disabled, and so in that case, the fix is trivial, and we can just > > apply it. > > > > If the experiment fails (which would mean rk3288 does not tolerate the > > cache maintenance being performed after cache off), it is going to be > > hairy, and so it will definitely take more time. > > > > So in the latter case (or if Guillaume does not get back to us), I > > think reverting my queued fix is the only sane option. But in that > > case, may I suggest that we queue the revert of the original by-VA > > change for v5.12 so it gets lots of coverage in -next, and allows us > > an opportunity to come up with a proper fix in the same timeframe, and > > backport the revert and the subsequent fix as a pair? Otherwise, we'll > > end up in the situation where v5.10.x until today has by-va, v5.10.x-y > > has set/way, and v5.10y+ has by-va again. (I don't think we care about > > anything before that, given that v5.4 predates any of this) > > I'm suggesting dropping your fix (9052/1) and reverting > "ARM: decompressor: switch to by-VA cache maintenance for v7 cores" > which gets us to a point where _both_ regressions are fixed. > I understand, but we don't know whether doing so might regress other platforms that were added in the mean time. > I'm of the opinion that the by-VA patch was incorrect when it was > merged (it caused a regression), and it's only a performance > improvement. It is a correctness improvement, not a performance improvement. Without that change, the 32-bit ARM kernel cannot boot bare metal on platforms with a system cache such as 8040 or SynQuacer, and can only boot under KVM on such systems because of the special handling of set/way instructions by the host. The performance issue related to set/way ops under KVM was already fixed by describing data and unified caches as 1 set and 1 way when running in 32-bit mode. > Our attempts so far to fix it are just causing other > regressions. So, I think it is reasonable to revert both back to a > known good point which has worked over a decade. If doing so causes > regressions (which I think is unlikely), then that would be unfortunate > but alas is a price that's worth paying to get back to a known good > point - since then we're not stacking regression fixes on top of other > regression fixes. > This is exactly why I am proposing to queue the revert of the original patch for v5.12, and only backport it to v5.10 and v5.11 once we are sure it does not break anything else. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 10:55 ` Ard Biesheuvel @ 2021-02-04 12:26 ` Marc Zyngier 2021-02-04 14:09 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 28+ messages in thread From: Marc Zyngier @ 2021-02-04 12:26 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Russell King - ARM Linux admin, Linux Kernel Mailing List, Linux ARM On 2021-02-04 10:55, Ard Biesheuvel wrote: > (cc Marc) > > On Thu, 4 Feb 2021 at 11:48, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: >> >> On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote: >> > Hi Russell, >> > >> > If Guillaume is willing to do the experiment, and it fixes the issue, >> > it proves that rk3288 is relying on the flush before the MMU is >> > disabled, and so in that case, the fix is trivial, and we can just >> > apply it. >> > >> > If the experiment fails (which would mean rk3288 does not tolerate the >> > cache maintenance being performed after cache off), it is going to be >> > hairy, and so it will definitely take more time. >> > >> > So in the latter case (or if Guillaume does not get back to us), I >> > think reverting my queued fix is the only sane option. But in that >> > case, may I suggest that we queue the revert of the original by-VA >> > change for v5.12 so it gets lots of coverage in -next, and allows us >> > an opportunity to come up with a proper fix in the same timeframe, and >> > backport the revert and the subsequent fix as a pair? Otherwise, we'll >> > end up in the situation where v5.10.x until today has by-va, v5.10.x-y >> > has set/way, and v5.10y+ has by-va again. (I don't think we care about >> > anything before that, given that v5.4 predates any of this) >> >> I'm suggesting dropping your fix (9052/1) and reverting >> "ARM: decompressor: switch to by-VA cache maintenance for v7 cores" >> which gets us to a point where _both_ regressions are fixed. >> > > I understand, but we don't know whether doing so might regress other > platforms that were added in the mean time. > >> I'm of the opinion that the by-VA patch was incorrect when it was >> merged (it caused a regression), and it's only a performance >> improvement. > > It is a correctness improvement, not a performance improvement. > > Without that change, the 32-bit ARM kernel cannot boot bare metal on > platforms with a system cache such as 8040 or SynQuacer, and can only > boot under KVM on such systems because of the special handling of > set/way instructions by the host. I agree. With set/way CMOs, there is no way to reach the PoC if it beyond the system cache, leading to an unbootable kernel. This is actually pretty well documented in the architecture, and it did bite us for the first time on XGene-1, 7 years ago. In retrospect, having KVM to handle set/way CMOs in was a mistake, as it just papered over the problem for the sake of running older 32bit guests. It violated the principle of KVM/arm being strictly architectural and provided unrealistic expectations. I'll take the blame for this. Thanks, M. -- Jazz is not dead. It just smells funny... _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 12:26 ` Marc Zyngier @ 2021-02-04 14:09 ` Russell King - ARM Linux admin 2021-02-04 14:25 ` Ard Biesheuvel 0 siblings, 1 reply; 28+ messages in thread From: Russell King - ARM Linux admin @ 2021-02-04 14:09 UTC (permalink / raw) To: Marc Zyngier Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Guillaume Tucker, Linus Walleij, Linux Kernel Mailing List, Ard Biesheuvel, Linux ARM On Thu, Feb 04, 2021 at 12:26:44PM +0000, Marc Zyngier wrote: > I agree. With set/way CMOs, there is no way to reach the PoC if > it beyond the system cache, leading to an unbootable kernel. > This is actually pretty well documented in the architecture, > and it did bite us for the first time on XGene-1, 7 years ago. That may be, however we still do set/way maintenance to invalidate the L1 cache as that is required for ARMv7 to place the cache into a known state, as stated by the architecture reference manual. Arguably, that should be done by firmware, but when starting secondary CPUs, there are platforms out there which do not bring the L1 cache to a defined state. So we are pretty much stuck with doing set/way operations during CPU initialisation in the main kernel. If ARMv8 decides that this is not supportable, then that's a matter for ARMv8 to address without impacting the requirements of ARMv7. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 14:09 ` Russell King - ARM Linux admin @ 2021-02-04 14:25 ` Ard Biesheuvel 2021-02-04 14:36 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 14:25 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Marc Zyngier, Linus Walleij, Linux Kernel Mailing List, Guillaume Tucker, Linux ARM On Thu, 4 Feb 2021 at 15:09, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Feb 04, 2021 at 12:26:44PM +0000, Marc Zyngier wrote: > > I agree. With set/way CMOs, there is no way to reach the PoC if > > it beyond the system cache, leading to an unbootable kernel. > > This is actually pretty well documented in the architecture, > > and it did bite us for the first time on XGene-1, 7 years ago. > > That may be, however we still do set/way maintenance to invalidate > the L1 cache as that is required for ARMv7 to place the cache into > a known state, as stated by the architecture reference manual. > Getting a certain cache at a certain level into a known state is a valid use of set/way ops, and is simply unnecessary when running under virtualization, but doesn't do any harm. Pushing contents of the cache hierarchy to main memory is *not* a valid use of set/way ops, and so there is no point in pretending that set/way ops will produce the same results as by-VA ops. Only the by-VA ops give the architectural guarantees that we rely on for correctness. > Arguably, that should be done by firmware, but when starting > secondary CPUs, there are platforms out there which do not bring > the L1 cache to a defined state. So we are pretty much stuck with > doing set/way operations during CPU initialisation in the main > kernel. > Indeed. And this is unfortunate, but not the end of the world. > If ARMv8 decides that this is not supportable, then that's a matter > for ARMv8 to address without impacting the requirements of ARMv7. > I'm not sure what you mean here. The v7 architecture is crystal clear about the difference between set/way ops (managing a single cache), and by-VA ops (managing the 'cachedness' state of a memory region). The semantics are radically different, regardless of v7 vs v8 or AArch32 vs AArch64. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 14:25 ` Ard Biesheuvel @ 2021-02-04 14:36 ` Russell King - ARM Linux admin 2021-02-04 15:52 ` Ard Biesheuvel 0 siblings, 1 reply; 28+ messages in thread From: Russell King - ARM Linux admin @ 2021-02-04 14:36 UTC (permalink / raw) To: Ard Biesheuvel Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Marc Zyngier, Linus Walleij, Linux Kernel Mailing List, Guillaume Tucker, Linux ARM On Thu, Feb 04, 2021 at 03:25:20PM +0100, Ard Biesheuvel wrote: > Pushing contents of the cache hierarchy to main memory is *not* a > valid use of set/way ops, and so there is no point in pretending that > set/way ops will produce the same results as by-VA ops. Only the by-VA > ops give the architectural guarantees that we rely on for correctness. ... yet we /were/ doing that, and it worked fine for 13 years - from 1st June 2007 until the by-VA merge into mainline on the 3rd April 2020. You may be right that it wasn't the most correct way, but it worked for those 13 years without issue, and it's only recently that it's become a problem, and trying to "fix" that introduced a regression, and fixing that regression has caused another regression... and I what I'm wondering is how many more regression fixing cycles it's going to take - how many regression fixes on top of other regression fixes are we going to end up seeing here. The fact is, we never properly understood why your patch caused the regression I was seeing. If we don't understand it, then we can never say that we've fixed the problem properly. That is highlighted by the fact that fixing the regression I was seeing has caused another regression. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: next/master bisection: baseline.login on rk3288-rock2-square 2021-02-04 14:36 ` Russell King - ARM Linux admin @ 2021-02-04 15:52 ` Ard Biesheuvel 0 siblings, 0 replies; 28+ messages in thread From: Ard Biesheuvel @ 2021-02-04 15:52 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre, Marc Zyngier, Linus Walleij, Linux Kernel Mailing List, Guillaume Tucker, Linux ARM On Thu, 4 Feb 2021 at 15:36, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Feb 04, 2021 at 03:25:20PM +0100, Ard Biesheuvel wrote: > > Pushing contents of the cache hierarchy to main memory is *not* a > > valid use of set/way ops, and so there is no point in pretending that > > set/way ops will produce the same results as by-VA ops. Only the by-VA > > ops give the architectural guarantees that we rely on for correctness. > > ... yet we /were/ doing that, and it worked fine for 13 years - from > 1st June 2007 until the by-VA merge into mainline on the 3rd April > 2020. > > You may be right that it wasn't the most correct way, but it worked > for those 13 years without issue, and it's only recently that it's > become a problem, and trying to "fix" that introduced a regression, > and fixing that regression has caused another regression... and I > what I'm wondering is how many more regression fixing cycles it's > going to take - how many regression fixes on top of other regression > fixes are we going to end up seeing here. > > The fact is, we never properly understood why your patch caused the > regression I was seeing. If we don't understand it, then we can never > say that we've fixed the problem properly. That is highlighted by the > fact that fixing the regression I was seeing has caused another > regression. > I agree with all these points. But as I pointed out, reverting the original by-VA change, which has been there for almost a year now, has some risks of its own. If we don't understand the details of how this is broken, how can we be sure we don't break something else if we revert it again? So I'm not arguing that reverting the original patch is unreasonable, just that doing so at this point in the cycle is risky, and that it would be better to queue the revert for v5.12, and only backport it after some soak time in -next. And in a sense, reinstating the cache_clean() before cache_off() already amounts to a partial revert of the queued fix. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2021-02-06 13:14 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <601b773a.1c69fb81.9f381.a32a@mx.google.com> 2021-02-04 8:43 ` next/master bisection: baseline.login on rk3288-rock2-square Guillaume Tucker 2021-02-04 9:07 ` Ard Biesheuvel 2021-02-04 10:06 ` Russell King - ARM Linux admin 2021-02-04 10:27 ` Ard Biesheuvel 2021-02-04 10:33 ` Guillaume Tucker 2021-02-04 11:32 ` Guillaume Tucker 2021-02-04 11:44 ` Russell King - ARM Linux admin 2021-02-04 12:09 ` Ard Biesheuvel 2021-02-04 15:42 ` Ard Biesheuvel 2021-02-04 15:53 ` Guillaume Tucker 2021-02-04 16:01 ` Ard Biesheuvel 2021-02-04 18:06 ` Nick Desaulniers 2021-02-04 18:12 ` Nathan Chancellor 2021-02-04 18:23 ` Nick Desaulniers 2021-02-04 21:31 ` Guillaume Tucker 2021-02-04 21:50 ` Russell King - ARM Linux admin 2021-02-05 8:21 ` Ard Biesheuvel 2021-02-05 12:05 ` Ard Biesheuvel 2021-02-06 13:10 ` Guillaume Tucker 2021-02-06 13:12 ` Ard Biesheuvel 2021-02-04 21:09 ` Guillaume Tucker 2021-02-04 10:47 ` Russell King - ARM Linux admin 2021-02-04 10:55 ` Ard Biesheuvel 2021-02-04 12:26 ` Marc Zyngier 2021-02-04 14:09 ` Russell King - ARM Linux admin 2021-02-04 14:25 ` Ard Biesheuvel 2021-02-04 14:36 ` Russell King - ARM Linux admin 2021-02-04 15:52 ` Ard Biesheuvel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).