* Re: next/master bisection: baseline.login on rk3288-rock2-square
       [not found] <601b773a.1c69fb81.9f381.a32a@mx.google.com>
@ 2021-02-04  8:43   ` Guillaume Tucker
  0 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04  8:43 UTC (permalink / raw)
  To: Russell King, Ard Biesheuvel
  Cc: Geert Uytterhoeven, linux-kernel, Russell King, Linus Walleij,
	linux-arm-kernel, Nicolas Pitre, kernelci-results

Hi Ard,

Please see the bisection report below about a boot failure on
rk3288 with next-20210203.  It was also bisected on
imx6q-var-dt6customboard with next-20210202.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The kernel is most likely crashing very early on, so there's
nothing in the logs.  Please let us know if you need some help
with debugging or trying a fix on these platforms.

Best wishes,
Guillaume


On 04/02/2021 04:25, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on rk3288-rock2-square
> 
> Summary:
>   Start:      58b6c0e507b7 Add linux-next specific files for 20210203
>   Plain log:  https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html
>   Result:     5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     rk3288-rock2-square
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   clang-11
>   Config:     multi_v7_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> -------------------------------------------------------------------------------
> commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> Author: Ard Biesheuvel <ardb@kernel.org>
> Date:   Sun Jan 24 18:03:45 2021 +0100
> 
>     ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
>     
>     Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance
>     for v7 cores") replaced the by-set/way cache maintenance in the decompressor
>     with by-VA cache maintenance, which is more appropriate for the task at
>     hand, especially under virtualization on hosts with non-architected system
>     caches that are not affected by by-set/way maintenance at all.
>     
>     On such systems, that commit inadvertently removed the cache clean and
>     invalidate of all of the guest's memory that is performed by KVM on behalf
>     of the guest after its MMU is disabled (but only if any by-set/way cache
>     maintenance instructions were issued first). This resulted in various
>     erroneous behaviors observed by Russell, all involving the mini-stack
>     used by the core kernel's v7 boot code, and which resides in BSS. It
>     seems intractable to figure out exactly what goes wrong in each of these
>     cases, but some small experiments did suggest that the lack of a cache
>     clean and invalidate *after* disabling the MMU and caches is what
>     triggers the errors, presumably because cachelines are being allocated
>     or reallocated while the first cache clean and invalidate is in progress.
>     
>     To ensure that no cache lines cover any of the data that is accessed by
>     the booting kernel with the MMU off, include the uncompressed kernel's
>     BSS region in the cache clean operation.
>     
>     Also, to ensure that no cachelines are allocated while the cache is being
>     cleaned, perform the cache clean operation *after* disabling the MMU and
>     caches when running on v7 or later, by making a tail call to the clean
>     routine from the cache_off routine. This requires passing the VA range
>     to cache_off(), which means some care needs to be taken to preserve
>     R0 and R1 across the call to cache_off().
>     
>     Since this makes the first cache clean redundant, call it with the
>     range reduced to zero. This only affects v7, as all other versions
>     ignore R0/R1 entirely.
>     
>     Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-ardb@kernel.org
>     
>     Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores")
>     Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
>     Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>     Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index d9cce7238a36..5f231b6f0d1a 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -609,11 +609,24 @@ not_relocated:	mov	r0, #0
>  		mov	r3, r7
>  		bl	decompress_kernel
>  
> +		@
> +		@ Perform a cache clean before disabling the MMU entirely.
> +		@ In cases where the MMU needs to be disabled first (v7+),
> +		@ the clean is performed again by cache_off(), using by-VA
> +		@ operations on the range [R0, R1], making this prior call to
> +		@ cache_clean_flush() redundant. In other cases, the clean is
> +		@ performed by set/way and R0/R1 are ignored.
> +		@
> +		mov	r0, #0
> +		mov	r1, #0
> +		bl	cache_clean_flush
> +
>  		get_inflated_image_size	r1, r2, r3
> +		ldr	r2, =_kernel_bss_size
> +		add	r1, r1, r2
>  
> -		mov	r0, r4			@ start of inflated image
> -		add	r1, r1, r0		@ end of inflated image
> -		bl	cache_clean_flush
> +		mov	r0, r4			@ start of decompressed kernel
> +		add	r1, r1, r0		@ end of kernel BSS
>  		bl	cache_off
>  
>  #ifdef CONFIG_ARM_VIRT_EXT
> @@ -1124,12 +1137,14 @@ proc_types:
>   * reading the control register, but ARMv4 does.
>   *
>   * On exit,
> - *  r0, r1, r2, r3, r9, r12 corrupted
> + *  r0, r1, r2, r3, r9, r10, r11, r12 corrupted
>   * This routine must preserve:
>   *  r4, r7, r8
>   */
>  		.align	5
>  cache_off:	mov	r3, #12			@ cache_off function
> +		mov	r10, r0
> +		mov	r11, r1
>  		b	call_cache_fn
>  
>  __armv4_mpu_cache_off:
> @@ -1176,7 +1191,9 @@ __armv7_mmu_cache_off:
>  		mcr	p15, 0, r0, c7, c5, 6	@ invalidate BTC
>  		mcr	p15, 0, r0, c7, c10, 4	@ DSB
>  		mcr	p15, 0, r0, c7, c5, 4	@ ISB
> -		mov	pc, lr
> +
> +		mov	r0, r10
> +		b	__armv7_mmu_cache_flush
>  
>  /*
>   * Clean and flush the cache to maintain consistency.
> -------------------------------------------------------------------------------
> 
> 
> Git bisection log:
> 
> -------------------------------------------------------------------------------
> git bisect start
> # good: [62c31574cdb770c78f67e7aa6e0b0244ad122901] Merge tag 'imx-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
> git bisect good 62c31574cdb770c78f67e7aa6e0b0244ad122901
> # bad: [58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6] Add linux-next specific files for 20210203
> git bisect bad 58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6
> # bad: [18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2] Merge remote-tracking branch 'net-next/master'
> git bisect bad 18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2
> # bad: [58d92989a8d24b6aaaabee52624d891b5103e04a] Merge remote-tracking branch 'parisc-hd/for-next'
> git bisect bad 58d92989a8d24b6aaaabee52624d891b5103e04a
> # bad: [b0b5c935b4dcf824ef30f6ddf719b49f729c2795] Merge remote-tracking branch 'sound-current/for-linus'
> git bisect bad b0b5c935b4dcf824ef30f6ddf719b49f729c2795
> # good: [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct page for holes in memory layout
> git bisect good d3921cb8be29ce5668c64e23ffdaeec5f8c69399
> # good: [c64396cc36c6e60704ab06c1fb1c4a46179c9120] Merge tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good c64396cc36c6e60704ab06c1fb1c4a46179c9120
> # good: [2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a] Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block
> git bisect good 2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a
> # good: [88bb507a74ea7d75fa49edd421eaa710a7d80598] Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> git bisect good 88bb507a74ea7d75fa49edd421eaa710a7d80598
> # good: [2e02677e961fd4b96d8cf106b5979e6a3cdb7362] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> git bisect good 2e02677e961fd4b96d8cf106b5979e6a3cdb7362
> # bad: [d3aa3465622d6d96645611b331312b773806d1a7] Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
> git bisect bad d3aa3465622d6d96645611b331312b773806d1a7
> # good: [245a7d47066ac0a266004110bd4d57d0d1329823] scripts: switch some more scripts explicitly to Python 3
> git bisect good 245a7d47066ac0a266004110bd4d57d0d1329823
> # bad: [199a427c3a3da01c5db4784a75b37251e7befa64] ARM: ensure the signal page contains defined contents
> git bisect bad 199a427c3a3da01c5db4784a75b37251e7befa64
> # good: [538eea5362a1179dfa7770dd2b6607dc30cc50c6] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
> git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6
> # bad: [d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956] ARM: decompressor: tidy up register usage
> git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956
> # bad: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> -------------------------------------------------------------------------------
> 
> 
> 
> 



* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04  8:43   ` Guillaume Tucker
@ 2021-02-04  9:07     ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04  9:07 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Russell King, Geert Uytterhoeven, Linux Kernel Mailing List,
	Russell King, Linus Walleij, Linux ARM, Nicolas Pitre,
	kernelci-results

On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> Hi Ard,
>
> Please see the bisection report below about a boot failure on
> rk3288 with next-20210203.  It was also bisected on
> imx6q-var-dt6customboard with next-20210202.
>
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
>
> The kernel is most likely crashing very early on, so there's
> nothing in the logs.  Please let us know if you need some help
> with debugging or trying a fix on these platforms.
>

Thanks for the report.

Mind trying the following fix?

--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -617,8 +617,10 @@ not_relocated:     mov     r0, #0
                @ cache_clean_flush() redundant. In other cases, the clean is
                @ performed by set/way and R0/R1 are ignored.
                @
-               mov     r0, #0
-               mov     r1, #0
+               get_inflated_image_size r1, r2, r3
+
+               mov     r0, r4                  @ start of decompressed kernel
+               add     r1, r1, r0              @ end of kernel BSS
                bl      cache_clean_flush

                get_inflated_image_size r1, r2, r3
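
For reference, the three [R0, R1] ranges under discussion can be modelled
in a few lines of C.  This is only an illustrative sketch: the struct and
function names are hypothetical and do not exist in head.S, and the
addresses are passed in by the caller rather than taken from any real
board.  As posted, the patch above passes R4 plus the inflated image size
to cache_clean_flush, without adding _kernel_bss_size at that call, so the
model treats that range as ending at the end of the inflated image.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the [R0, R1] pair handed to a by-VA cache clean. */
struct va_range {
	uint32_t start;	/* R0 */
	uint32_t end;	/* R1 */
};

/* Length of a range; the end is expected not to precede the start. */
static uint32_t range_len(struct va_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start;
}

/* Commit 5a29552af92d: first clean called with the range reduced to zero. */
static struct va_range range_zero(void)
{
	return (struct va_range){ 0, 0 };
}

/* Patch as posted: first clean covers R4 .. R4 + inflated image size. */
static struct va_range range_image(uint32_t image_start, uint32_t inflated_size)
{
	return (struct va_range){ image_start, image_start + inflated_size };
}

/* Commit 5a29552af92d: cache_off() receives the image plus the kernel BSS. */
static struct va_range range_image_and_bss(uint32_t image_start,
					   uint32_t inflated_size,
					   uint32_t bss_size)
{
	return (struct va_range){ image_start,
				  image_start + inflated_size + bss_size };
}
```

With any start address and sizes, range_image_and_bss() is longer than
range_image() by exactly the BSS size, and range_zero() has length zero,
which is the difference between the first clean before and after the
patch.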




> On 04/02/2021 04:25, KernelCI bot wrote:
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> > * This automated bisection report was sent to you on the basis  *
> > * that you may be involved with the breaking commit it has      *
> > * found.  No manual investigation has been done to verify it,   *
> > * and the root cause of the problem may be somewhere else.      *
> > *                                                               *
> > * If you do send a fix, please include this trailer:            *
> > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> > *                                                               *
> > * Hope this helps!                                              *
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> >
> > next/master bisection: baseline.login on rk3288-rock2-square
> >
> > Summary:
> >   Start:      58b6c0e507b7 Add linux-next specific files for 20210203
> >   Plain log:  https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt
> >   HTML log:   https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html
> >   Result:     5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> >
> > Checks:
> >   revert:     PASS
> >   verify:     PASS
> >
> > Parameters:
> >   Tree:       next
> >   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> >   Branch:     master
> >   Target:     rk3288-rock2-square
> >   CPU arch:   arm
> >   Lab:        lab-collabora
> >   Compiler:   clang-11
> >   Config:     multi_v7_defconfig
> >   Test case:  baseline.login
> >
> > Breaking commit found:
> >
> > -------------------------------------------------------------------------------
> > commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > Author: Ard Biesheuvel <ardb@kernel.org>
> > Date:   Sun Jan 24 18:03:45 2021 +0100
> >
> >     ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> >
> >     Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance
> >     for v7 cores") replaced the by-set/way cache maintenance in the decompressor
> >     with by-VA cache maintenance, which is more appropriate for the task at
> >     hand, especially under virtualization on hosts with non-architected system
> >     caches that are not affected by by-set/way maintenance at all.
> >
> >     On such systems, that commit inadvertently removed the cache clean and
> >     invalidate of all of the guest's memory that is performed by KVM on behalf
> >     of the guest after its MMU is disabled (but only if any by-set/way cache
> >     maintenance instructions were issued first). This resulted in various
> >     erroneous behaviors observed by Russell, all involving the mini-stack
> >     used by the core kernel's v7 boot code, and which resides in BSS. It
> >     seems intractable to figure out exactly what goes wrong in each of these
> >     cases, but some small experiments did suggest that the lack of a cache
> >     clean and invalidate *after* disabling the MMU and caches is what
> >     triggers the errors, presumably because cachelines are being allocated
> >     or reallocated while the first cache clean and invalidate is in progress.
> >
> >     To ensure that no cache lines cover any of the data that is accessed by
> >     the booting kernel with the MMU off, include the uncompressed kernel's
> >     BSS region in the cache clean operation.
> >
> >     Also, to ensure that no cachelines are allocated while the cache is being
> >     cleaned, perform the cache clean operation *after* disabling the MMU and
> >     caches when running on v7 or later, by making a tail call to the clean
> >     routine from the cache_off routine. This requires passing the VA range
> >     to cache_off(), which means some care needs to be taken to preserve
> >     R0 and R1 across the call to cache_off().
> >
> >     Since this makes the first cache clean redundant, call it with the
> >     range reduced to zero. This only affects v7, as all other versions
> >     ignore R0/R1 entirely.
> >
> >     Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-ardb@kernel.org
> >
> >     Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores")
> >     Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
> >     Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >     Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >
> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > index d9cce7238a36..5f231b6f0d1a 100644
> > --- a/arch/arm/boot/compressed/head.S
> > +++ b/arch/arm/boot/compressed/head.S
> > @@ -609,11 +609,24 @@ not_relocated:  mov     r0, #0
> >               mov     r3, r7
> >               bl      decompress_kernel
> >
> > +             @
> > +             @ Perform a cache clean before disabling the MMU entirely.
> > +             @ In cases where the MMU needs to be disabled first (v7+),
> > +             @ the clean is performed again by cache_off(), using by-VA
> > +             @ operations on the range [R0, R1], making this prior call to
> > +             @ cache_clean_flush() redundant. In other cases, the clean is
> > +             @ performed by set/way and R0/R1 are ignored.
> > +             @
> > +             mov     r0, #0
> > +             mov     r1, #0
> > +             bl      cache_clean_flush
> > +
> >               get_inflated_image_size r1, r2, r3
> > +             ldr     r2, =_kernel_bss_size
> > +             add     r1, r1, r2
> >
> > -             mov     r0, r4                  @ start of inflated image
> > -             add     r1, r1, r0              @ end of inflated image
> > -             bl      cache_clean_flush
> > +             mov     r0, r4                  @ start of decompressed kernel
> > +             add     r1, r1, r0              @ end of kernel BSS
> >               bl      cache_off
> >
> >  #ifdef CONFIG_ARM_VIRT_EXT
> > @@ -1124,12 +1137,14 @@ proc_types:
> >   * reading the control register, but ARMv4 does.
> >   *
> >   * On exit,
> > - *  r0, r1, r2, r3, r9, r12 corrupted
> > + *  r0, r1, r2, r3, r9, r10, r11, r12 corrupted
> >   * This routine must preserve:
> >   *  r4, r7, r8
> >   */
> >               .align  5
> >  cache_off:   mov     r3, #12                 @ cache_off function
> > +             mov     r10, r0
> > +             mov     r11, r1
> >               b       call_cache_fn
> >
> >  __armv4_mpu_cache_off:
> > @@ -1176,7 +1191,9 @@ __armv7_mmu_cache_off:
> >               mcr     p15, 0, r0, c7, c5, 6   @ invalidate BTC
> >               mcr     p15, 0, r0, c7, c10, 4  @ DSB
> >               mcr     p15, 0, r0, c7, c5, 4   @ ISB
> > -             mov     pc, lr
> > +
> > +             mov     r0, r10
> > +             b       __armv7_mmu_cache_flush
> >
> >  /*
> >   * Clean and flush the cache to maintain consistency.
> > -------------------------------------------------------------------------------
> >
> >
> > Git bisection log:
> >
> > -------------------------------------------------------------------------------
> > git bisect start
> > # good: [62c31574cdb770c78f67e7aa6e0b0244ad122901] Merge tag 'imx-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
> > git bisect good 62c31574cdb770c78f67e7aa6e0b0244ad122901
> > # bad: [58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6] Add linux-next specific files for 20210203
> > git bisect bad 58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6
> > # bad: [18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2] Merge remote-tracking branch 'net-next/master'
> > git bisect bad 18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2
> > # bad: [58d92989a8d24b6aaaabee52624d891b5103e04a] Merge remote-tracking branch 'parisc-hd/for-next'
> > git bisect bad 58d92989a8d24b6aaaabee52624d891b5103e04a
> > # bad: [b0b5c935b4dcf824ef30f6ddf719b49f729c2795] Merge remote-tracking branch 'sound-current/for-linus'
> > git bisect bad b0b5c935b4dcf824ef30f6ddf719b49f729c2795
> > # good: [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct page for holes in memory layout
> > git bisect good d3921cb8be29ce5668c64e23ffdaeec5f8c69399
> > # good: [c64396cc36c6e60704ab06c1fb1c4a46179c9120] Merge tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good c64396cc36c6e60704ab06c1fb1c4a46179c9120
> > # good: [2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a] Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block
> > git bisect good 2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a
> > # good: [88bb507a74ea7d75fa49edd421eaa710a7d80598] Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> > git bisect good 88bb507a74ea7d75fa49edd421eaa710a7d80598
> > # good: [2e02677e961fd4b96d8cf106b5979e6a3cdb7362] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> > git bisect good 2e02677e961fd4b96d8cf106b5979e6a3cdb7362
> > # bad: [d3aa3465622d6d96645611b331312b773806d1a7] Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
> > git bisect bad d3aa3465622d6d96645611b331312b773806d1a7
> > # good: [245a7d47066ac0a266004110bd4d57d0d1329823] scripts: switch some more scripts explicitly to Python 3
> > git bisect good 245a7d47066ac0a266004110bd4d57d0d1329823
> > # bad: [199a427c3a3da01c5db4784a75b37251e7befa64] ARM: ensure the signal page contains defined contents
> > git bisect bad 199a427c3a3da01c5db4784a75b37251e7befa64
> > # good: [538eea5362a1179dfa7770dd2b6607dc30cc50c6] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
> > git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6
> > # bad: [d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956] ARM: decompressor: tidy up register usage
> > git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956
> > # bad: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> > git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> > -------------------------------------------------------------------------------
> >
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Groups.io Links: You receive all messages sent to this group.
> > View/Reply Online (#6431): https://groups.io/g/kernelci-results/message/6431
> > Mute This Topic: https://groups.io/mt/80373377/924702
> > Group Owner: kernelci-results+owner@groups.io
> > Unsubscribe: https://groups.io/g/kernelci-results/unsub [guillaume.tucker@collabora.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
> >
>
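
For readers who want to pull the verdict out of a quoted bisection log like
the one above, a minimal Python sketch follows. The function name
parse_bisect_log and the quote-stripping regex are assumptions made for
illustration; they are not part of git or of the KernelCI tooling. The sketch
only reads the "git bisect good/bad" and "# first bad commit" lines that
appear in the log.

```python
import re

# Strips any number of leading "> " reply markers (assumed quoting style).
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")
MARK = re.compile(r"git bisect (good|bad) ([0-9a-f]{7,40})$")
FIRST_BAD = re.compile(r"# first bad commit: \[([0-9a-f]{7,40})\]")


def parse_bisect_log(text):
    """Return (good_commits, bad_commits, first_bad) from a bisect log."""
    good, bad, first_bad = [], [], None
    for raw in text.splitlines():
        line = QUOTE_PREFIX.sub("", raw).strip()
        mark = MARK.match(line)
        if mark:
            (good if mark.group(1) == "good" else bad).append(mark.group(2))
            continue
        found = FIRST_BAD.match(line)
        if found:
            first_bad = found.group(1)
    return good, bad, first_bad


# Sample lines copied from the tail of the log quoted above.
SAMPLE = """\
> > git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6
> > git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956
> > git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1
"""

if __name__ == "__main__":
    good, bad, first_bad = parse_bisect_log(SAMPLE)
    print("good:", good)
    print("bad:", bad)
    print("first bad:", first_bad)
```

With the quote markers stripped, the same log text can also be fed to
"git bisect replay" in a linux-next checkout to repeat the bisection.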

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
@ 2021-02-04  9:07     ` Ard Biesheuvel
  0 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04  9:07 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre,
	Linus Walleij, Linux Kernel Mailing List, Russell King,
	Russell King, Linux ARM

On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> Hi Ard,
>
> Please see the bisection report below about a boot failure on
> rk3288 with next-20210203.  It was also bisected on
> imx6q-var-dt6customboard with next-20210202.
>
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
>
> The kernel is most likely crashing very early on, so there's
> nothing in the logs.  Please let us know if you need some help
> with debugging or trying a fix on these platforms.
>

Thanks for the report.

Mind trying the following fix?

--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -617,8 +617,10 @@ not_relocated:     mov     r0, #0
                @ cache_clean_flush() redundant. In other cases, the clean is
                @ performed by set/way and R0/R1 are ignored.
                @
-               mov     r0, #0
-               mov     r1, #0
+               get_inflated_image_size r1, r2, r3
+
+               mov     r0, r4                  @ start of decompressed kernel
+               add     r1, r1, r0              @ end of kernel BSS
                bl      cache_clean_flush

                get_inflated_image_size r1, r2, r3
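
What the hunk above changes can be modelled as plain address arithmetic. The
Python sketch below is only an illustration under stated assumptions: the
names VARange, first_clean_range and cache_off_range are hypothetical, and the
addresses and sizes in the demo are made up rather than taken from rk3288. It
mirrors the register setup in the diffs: R0 is the start of the decompressed
kernel (r4), and R1 is R0 plus the inflated image size, with the BSS size
added only for the cache_off() call.

```python
from typing import NamedTuple


class VARange(NamedTuple):
    """Virtual address range [start, end] passed in R0/R1."""
    start: int
    end: int


def first_clean_range(load_addr: int, image_size: int, with_fix: bool) -> VARange:
    """Range seen by the cache_clean_flush() call made before cache_off().

    Without the fix, 9052/1 passes R0 = R1 = 0 (an empty range).
    With the fix, the range covers the inflated image again.
    """
    if not with_fix:
        return VARange(0, 0)
    return VARange(load_addr, load_addr + image_size)


def cache_off_range(load_addr: int, image_size: int, bss_size: int) -> VARange:
    """Range seen by cache_off(): inflated image plus the kernel's BSS."""
    return VARange(load_addr, load_addr + image_size + bss_size)


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the arithmetic visible.
    load_addr, image_size, bss_size = 0x00208000, 0x01000000, 0x00040000
    for with_fix in (False, True):
        r = first_clean_range(load_addr, image_size, with_fix)
        print(f"first clean, fix={with_fix}: {r.start:#010x}..{r.end:#010x}")
    r = cache_off_range(load_addr, image_size, bss_size)
    print(f"cache_off: {r.start:#010x}..{r.end:#010x}")
```

In this model the only difference the fix makes is that the clean performed
while the MMU is still on covers the inflated image instead of nothing; the
range handed to cache_off() is unchanged.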




> On 04/02/2021 04:25, KernelCI bot wrote:
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> > * This automated bisection report was sent to you on the basis  *
> > * that you may be involved with the breaking commit it has      *
> > * found.  No manual investigation has been done to verify it,   *
> > * and the root cause of the problem may be somewhere else.      *
> > *                                                               *
> > * If you do send a fix, please include this trailer:            *
> > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
> > *                                                               *
> > * Hope this helps!                                              *
> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> >
> > next/master bisection: baseline.login on rk3288-rock2-square
> >
> > Summary:
> >   Start:      58b6c0e507b7 Add linux-next specific files for 20210203
> >   Plain log:  https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt
> >   HTML log:   https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html
> >   Result:     5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> >
> > Checks:
> >   revert:     PASS
> >   verify:     PASS
> >
> > Parameters:
> >   Tree:       next
> >   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> >   Branch:     master
> >   Target:     rk3288-rock2-square
> >   CPU arch:   arm
> >   Lab:        lab-collabora
> >   Compiler:   clang-11
> >   Config:     multi_v7_defconfig
> >   Test case:  baseline.login
> >
> > Breaking commit found:
> >
> > -------------------------------------------------------------------------------
> > commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > Author: Ard Biesheuvel <ardb@kernel.org>
> > Date:   Sun Jan 24 18:03:45 2021 +0100
> >
> >     ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> >
> >     Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance
> >     for v7 cores") replaced the by-set/way cache maintenance in the decompressor
> >     with by-VA cache maintenance, which is more appropriate for the task at
> >     hand, especially under virtualization on hosts with non-architected system
> >     caches that are not affected by by-set/way maintenance at all.
> >
> >     On such systems, that commit inadvertently removed the cache clean and
> >     invalidate of all of the guest's memory that is performed by KVM on behalf
> >     of the guest after its MMU is disabled (but only if any by-set/way cache
> >     maintenance instructions were issued first). This resulted in various
> >     erroneous behaviors observed by Russell, all involving the mini-stack
> >     used by the core kernel's v7 boot code, and which resides in BSS. It
> >     seems intractable to figure out exactly what goes wrong in each of these
> >     cases, but some small experiments did suggest that the lack of a cache
> >     clean and invalidate *after* disabling the MMU and caches is what
> >     triggers the errors, presumably because cachelines are being allocated
> >     or reallocated while the first cache clean and invalidate is in progress.
> >
> >     To ensure that no cache lines cover any of the data that is accessed by
> >     the booting kernel with the MMU off, include the uncompressed kernel's
> >     BSS region in the cache clean operation.
> >
> >     Also, to ensure that no cachelines are allocated while the cache is being
> >     cleaned, perform the cache clean operation *after* disabling the MMU and
> >     caches when running on v7 or later, by making a tail call to the clean
> >     routine from the cache_off routine. This requires passing the VA range
> >     to cache_off(), which means some care needs to be taken to preserve
> >     R0 and R1 across the call to cache_off().
> >
> >     Since this makes the first cache clean redundant, call it with the
> >     range reduced to zero. This only affects v7, as all other versions
> >     ignore R0/R1 entirely.
> >
> >     Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-ardb@kernel.org
> >
> >     Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores")
> >     Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
> >     Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >     Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >
> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > index d9cce7238a36..5f231b6f0d1a 100644
> > --- a/arch/arm/boot/compressed/head.S
> > +++ b/arch/arm/boot/compressed/head.S
> > @@ -609,11 +609,24 @@ not_relocated:  mov     r0, #0
> >               mov     r3, r7
> >               bl      decompress_kernel
> >
> > +             @
> > +             @ Perform a cache clean before disabling the MMU entirely.
> > +             @ In cases where the MMU needs to be disabled first (v7+),
> > +             @ the clean is performed again by cache_off(), using by-VA
> > +             @ operations on the range [R0, R1], making this prior call to
> > +             @ cache_clean_flush() redundant. In other cases, the clean is
> > +             @ performed by set/way and R0/R1 are ignored.
> > +             @
> > +             mov     r0, #0
> > +             mov     r1, #0
> > +             bl      cache_clean_flush
> > +
> >               get_inflated_image_size r1, r2, r3
> > +             ldr     r2, =_kernel_bss_size
> > +             add     r1, r1, r2
> >
> > -             mov     r0, r4                  @ start of inflated image
> > -             add     r1, r1, r0              @ end of inflated image
> > -             bl      cache_clean_flush
> > +             mov     r0, r4                  @ start of decompressed kernel
> > +             add     r1, r1, r0              @ end of kernel BSS
> >               bl      cache_off
> >
> >  #ifdef CONFIG_ARM_VIRT_EXT
> > @@ -1124,12 +1137,14 @@ proc_types:
> >   * reading the control register, but ARMv4 does.
> >   *
> >   * On exit,
> > - *  r0, r1, r2, r3, r9, r12 corrupted
> > + *  r0, r1, r2, r3, r9, r10, r11, r12 corrupted
> >   * This routine must preserve:
> >   *  r4, r7, r8
> >   */
> >               .align  5
> >  cache_off:   mov     r3, #12                 @ cache_off function
> > +             mov     r10, r0
> > +             mov     r11, r1
> >               b       call_cache_fn
> >
> >  __armv4_mpu_cache_off:
> > @@ -1176,7 +1191,9 @@ __armv7_mmu_cache_off:
> >               mcr     p15, 0, r0, c7, c5, 6   @ invalidate BTC
> >               mcr     p15, 0, r0, c7, c10, 4  @ DSB
> >               mcr     p15, 0, r0, c7, c5, 4   @ ISB
> > -             mov     pc, lr
> > +
> > +             mov     r0, r10
> > +             b       __armv7_mmu_cache_flush
> >
> >  /*
> >   * Clean and flush the cache to maintain consistency.
> > -------------------------------------------------------------------------------
> >
> >
> > Git bisection log:
> >
> > -------------------------------------------------------------------------------
> > git bisect start
> > # good: [62c31574cdb770c78f67e7aa6e0b0244ad122901] Merge tag 'imx-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
> > git bisect good 62c31574cdb770c78f67e7aa6e0b0244ad122901
> > # bad: [58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6] Add linux-next specific files for 20210203
> > git bisect bad 58b6c0e507b7421b03b2f2a92bddbb8c6fa1b2f6
> > # bad: [18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2] Merge remote-tracking branch 'net-next/master'
> > git bisect bad 18c1afa6bb9b6277d20910eb7cdc5eb01d9d87f2
> > # bad: [58d92989a8d24b6aaaabee52624d891b5103e04a] Merge remote-tracking branch 'parisc-hd/for-next'
> > git bisect bad 58d92989a8d24b6aaaabee52624d891b5103e04a
> > # bad: [b0b5c935b4dcf824ef30f6ddf719b49f729c2795] Merge remote-tracking branch 'sound-current/for-linus'
> > git bisect bad b0b5c935b4dcf824ef30f6ddf719b49f729c2795
> > # good: [d3921cb8be29ce5668c64e23ffdaeec5f8c69399] mm: fix initialization of struct page for holes in memory layout
> > git bisect good d3921cb8be29ce5668c64e23ffdaeec5f8c69399
> > # good: [c64396cc36c6e60704ab06c1fb1c4a46179c9120] Merge tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good c64396cc36c6e60704ab06c1fb1c4a46179c9120
> > # good: [2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a] Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block
> > git bisect good 2ba1c4d1a4b5fb9961452286bdcad502b0c8b78a
> > # good: [88bb507a74ea7d75fa49edd421eaa710a7d80598] Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> > git bisect good 88bb507a74ea7d75fa49edd421eaa710a7d80598
> > # good: [2e02677e961fd4b96d8cf106b5979e6a3cdb7362] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> > git bisect good 2e02677e961fd4b96d8cf106b5979e6a3cdb7362
> > # bad: [d3aa3465622d6d96645611b331312b773806d1a7] Merge remote-tracking branch 'arm64-fixes/for-next/fixes'
> > git bisect bad d3aa3465622d6d96645611b331312b773806d1a7
> > # good: [245a7d47066ac0a266004110bd4d57d0d1329823] scripts: switch some more scripts explicitly to Python 3
> > git bisect good 245a7d47066ac0a266004110bd4d57d0d1329823
> > # bad: [199a427c3a3da01c5db4784a75b37251e7befa64] ARM: ensure the signal page contains defined contents
> > git bisect bad 199a427c3a3da01c5db4784a75b37251e7befa64
> > # good: [538eea5362a1179dfa7770dd2b6607dc30cc50c6] ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
> > git bisect good 538eea5362a1179dfa7770dd2b6607dc30cc50c6
> > # bad: [d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956] ARM: decompressor: tidy up register usage
> > git bisect bad d80cd9abcd942eb217b6c68e5bd0d5c3feb2f956
> > # bad: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> > git bisect bad 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> > # first bad commit: [5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2] ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> > -------------------------------------------------------------------------------
> >
> >
> >
> >
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04  9:07     ` Ard Biesheuvel
@ 2021-02-04 10:06       ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 56+ messages in thread
From: Russell King - ARM Linux admin @ 2021-02-04 10:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Guillaume Tucker, Geert Uytterhoeven, Linux Kernel Mailing List,
	Linus Walleij, Linux ARM, Nicolas Pitre, kernelci-results

On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
> >
> > Hi Ard,
> >
> > Please see the bisection report below about a boot failure on
> > rk3288 with next-20210203.  It was also bisected on
> > imx6q-var-dt6customboard with next-20210202.
> >
> > Reports aren't automatically sent to the public while we're
> > trialing new bisection features on kernelci.org but this one
> > looks valid.
> >
> > The kernel is most likely crashing very early on, so there's
> > nothing in the logs.  Please let us know if you need some help
> > with debugging or trying a fix on these platforms.
> >
> 
> Thanks for the report.

Ard,

I want to send my fixes branch today which includes your regression
fix that caused this regression.

As this is proving difficult to fix, I can only drop your fix from
my fixes branch - and given that this seems to be problematical, I'm
tempted to revert the original change at this point which should fix
both of these regressions - and then we have another go at getting rid
of the set/way instructions during the next cycle.

Thoughts?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 10:06       ` Russell King - ARM Linux admin
@ 2021-02-04 10:27         ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 10:27 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Guillaume Tucker, Geert Uytterhoeven, Linux Kernel Mailing List,
	Linus Walleij, Linux ARM, Nicolas Pitre, kernelci-results

On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
> > On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
> > <guillaume.tucker@collabora.com> wrote:
> > >
> > > Hi Ard,
> > >
> > > Please see the bisection report below about a boot failure on
> > > rk3288 with next-20210203.  It was also bisected on
> > > imx6q-var-dt6customboard with next-20210202.
> > >
> > > Reports aren't automatically sent to the public while we're
> > > trialing new bisection features on kernelci.org but this one
> > > looks valid.
> > >
> > > The kernel is most likely crashing very early on, so there's
> > > nothing in the logs.  Please let us know if you need some help
> > > with debugging or trying a fix on these platforms.
> > >
> >
> > Thanks for the report.
>
> Ard,
>
> I want to send my fixes branch today which includes your regression
> fix that caused this regression.
>
> As this is proving difficult to fix, I can only drop your fix from
> my fixes branch - and given that this seems to be problematical, I'm
> tempted to revert the original change at this point which should fix
> both of these regressions - and then we have another go at getting rid
> of the set/way instructions during the next cycle.
>
> Thoughts?
>

Hi Russell,

If Guillaume is willing to do the experiment, and it fixes the issue,
it proves that rk3288 is relying on the flush before the MMU is
disabled, and so in that case, the fix is trivial, and we can just
apply it.

If the experiment fails (which would mean rk3288 does not tolerate the
cache maintenance being performed after cache off), it is going to be
hairy, and so it will definitely take more time.

So in the latter case (or if Guillaume does not get back to us), I
think reverting my queued fix is the only sane option. But in that
case, may I suggest that we queue the revert of the original by-VA
change for v5.12 so it gets lots of coverage in -next, and allows us
an opportunity to come up with a proper fix in the same timeframe, and
backport the revert and the subsequent fix as a pair? Otherwise, we'll
end up in the situation where v5.10.x until today has by-VA, v5.10.x-y
has set/way, and v5.10.y+ has by-VA again. (I don't think we care about
anything before that, given that v5.4 predates any of this.)

But in the end, I'm happy to go along with whatever works best for you.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 10:27         ` Ard Biesheuvel
@ 2021-02-04 10:33           ` Guillaume Tucker
  -1 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04 10:33 UTC (permalink / raw)
  To: Ard Biesheuvel, Russell King - ARM Linux admin
  Cc: Geert Uytterhoeven, Linux Kernel Mailing List, Linus Walleij,
	Linux ARM, Nicolas Pitre, kernelci-results

On 04/02/2021 10:27, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
>>
>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>> <guillaume.tucker@collabora.com> wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>> Please see the bisection report below about a boot failure on
>>>> rk3288 with next-20210203.  It was also bisected on
>>>> imx6q-var-dt6customboard with next-20210202.
>>>>
>>>> Reports aren't automatically sent to the public while we're
>>>> trialing new bisection features on kernelci.org but this one
>>>> looks valid.
>>>>
>>>> The kernel is most likely crashing very early on, so there's
>>>> nothing in the logs.  Please let us know if you need some help
>>>> with debugging or trying a fix on these platforms.
>>>>
>>>
>>> Thanks for the report.
>>
>> Ard,
>>
>> I want to send my fixes branch today which includes your regression
>> fix that caused this regression.
>>
>> As this is proving difficult to fix, I can only drop your fix from
>> my fixes branch - and given that this seems to be problematical, I'm
>> tempted to revert the original change at this point which should fix
>> both of these regressions - and then we have another go at getting rid
>> of the set/way instructions during the next cycle.
>>
>> Thoughts?
>>
> 
> Hi Russell,
> 
> If Guillaume is willing to do the experiment, and it fixes the issue,

Yes, I'm running some tests with that fix now and should have
some results shortly.

> it proves that rk3288 is relying on the flush before the MMU is
> disabled, and so in that case, the fix is trivial, and we can just
> apply it.
> 
> If the experiment fails (which would mean rk3288 does not tolerate the
> cache maintenance being performed after cache off), it is going to be
> hairy, and so it will definitely take more time.
> 
> So in the latter case (or if Guillaume does not get back to us), I
> think reverting my queued fix is the only sane option. But in that
> case, may I suggest that we queue the revert of the original by-VA
> change for v5.12 so it gets lots of coverage in -next, and allows us
> an opportunity to come up with a proper fix in the same timeframe, and
> backport the revert and the subsequent fix as a pair? Otherwise, we'll
> end up in the situation where v5.10.x until today has by-VA, v5.10.x-y
> has set/way, and v5.10.y+ has by-VA again. (I don't think we care about
> anything before that, given that v5.4 predates any of this.)
> 
> But in the end, I'm happy to go along with whatever works best for you.

Thanks,
Guillaume

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 10:27         ` Ard Biesheuvel
@ 2021-02-04 10:47           ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 56+ messages in thread
From: Russell King - ARM Linux admin @ 2021-02-04 10:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Guillaume Tucker, Geert Uytterhoeven, Linux Kernel Mailing List,
	Linus Walleij, Linux ARM, Nicolas Pitre, kernelci-results

On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote:
> Hi Russell,
> 
> If Guillaume is willing to do the experiment, and it fixes the issue,
> it proves that rk3288 is relying on the flush before the MMU is
> disabled, and so in that case, the fix is trivial, and we can just
> apply it.
> 
> If the experiment fails (which would mean rk3288 does not tolerate the
> cache maintenance being performed after cache off), it is going to be
> hairy, and so it will definitely take more time.
> 
> So in the latter case (or if Guillaume does not get back to us), I
> think reverting my queued fix is the only sane option. But in that
> case, may I suggest that we queue the revert of the original by-VA
> change for v5.12 so it gets lots of coverage in -next, and allows us
> an opportunity to come up with a proper fix in the same timeframe, and
> backport the revert and the subsequent fix as a pair? Otherwise, we'll
> end up in the situation where v5.10.x until today has by-va, v5.10.x-y
> has set/way, and v5.10y+ has by-va again. (I don't think we care about
> anything before that, given that v5.4 predates any of this)

I'm suggesting dropping your fix (9052/1) and reverting
"ARM: decompressor: switch to by-VA cache maintenance for v7 cores"
which gets us to a point where _both_ regressions are fixed.

I'm of the opinion that the by-VA patch was incorrect when it was
merged (it caused a regression), and it's only a performance
improvement. Our attempts so far to fix it are just causing other
regressions. So, I think it is reasonable to revert both back to a
known good point which has worked for over a decade. If doing so causes
regressions (which I think is unlikely), then that would be unfortunate
but alas is a price that's worth paying to get back to a known good
point - since then we're not stacking regression fixes on top of other
regression fixes.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 10:47           ` Russell King - ARM Linux admin
@ 2021-02-04 10:55             ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 10:55 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Marc Zyngier
  Cc: Guillaume Tucker, Geert Uytterhoeven, Linux Kernel Mailing List,
	Linus Walleij, Linux ARM, Nicolas Pitre, kernelci-results

(cc Marc)

On Thu, 4 Feb 2021 at 11:48, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote:
> > Hi Russell,
> >
> > If Guillaume is willing to do the experiment, and it fixes the issue,
> > it proves that rk3288 is relying on the flush before the MMU is
> > disabled, and so in that case, the fix is trivial, and we can just
> > apply it.
> >
> > If the experiment fails (which would mean rk3288 does not tolerate the
> > cache maintenance being performed after cache off), it is going to be
> > hairy, and so it will definitely take more time.
> >
> > So in the latter case (or if Guillaume does not get back to us), I
> > think reverting my queued fix is the only sane option. But in that
> > case, may I suggest that we queue the revert of the original by-VA
> > change for v5.12 so it gets lots of coverage in -next, and allows us
> > an opportunity to come up with a proper fix in the same timeframe, and
> > backport the revert and the subsequent fix as a pair? Otherwise, we'll
> > end up in the situation where v5.10.x until today has by-va, v5.10.x-y
> > has set/way, and v5.10y+ has by-va again. (I don't think we care about
> > anything before that, given that v5.4 predates any of this)
>
> I'm suggesting dropping your fix (9052/1) and reverting
> "ARM: decompressor: switch to by-VA cache maintenance for v7 cores"
> which gets us to a point where _both_ regressions are fixed.
>

I understand, but we don't know whether doing so might regress other
platforms that were added in the meantime.

> I'm of the opinion that the by-VA patch was incorrect when it was
> merged (it caused a regression), and it's only a performance
> improvement.

It is a correctness improvement, not a performance improvement.

Without that change, the 32-bit ARM kernel cannot boot bare metal on
platforms with a system cache such as 8040 or SynQuacer, and can only
boot under KVM on such systems because of the special handling of
set/way instructions by the host.

The performance issue related to set/way ops under KVM was already
fixed by describing data and unified caches as 1 set and 1 way when
running in 32-bit mode.
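
To show why the advertised cache geometry matters for set/way loops, here is a minimal sketch; it is not code from any patch in this thread. It decodes the set and way counts from an ARMv7 CCSIDR value (32-bit format) and counts how many set/way operations a full walk of one cache level would issue. The helper names and the 32 KiB, 4-way, 64-byte-line example geometry are assumptions made for illustration only.

```python
def encode_ccsidr(line_bytes: int, ways: int, sets: int) -> int:
    """Hypothetical helper: build a CCSIDR value from a cache geometry."""
    line_field = (line_bytes.bit_length() - 1) - 4  # log2(bytes per line) - 4
    return ((sets - 1) << 13) | ((ways - 1) << 3) | line_field


def decode_ccsidr(ccsidr: int) -> dict:
    """Decode the geometry fields of an ARMv7 CCSIDR (32-bit format)."""
    line_field = ccsidr & 0x7                 # bits [2:0]:   log2(words per line) - 2
    assoc_field = (ccsidr >> 3) & 0x3FF       # bits [12:3]:  ways - 1
    num_sets_field = (ccsidr >> 13) & 0x7FFF  # bits [27:13]: sets - 1
    return {
        "line_bytes": 1 << (line_field + 4),
        "ways": assoc_field + 1,
        "sets": num_sets_field + 1,
    }


def set_way_op_count(ccsidr: int) -> int:
    """Number of set/way operations needed to walk one whole cache level."""
    geometry = decode_ccsidr(ccsidr)
    return geometry["ways"] * geometry["sets"]


if __name__ == "__main__":
    real = encode_ccsidr(line_bytes=64, ways=4, sets=128)   # assumed example L1
    tiny = encode_ccsidr(line_bytes=64, ways=1, sets=1)     # "1 set and 1 way"
    print(set_way_op_count(real), set_way_op_count(tiny))
```

With a cache described as 1 set and 1 way, the walk collapses to a single operation, which is the effect referred to above for 32-bit mode under KVM.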


> Our attempts so far to fix it are just causing other
> regressions. So, I think it is reasonable to revert both back to a
> known good point which has worked over a decade. If doing so causes
> regressions (which I think is unlikely), then that would be unfortunate
> but alas is a price that's worth paying to get back to a known good
> point - since then we're not stacking regression fixes on top of other
> regression fixes.
>

This is exactly why I am proposing to queue the revert of the original
patch for v5.12, and only backport it to v5.10 and v5.11 once we are
sure it does not break anything else.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 10:33           ` Guillaume Tucker
@ 2021-02-04 11:32             ` Guillaume Tucker
  -1 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04 11:32 UTC (permalink / raw)
  To: Ard Biesheuvel, Russell King - ARM Linux admin
  Cc: Geert Uytterhoeven, Linux Kernel Mailing List, Linus Walleij,
	Linux ARM, Nicolas Pitre, kernelci-results, clang-built-linux,
	Nick Desaulniers

On 04/02/2021 10:33, Guillaume Tucker wrote:
> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>> <linux@armlinux.org.uk> wrote:
>>>
>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>
>>>>> Hi Ard,
>>>>>
>>>>> Please see the bisection report below about a boot failure on
>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>
>>>>> Reports aren't automatically sent to the public while we're
>>>>> trialing new bisection features on kernelci.org but this one
>>>>> looks valid.
>>>>>
>>>>> The kernel is most likely crashing very early on, so there's
>>>>> nothing in the logs.  Please let us know if you need some help
>>>>> with debugging or trying a fix on these platforms.
>>>>>
>>>>
>>>> Thanks for the report.
>>>
>>> Ard,
>>>
>>> I want to send my fixes branch today which includes your regression
>>> fix that caused this regression.
>>>
>>> As this is proving difficult to fix, I can only drop your fix from
>>> my fixes branch - and given that this seems to be problematical, I'm
>>> tempted to revert the original change at this point which should fix
>>> both of these regressions - and then we have another go at getting rid
>>> of the set/way instructions during the next cycle.
>>>
>>> Thoughts?
>>>
>>
>> Hi Russell,
>>
>> If Guillaume is willing to do the experiment, and it fixes the issue,
> 
> Yes, I'm running some tests with that fix now and should have
> some results shortly.

Yes it does fix the issue:

  https://lava.collabora.co.uk/scheduler/job/3173819

with Ard's fix applied to this test branch:

  https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/


+clang +Nick

It's worth mentioning that the issue only happens with kernels
built with Clang.  As you can see there are several other arm
platforms failing with clang-11 builds but booting fine with
gcc-8:

  https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/

Here's a sample build log:

  https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log

Essentially:

  make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage

I believe it should be using the GNU assembler as LLVM_IAS=1 is
not defined, but there may be something more subtle about it.
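
One way to check that belief against a build log is to look for the -no-integrated-as flag on the compiler command lines. The sketch below rests on two assumptions: that the log was produced with V=1 so that full command lines are present, and that kernels of this vintage add -no-integrated-as to the Clang flags when LLVM_IAS=1 is not set. The function name and the sample log lines are made up for illustration.

```python
def uses_external_assembler(log_text: str) -> bool:
    """Return True if any command line in the log passes -no-integrated-as."""
    for line in log_text.splitlines():
        # Compare whole tokens so a longer, unrelated flag cannot match.
        if "-no-integrated-as" in line.split():
            return True
    return False


if __name__ == "__main__":
    # Hypothetical excerpts, not taken from the build log linked above.
    sample_gas = "ccache clang -no-integrated-as -c -o head.o head.S\n"
    sample_ias = "ccache clang -c -o head.o head.S\n"
    print(uses_external_assembler(sample_gas))
    print(uses_external_assembler(sample_ias))
```

If the flag is absent from a verbose log, that would suggest Clang's integrated assembler was used, although this check says nothing about which linker or other binutils were involved.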

Thanks,
Guillaume


>> it proves that rk3288 is relying on the flush before the MMU is
>> disabled, and so in that case, the fix is trivial, and we can just
>> apply it.
>>
>> If the experiment fails (which would mean rk3288 does not tolerate the
>> cache maintenance being performed after cache off), it is going to be
>> hairy, and so it will definitely take more time.
>>
>> So in the latter case (or if Guillaume does not get back to us), I
>> think reverting my queued fix is the only sane option. But in that
>> case, may I suggest that we queue the revert of the original by-VA
>> change for v5.12 so it gets lots of coverage in -next, and allows us
>> an opportunity to come up with a proper fix in the same timeframe, and
>> backport the revert and the subsequent fix as a pair? Otherwise, we'll
>> end up in the situation where v5.10.x until today has by-va, v5.10.x-y
>> has set/way, and v5.10y+ has by-va again. (I don't think we care about
>> anything before that, given that v5.4 predates any of this)
>>
>> But in the end, I'm happy to go along with whatever works best for you.
> 
> Thanks,
> Guillaume
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 11:32             ` Guillaume Tucker
@ 2021-02-04 11:44               ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 56+ messages in thread
From: Russell King - ARM Linux admin @ 2021-02-04 11:44 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Ard Biesheuvel, Geert Uytterhoeven, Linux Kernel Mailing List,
	Linus Walleij, Linux ARM, Nicolas Pitre, kernelci-results,
	clang-built-linux, Nick Desaulniers

On Thu, Feb 04, 2021 at 11:32:05AM +0000, Guillaume Tucker wrote:
> Yes it does fix the issue:
> 
>   https://lava.collabora.co.uk/scheduler/job/3173819
> 
> with Ard's fix applied to this test branch:
> 
>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
> 
> 
> +clang +Nick
> 
> It's worth mentioning that the issue only happens with kernels
> built with Clang.  As you can see there are several other arm
> platforms failing with clang-11 builds but booting fine with
> gcc-8:

My gut feeling is that it isn't Clang specific - it's likely down to
the exact code/data placement, how things end up during decompression,
and exactly what state the cache ends up in.

That certainly was the case with the original regression.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 11:44               ` Russell King - ARM Linux admin
@ 2021-02-04 12:09                 ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 12:09 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Guillaume Tucker, Geert Uytterhoeven, Linux Kernel Mailing List,
	Linus Walleij, Linux ARM, Nicolas Pitre, kernelci-results,
	clang-built-linux, Nick Desaulniers

On Thu, 4 Feb 2021 at 12:45, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Feb 04, 2021 at 11:32:05AM +0000, Guillaume Tucker wrote:
> > Yes it does fix the issue:
> >
> >   https://lava.collabora.co.uk/scheduler/job/3173819
> >
> > with Ard's fix applied to this test branch:
> >
> >   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
> >
> >
> > +clang +Nick
> >
> > It's worth mentioning that the issue only happens with kernels
> > built with Clang.  As you can see there are several other arm
> > platforms failing with clang-11 builds but booting fine with
> > gcc-8:
>
> My gut feeling is that it isn't Clang specific - it's likely down to
> the exact code/data placement, how things end up during decompression,
> and exactly what state the cache ends up in.
>
> That certainly was the case with the original regression.
>

Agreed.

So given that my queued fix turns this

cache_clean
cache_off

into this

cache_off
cache_clean

for v7 only, and considering that turning this into

cache_clean
cache_off
cache_clean

(as the diff tested by Guillaume does) fixes the reported issue, it
seems like the safest option to me at this point.

Reverting both patches, one of which has been in mainline since v5.7,
seems unwise to me at this point in the cycle.
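
The three orderings above can be compared with a toy model. This is only a sketch under stated assumptions, not the decompressor code: it assumes one store lands in the cache just before the switch and one piece of data is read uncached right after it. The class, the function and the "image"/"late" names are hypothetical, and the thread does not establish that this is the actual failure mechanism on rk3288.

```python
class ToyCache:
    """Write-back cache in front of a flat memory; purely illustrative."""

    def __init__(self):
        self.memory = {}     # address -> value visible with caches off
        self.dirty = {}      # address -> value held only in the cache
        self.enabled = True

    def store(self, addr, value):
        if self.enabled:
            self.dirty[addr] = value
        else:
            self.memory[addr] = value

    def load(self, addr):
        if self.enabled and addr in self.dirty:
            return self.dirty[addr]
        return self.memory.get(addr)

    def cache_clean(self):
        # Write every dirty line back; works whether or not the cache is on.
        self.memory.update(self.dirty)
        self.dirty.clear()

    def cache_off(self):
        self.enabled = False


def run(sequence):
    """Return (value read right after cache_off, final value of the late store)."""
    cache = ToyCache()
    cache.store("image", "new")       # data written through the cache earlier
    seen_at_off = None
    for step in sequence:
        if step == "cache_clean":
            cache.cache_clean()
        elif step == "cache_off":
            cache.store("late", "new")        # assumed store just before the switch
            cache.cache_off()
            seen_at_off = cache.load("image")  # assumed first uncached read
    return seen_at_off, cache.load("late")


if __name__ == "__main__":
    for seq in (["cache_clean", "cache_off"],
                ["cache_off", "cache_clean"],
                ["cache_clean", "cache_off", "cache_clean"]):
        print(seq, run(seq))
```

In this model only the clean, off, clean sequence leaves both values visible in memory; each of the two-step orderings loses one of them.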

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 10:55             ` Ard Biesheuvel
@ 2021-02-04 12:26               ` Marc Zyngier
  -1 siblings, 0 replies; 56+ messages in thread
From: Marc Zyngier @ 2021-02-04 12:26 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King - ARM Linux admin, Guillaume Tucker,
	Geert Uytterhoeven, Linux Kernel Mailing List, Linus Walleij,
	Linux ARM, Nicolas Pitre, kernelci-results

On 2021-02-04 10:55, Ard Biesheuvel wrote:
> (cc Marc)
> 
> On Thu, 4 Feb 2021 at 11:48, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
>> 
>> On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote:
>> > Hi Russell,
>> >
>> > If Guillaume is willing to do the experiment, and it fixes the issue,
>> > it proves that rk3288 is relying on the flush before the MMU is
>> > disabled, and so in that case, the fix is trivial, and we can just
>> > apply it.
>> >
>> > If the experiment fails (which would mean rk3288 does not tolerate the
>> > cache maintenance being performed after cache off), it is going to be
>> > hairy, and so it will definitely take more time.
>> >
>> > So in the latter case (or if Guillaume does not get back to us), I
>> > think reverting my queued fix is the only sane option. But in that
>> > case, may I suggest that we queue the revert of the original by-VA
>> > change for v5.12 so it gets lots of coverage in -next, and allows us
>> > an opportunity to come up with a proper fix in the same timeframe, and
>> > backport the revert and the subsequent fix as a pair? Otherwise, we'll
>> > end up in the situation where v5.10.x until today has by-va, v5.10.x-y
>> > has set/way, and v5.10y+ has by-va again. (I don't think we care about
>> > anything before that, given that v5.4 predates any of this)
>> 
>> I'm suggesting dropping your fix (9052/1) and reverting
>> "ARM: decompressor: switch to by-VA cache maintenance for v7 cores"
>> which gets us to a point where _both_ regressions are fixed.
>> 
> 
> I understand, but we don't know whether doing so might regress other
> platforms that were added in the mean time.
> 
>> I'm of the opinion that the by-VA patch was incorrect when it was
>> merged (it caused a regression), and it's only a performance
>> improvement.
> 
> It is a correctness improvement, not a performance improvement.
> 
> Without that change, the 32-bit ARM kernel cannot boot bare metal on
> platforms with a system cache such as 8040 or SynQuacer, and can only
> boot under KVM on such systems because of the special handling of
> set/way instructions by the host.

I agree. With set/way CMOs, there is no way to reach the PoC if
it is beyond the system cache, leading to an unbootable kernel.
This is actually pretty well documented in the architecture,
and it did bite us for the first time on XGene-1, 7 years ago.
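
The difference can be pictured with a small model; it is a sketch with made-up names and a made-up four-level hierarchy, not a description of any particular SoC. It assumes set/way operations only act on the cache levels the CPU itself enumerates (two here), while a clean by VA to the PoC is pushed through every level, including a system cache the CPU cannot address by set/way.

```python
def make_hierarchy():
    """Hypothetical hierarchy: [L1, L2, system cache, DRAM], each a dict of lines."""
    return [{}, {}, {}, {}]


def clean_by_set_way(hierarchy, cpu_visible_levels):
    """Push dirty lines out of the CPU-visible levels only, one level at a time."""
    for level in range(cpu_visible_levels):
        hierarchy[level + 1].update(hierarchy[level])
        hierarchy[level].clear()


def clean_by_va_to_poc(hierarchy):
    """Push dirty lines through every cache level down to DRAM (the PoC here)."""
    for level in range(len(hierarchy) - 1):
        hierarchy[level + 1].update(hierarchy[level])
        hierarchy[level].clear()


if __name__ == "__main__":
    h = make_hierarchy()
    h[0]["image"] = "new"                      # dirty line in L1
    clean_by_set_way(h, cpu_visible_levels=2)
    print("after set/way:", h)                 # line is stuck in the system cache
    clean_by_va_to_poc(h)
    print("after by-VA:  ", h)                 # line has reached DRAM
```

After the set/way walk the line sits in the system cache and DRAM is still stale, which is the situation that leaves an uncached boot path reading old data.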

In retrospect, having KVM handle set/way CMOs was a mistake,
as it just papered over the problem for the sake of running older
32bit guests. It violated the principle of KVM/arm being strictly
architectural and provided unrealistic expectations. I'll take the
blame for this.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 12:26               ` Marc Zyngier
@ 2021-02-04 14:09                 ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 56+ messages in thread
From: Russell King - ARM Linux admin @ 2021-02-04 14:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ard Biesheuvel, Guillaume Tucker, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results

On Thu, Feb 04, 2021 at 12:26:44PM +0000, Marc Zyngier wrote:
> I agree. With set/way CMOs, there is no way to reach the PoC if
> it is beyond the system cache, leading to an unbootable kernel.
> This is actually pretty well documented in the architecture,
> and it did bite us for the first time on XGene-1, 7 years ago.

That may be, however we still do set/way maintenance to invalidate
the L1 cache as that is required for ARMv7 to place the cache into
a known state, as stated by the architecture reference manual.

Arguably, that should be done by firmware, but when starting
secondary CPUs, there are platforms out there which do not bring
the L1 cache to a defined state. So we are pretty much stuck with
doing set/way operations during CPU initialisation in the main
kernel.

If ARMv8 decides that this is not supportable, then that's a matter
for ARMv8 to address without impacting the requirements of ARMv7.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 14:09                 ` Russell King - ARM Linux admin
@ 2021-02-04 14:25                   ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 14:25 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Marc Zyngier, Guillaume Tucker, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results

On Thu, 4 Feb 2021 at 15:09, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Feb 04, 2021 at 12:26:44PM +0000, Marc Zyngier wrote:
> > I agree. With set/way CMOs, there is no way to reach the PoC if
> > it is beyond the system cache, leading to an unbootable kernel.
> > This is actually pretty well documented in the architecture,
> > and it did bite us for the first time on XGene-1, 7 years ago.
>
> That may be, however we still do set/way maintenance to invalidate
> the L1 cache as that is required for ARMv7 to place the cache into
> a known state, as stated by the architecture reference manual.
>

Getting a certain cache at a certain level into a known state is a
valid use of set/way ops, and is simply unnecessary when running under
virtualization, but doesn't do any harm.

Pushing contents of the cache hierarchy to main memory is *not* a
valid use of set/way ops, and so there is no point in pretending that
set/way ops will produce the same results as by-VA ops. Only the by-VA
ops give the architectural guarantees that we rely on for correctness.

> Arguably, that should be done by firmware, but when starting
> secondary CPUs, there are platforms out there which do not bring
> the L1 cache to a defined state. So we are pretty much stuck with
> doing set/way operations during CPU initialisation in the main
> kernel.
>

Indeed. And this is unfortunate, but not the end of the world.

> If ARMv8 decides that this is not supportable, then that's a matter
> for ARMv8 to address without impacting the requirements of ARMv7.
>

I'm not sure what you mean here. The v7 architecture is crystal clear
about the difference between set/way ops (managing a single cache),
and by-VA ops (managing the 'cachedness' state of a memory region).
The semantics are radically different, regardless of v7 vs v8 or
AArch32 vs AArch64.
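
The distinction drawn above can be sketched with a toy model. This is
not kernel code and does not use real ARM instructions: the class and
function names (ToySystem, clean_l1_by_set_way, clean_by_va_to_poc,
demo) are hypothetical, and the hierarchy of one L1 cache, one system
cache and DRAM is an assumption chosen only to mirror the case
discussed in this thread, where the system cache cannot be reached by
set/way operations.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """One cache level: maps address -> (value, dirty flag)."""
    name: str
    lines: dict = field(default_factory=dict)


class ToySystem:
    """Hypothetical hierarchy: L1 -> system cache -> DRAM.

    Assumption of the model: only the L1 can be walked by set/way,
    and DRAM stands in for the Point of Coherency (PoC).
    """

    def __init__(self):
        self.l1 = Cache("L1")
        self.system_cache = Cache("system cache")
        self.dram = {}

    def cached_write(self, addr, value):
        # A cacheable store lands in L1 as a dirty line.
        self.l1.lines[addr] = (value, True)

    def clean_l1_by_set_way(self):
        # Walk every L1 line; dirty data is written back only to the
        # next level, which here is the system cache, not DRAM.
        for addr, (value, dirty) in list(self.l1.lines.items()):
            if dirty:
                self.system_cache.lines[addr] = (value, True)
                self.l1.lines[addr] = (value, False)

    def clean_by_va_to_poc(self, addr):
        # Push the newest copy of one address through every level
        # until it reaches DRAM (the PoC in this model).
        newest = None
        for cache in (self.l1, self.system_cache):  # nearest level first
            if addr in cache.lines:
                value, dirty = cache.lines[addr]
                if dirty and newest is None:
                    newest = value
                kept = value if newest is None else newest
                cache.lines[addr] = (kept, False)
        if newest is not None:
            self.dram[addr] = newest

    def uncached_read(self, addr):
        # What a CPU running with its caches off would observe.
        return self.dram[addr]


def demo():
    """Return what an uncached read sees after each kind of clean."""
    results = {}
    for op in ("set/way", "by-VA"):
        system = ToySystem()
        system.dram[0x1000] = "old"
        system.cached_write(0x1000, "new")
        if op == "set/way":
            system.clean_l1_by_set_way()
        else:
            system.clean_by_va_to_poc(0x1000)
        results[op] = system.uncached_read(0x1000)
    return results


if __name__ == "__main__":
    print(demo())
```

In this model the set/way clean leaves the new data parked in the
system cache, so the uncached read still returns the old value, while
the by-VA clean makes the new value visible in DRAM. Real hardware is
considerably more involved; the sketch only shows why the two kinds of
operation are not interchangeable.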

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 14:25                   ` Ard Biesheuvel
@ 2021-02-04 14:36                     ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 56+ messages in thread
From: Russell King - ARM Linux admin @ 2021-02-04 14:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Marc Zyngier, Guillaume Tucker, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results

On Thu, Feb 04, 2021 at 03:25:20PM +0100, Ard Biesheuvel wrote:
> Pushing contents of the cache hierarchy to main memory is *not* a
> valid use of set/way ops, and so there is no point in pretending that
> set/way ops will produce the same results as by-VA ops. Only the by-VA
> ops give the architectural guarantees that we rely on for correctness.

... yet we /were/ doing that, and it worked fine for 13 years - from
1st June 2007 until the by-VA merge into mainline on the 3rd April
2020.

You may be right that it wasn't the most correct way, but it worked
for those 13 years without issue, and it's only recently that it's
become a problem, and trying to "fix" that introduced a regression,
and fixing that regression has caused another regression... and
what I'm wondering is how many more regression fixing cycles it's
going to take - how many regression fixes on top of other regression
fixes are we going to end up seeing here.

The fact is, we never properly understood why your patch caused the
regression I was seeing. If we don't understand it, then we can never
say that we've fixed the problem properly. That is highlighted by the
fact that fixing the regression I was seeing has caused another
regression.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 11:32             ` Guillaume Tucker
@ 2021-02-04 15:42               ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 15:42 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Russell King - ARM Linux admin, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results, clang-built-linux,
	Nick Desaulniers

On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> On 04/02/2021 10:33, Guillaume Tucker wrote:
> > On 04/02/2021 10:27, Ard Biesheuvel wrote:
> >> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
> >> <linux@armlinux.org.uk> wrote:
> >>>
> >>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
> >>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
> >>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>
> >>>>> Hi Ard,
> >>>>>
> >>>>> Please see the bisection report below about a boot failure on
> >>>>> rk3288 with next-20210203.  It was also bisected on
> >>>>> imx6q-var-dt6customboard with next-20210202.
> >>>>>
> >>>>> Reports aren't automatically sent to the public while we're
> >>>>> trialing new bisection features on kernelci.org but this one
> >>>>> looks valid.
> >>>>>
> >>>>> The kernel is most likely crashing very early on, so there's
> >>>>> nothing in the logs.  Please let us know if you need some help
> >>>>> with debugging or trying a fix on these platforms.
> >>>>>
> >>>>
> >>>> Thanks for the report.
> >>>
> >>> Ard,
> >>>
> >>> I want to send my fixes branch today which includes your regression
> >>> fix that caused this regression.
> >>>
> >>> As this is proving difficult to fix, I can only drop your fix from
> >>> my fixes branch - and given that this seems to be problematical, I'm
> >>> tempted to revert the original change at this point which should fix
> >>> both of these regressions - and then we have another go at getting rid
> >>> of the set/way instructions during the next cycle.
> >>>
> >>> Thoughts?
> >>>
> >>
> >> Hi Russell,
> >>
> >> If Guillaume is willing to do the experiment, and it fixes the issue,
> >
> > Yes, I'm running some tests with that fix now and should have
> > some results shortly.
>
> Yes it does fix the issue:
>
>   https://lava.collabora.co.uk/scheduler/job/3173819
>
> with Ard's fix applied to this test branch:
>
>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
>
>
> +clang +Nick
>
> It's worth mentioning that the issue only happens with kernels
> built with Clang.  As you can see there are several other arm
> platforms failing with clang-11 builds but booting fine with
> gcc-8:
>
>   https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
>
> Here's a sample build log:
>
>   https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
>
> Essentially:
>
>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>
> I believe it should be using the GNU assembler as LLVM_IAS=1 is
> not defined, but there may be something more subtle about it.
>


Do you have a link for a failing zImage built from multi_v7_defconfig?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 14:36                     ` Russell King - ARM Linux admin
@ 2021-02-04 15:52                       ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 15:52 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Marc Zyngier, Guillaume Tucker, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results

On Thu, 4 Feb 2021 at 15:36, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Feb 04, 2021 at 03:25:20PM +0100, Ard Biesheuvel wrote:
> > Pushing contents of the cache hierarchy to main memory is *not* a
> > valid use of set/way ops, and so there is no point in pretending that
> > set/way ops will produce the same results as by-VA ops. Only the by-VA
> > ops give the architectural guarantees that we rely on for correctness.
>
> ... yet we /were/ doing that, and it worked fine for 13 years - from
> 1st June 2007 until the by-VA merge into mainline on the 3rd April
> 2020.
>
> You may be right that it wasn't the most correct way, but it worked
> for those 13 years without issue, and it's only recently that it's
> become a problem, and trying to "fix" that introduced a regression,
> and fixing that regression has caused another regression... and
> what I'm wondering is how many more regression fixing cycles it's
> going to take - how many regression fixes on top of other regression
> fixes are we going to end up seeing here.
>
> The fact is, we never properly understood why your patch caused the
> regression I was seeing. If we don't understand it, then we can never
> say that we've fixed the problem properly. That is highlighted by the
> fact that fixing the regression I was seeing has caused another
> regression.
>

I agree with all these points.

But as I pointed out, reverting the original by-VA change, which has
been there for almost a year now, has some risks of its own. If we
don't understand the details of how this is broken, how can we be sure
we don't break something else if we revert it again?

So I'm not arguing that reverting the original patch is unreasonable,
just that doing so at this point in the cycle is risky, and that it
would be better to queue the revert for v5.12, and only backport it
after some soak time in -next. And in a sense, reinstating the
cache_clean() before cache_off() already amounts to a partial revert
of the queued fix.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 15:42               ` Ard Biesheuvel
@ 2021-02-04 15:53                 ` Guillaume Tucker
  -1 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04 15:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King - ARM Linux admin, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results, clang-built-linux,
	Nick Desaulniers

On 04/02/2021 15:42, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>>
>> On 04/02/2021 10:33, Guillaume Tucker wrote:
>>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>>>> <linux@armlinux.org.uk> wrote:
>>>>>
>>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>>>
>>>>>>> Hi Ard,
>>>>>>>
>>>>>>> Please see the bisection report below about a boot failure on
>>>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>>>
>>>>>>> Reports aren't automatically sent to the public while we're
>>>>>>> trialing new bisection features on kernelci.org but this one
>>>>>>> looks valid.
>>>>>>>
>>>>>>> The kernel is most likely crashing very early on, so there's
>>>>>>> nothing in the logs.  Please let us know if you need some help
>>>>>>> with debugging or trying a fix on these platforms.
>>>>>>>
>>>>>>
>>>>>> Thanks for the report.
>>>>>
>>>>> Ard,
>>>>>
>>>>> I want to send my fixes branch today which includes your regression
>>>>> fix that caused this regression.
>>>>>
>>>>> As this is proving difficult to fix, I can only drop your fix from
>>>>> my fixes branch - and given that this seems to be problematical, I'm
>>>>> tempted to revert the original change at this point which should fix
>>>>> both of these regressions - and then we have another go at getting rid
>>>>> of the set/way instructions during the next cycle.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>
>>>> Hi Russell,
>>>>
>>>> If Guillaume is willing to do the experiment, and it fixes the issue,
>>>
>>> Yes, I'm running some tests with that fix now and should have
>>> some results shortly.
>>
>> Yes it does fix the issue:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3173819
>>
>> with Ard's fix applied to this test branch:
>>
>>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
>>
>>
>> +clang +Nick
>>
>> It's worth mentioning that the issue only happens with kernels
>> built with Clang.  As you can see there are several other arm
>> platforms failing with clang-11 builds but booting fine with
>> gcc-8:
>>
>>   https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
>>
>> Here's a sample build log:
>>
>>   https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
>>
>> Essentially:
>>
>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>
>> I believe it should be using the GNU assembler as LLVM_IAS=1 is
>> not defined, but there may be something more subtle about it.
>>
> 
> 
> Do you have a link for a failing zImage built from multi_v7_defconfig?

Sure, this one was built from a plain next-20210203:

  http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage

You can also find the dtbs, modules and other things in that same
directory.

For the record, here's the test job that used it:

  https://lava.collabora.co.uk/scheduler/job/3173792

Guillaume

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 15:53                 ` Guillaume Tucker
@ 2021-02-04 16:01                   ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-04 16:01 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Russell King - ARM Linux admin, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results, clang-built-linux,
	Nick Desaulniers

On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> On 04/02/2021 15:42, Ard Biesheuvel wrote:
> > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> > <guillaume.tucker@collabora.com> wrote:
> >>
> >> On 04/02/2021 10:33, Guillaume Tucker wrote:
> >>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
> >>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
> >>>> <linux@armlinux.org.uk> wrote:
> >>>>>
> >>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
> >>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
> >>>>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>>>
> >>>>>>> Hi Ard,
> >>>>>>>
> >>>>>>> Please see the bisection report below about a boot failure on
> >>>>>>> rk3288 with next-20210203.  It was also bisected on
> >>>>>>> imx6q-var-dt6customboard with next-20210202.
> >>>>>>>
> >>>>>>> Reports aren't automatically sent to the public while we're
> >>>>>>> trialing new bisection features on kernelci.org but this one
> >>>>>>> looks valid.
> >>>>>>>
> >>>>>>> The kernel is most likely crashing very early on, so there's
> >>>>>>> nothing in the logs.  Please let us know if you need some help
> >>>>>>> with debugging or trying a fix on these platforms.
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for the report.
> >>>>>
> >>>>> Ard,
> >>>>>
> >>>>> I want to send my fixes branch today which includes your regression
> >>>>> fix that caused this regression.
> >>>>>
> >>>>> As this is proving difficult to fix, I can only drop your fix from
> >>>>> my fixes branch - and given that this seems to be problematical, I'm
> >>>>> tempted to revert the original change at this point which should fix
> >>>>> both of these regressions - and then we have another go at getting rid
> >>>>> of the set/way instructions during the next cycle.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>
> >>>> Hi Russell,
> >>>>
> >>>> If Guillaume is willing to do the experiment, and it fixes the issue,
> >>>
> >>> Yes, I'm running some tests with that fix now and should have
> >>> some results shortly.
> >>
> >> Yes it does fix the issue:
> >>
> >>   https://lava.collabora.co.uk/scheduler/job/3173819
> >>
> >> with Ard's fix applied to this test branch:
> >>
> >>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
> >>
> >>
> >> +clang +Nick
> >>
> >> It's worth mentioning that the issue only happens with kernels
> >> built with Clang.  As you can see there are several other arm
> >> platforms failing with clang-11 builds but booting fine with
> >> gcc-8:
> >>
> >>   https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
> >>
> >> Here's a sample build log:
> >>
> >>   https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
> >>
> >> Essentially:
> >>
> >>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >>
> >> I believe it should be using the GNU assembler as LLVM_IAS=1 is
> >> not defined, but there may be something more subtle about it.
> >>
> >
> >
> > Do you have a link for a failing zImage built from multi_v7_defconfig?
>
> Sure, this one was built from a plain next-20210203:
>
>   http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage
>
> You can also find the dtbs, modules and other things in that same
> directory.
>
> For the record, here's the test job that used it:
>
>   https://lava.collabora.co.uk/scheduler/job/3173792
>

Thanks.

That zImage boots fine locally. Unfortunately, I don't have rk3288
hardware to reproduce.

Could you please point me to the list of all the other platforms that
failed to boot this image?

To be honest, I am slightly annoyed that a change that works fine with
GCC but does not work with Clang version

11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158

(where exp means experimental, I suppose) is the reason for this
discussion, especially because the change is in asm code. Is it
possible to build with Clang but use the GNU linker?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 16:01                   ` Ard Biesheuvel
@ 2021-02-04 18:06                     ` Nick Desaulniers
  -1 siblings, 0 replies; 56+ messages in thread
From: Nick Desaulniers @ 2021-02-04 18:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Guillaume Tucker, Russell King - ARM Linux admin,
	Geert Uytterhoeven, Linux Kernel Mailing List, Linus Walleij,
	Linux ARM, Nicolas Pitre, kernelci-results, clang-built-linux

On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
> >
> > On 04/02/2021 15:42, Ard Biesheuvel wrote:
> > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> > > <guillaume.tucker@collabora.com> wrote:
> > >>
> > >> Essentially:
> > >>
> > >>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage

This command should link with BFD (and assemble with GAS); it's only
using clang as the compiler.

>
> To be honest, I am slightly annoyed that a change that works fine with
> GCC but does not work with Clang version
>
> 11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158
>
> (where exp means experimental, I suppose) is the reason for this
> discussion, especially because the change is in asm code. Is it
> possible to build with Clang but use the GNU linker?

rk3288 ("veyron") might be the last 32-bit ARM platform ChromeOS uses.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 18:06                     ` Nick Desaulniers
@ 2021-02-04 18:12                       ` Nathan Chancellor
  -1 siblings, 0 replies; 56+ messages in thread
From: Nathan Chancellor @ 2021-02-04 18:12 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Ard Biesheuvel, Guillaume Tucker, Russell King - ARM Linux admin,
	Geert Uytterhoeven, Linux Kernel Mailing List, Linus Walleij,
	Linux ARM, Nicolas Pitre, kernelci-results, clang-built-linux

On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> > <guillaume.tucker@collabora.com> wrote:
> > >
> > > On 04/02/2021 15:42, Ard Biesheuvel wrote:
> > > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> > > > <guillaume.tucker@collabora.com> wrote:
> > > >>
> > > >> Essentially:
> > > >>
> > > >>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> 
> This command should link with BFD (and assemble with GAS); it's only
> using clang as the compiler.

I think you missed the 'LLVM=1' before CC="ccache clang". That should
use all of the LLVM utilities minus the integrated assembler while
wrapping clang with ccache.
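
For reference, here is a rough sketch of what LLVM=1 stands for when
spelled out per tool.  The list follows the kernel's LLVM build
documentation as I remember it for kernels of this period, so treat it
as an assumption and check the top-level Makefile in your own tree.
Only the target-side tools are shown.  The helper name
llvm_tool_overrides is made up for illustration; it just prints the
assignments and does not run a build.

```shell
#!/bin/sh
# Hypothetical helper: print the per-tool overrides that LLVM=1 is
# understood to imply (assumption: matches the Makefile of this era).
# The assembler is absent on purpose: without LLVM_IAS=1 the GNU
# assembler found through CROSS_COMPILE is still used.
llvm_tool_overrides() {
    cat <<'EOF'
CC=clang
LD=ld.lld
AR=llvm-ar
NM=llvm-nm
STRIP=llvm-strip
OBJCOPY=llvm-objcopy
OBJDUMP=llvm-objdump
READELF=llvm-readelf
EOF
}

llvm_tool_overrides
```

An explicit CC="ccache clang" on the command line, as in the build
above, simply replaces the CC entry of that list.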

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 18:12                       ` Nathan Chancellor
@ 2021-02-04 18:23                         ` Nick Desaulniers
  -1 siblings, 0 replies; 56+ messages in thread
From: Nick Desaulniers @ 2021-02-04 18:23 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Ard Biesheuvel, Guillaume Tucker, Russell King - ARM Linux admin,
	Geert Uytterhoeven, Linux Kernel Mailing List, Linus Walleij,
	Linux ARM, Nicolas Pitre, kernelci-results, clang-built-linux

On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
> > On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> > > <guillaume.tucker@collabora.com> wrote:
> > > >
> > > > On 04/02/2021 15:42, Ard Biesheuvel wrote:
> > > > > On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> > > > > <guillaume.tucker@collabora.com> wrote:
> > > > >>
> > > > >> Essentially:
> > > > >>
> > > > >>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >
> > This command should link with BFD (and assemble with GAS); it's only
> > using clang as the compiler.
>
> I think you missed the 'LLVM=1' before CC="ccache clang". That should
> use all of the LLVM utilities minus the integrated assembler while
> wrapping clang with ccache.

You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
permit fallback to BFD.
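
A minimal sketch of that override, for reference.  It only prints the
command line rather than running it.  The cross-prefixed linker name
arm-linux-gnueabihf-ld.bfd is an assumption on my part (a plain ld.bfd
on an x86 host would normally be the host linker), and kernel_make_cmd
is a hypothetical helper, not something from the kernel tree.

```shell
#!/bin/sh
# Prefix taken from the build command quoted in this thread.
CROSS_COMPILE=arm-linux-gnueabihf-

# Hypothetical helper: print (not run) the make invocation with an
# explicit linker.  Variables given on the make command line take
# precedence over ordinary assignments in the Makefile, so LD= here
# replaces the linker that LLVM=1 would otherwise select.
kernel_make_cmd() {
    # $1: linker binary to pass as LD
    printf 'make ARCH=arm CROSS_COMPILE=%s LLVM=1 CC="ccache clang" LD=%s zImage\n' \
        "$CROSS_COMPILE" "$1"
}

# Assumed name of the cross GNU linker; adjust to the local toolchain.
kernel_make_cmd "${CROSS_COMPILE}ld.bfd"
```

The rest of the LLVM utilities stay in use; only the link step changes.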
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 16:01                   ` Ard Biesheuvel
@ 2021-02-04 21:09                     ` Guillaume Tucker
  -1 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04 21:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Russell King - ARM Linux admin, Geert Uytterhoeven,
	Linux Kernel Mailing List, Linus Walleij, Linux ARM,
	Nicolas Pitre, kernelci-results, clang-built-linux,
	Nick Desaulniers

On 04/02/2021 16:01, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>>
>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>> <guillaume.tucker@collabora.com> wrote:
>>>>
>>>> On 04/02/2021 10:33, Guillaume Tucker wrote:
>>>>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>>>>>> <linux@armlinux.org.uk> wrote:
>>>>>>>
>>>>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>>>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Ard,
>>>>>>>>>
>>>>>>>>> Please see the bisection report below about a boot failure on
>>>>>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>>>>>
>>>>>>>>> Reports aren't automatically sent to the public while we're
>>>>>>>>> trialing new bisection features on kernelci.org but this one
>>>>>>>>> looks valid.
>>>>>>>>>
>>>>>>>>> The kernel is most likely crashing very early on, so there's
>>>>>>>>> nothing in the logs.  Please let us know if you need some help
>>>>>>>>> with debugging or trying a fix on these platforms.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for the report.
>>>>>>>
>>>>>>> Ard,
>>>>>>>
>>>>>>> I want to send my fixes branch today which includes your regression
>>>>>>> fix that caused this regression.
>>>>>>>
>>>>>>> As this is proving difficult to fix, I can only drop your fix from
>>>>>>> my fixes branch - and given that this seems to be problematical, I'm
>>>>>>> tempted to revert the original change at this point which should fix
>>>>>>> both of these regressions - and then we have another go at getting rid
>>>>>>> of the set/way instructions during the next cycle.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>
>>>>>> Hi Russell,
>>>>>>
>>>>>> If Guillaume is willing to do the experiment, and it fixes the issue,
>>>>>
>>>>> Yes, I'm running some tests with that fix now and should have
>>>>> some results shortly.
>>>>
>>>> Yes it does fix the issue:
>>>>
>>>>   https://lava.collabora.co.uk/scheduler/job/3173819
>>>>
>>>> with Ard's fix applied to this test branch:
>>>>
>>>>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
>>>>
>>>>
>>>> +clang +Nick
>>>>
>>>> It's worth mentioning that the issue only happens with kernels
>>>> built with Clang.  As you can see there are several other arm
>>>> platforms failing with clang-11 builds but booting fine with
>>>> gcc-8:
>>>>
>>>>   https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
>>>>
>>>> Here's a sample build log:
>>>>
>>>>   https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
>>>>
>>>> Essentially:
>>>>
>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>>>
>>>> I believe it should be using the GNU assembler as LLVM_IAS=1 is
>>>> not defined, but there may be something more subtle about it.
>>>>
>>>
>>>
>>> Do you have a link for a failing zImage built from multi_v7_defconfig?
>>
>> Sure, this one was built from a plain next-20210203:
>>
>>   http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage
>>
>> You can also find the dtbs, modules and other things in that same
>> directory.
>>
>> For the record, here's the test job that used it:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3173792
>>
> 
> Thanks.
> 
> That zImage boots fine locally. Unfortunately, I don't have rk3288
> hardware to reproduce.
> 
> Could you please point me to the list of all the other platforms that
> failed to boot this image?

This is the list of platforms from kernelci.org I've gathered
which appeared to be impacted:

imx6q-sabrelite
imx6q-var-dt6customboard
imx6dl-riotboard
imx6qp-wandboard-revd1
imx7ulp-evk
odroid-xu3
rk3288-rock2-square
rk3288-veyron-jaq
stm32mp157c-dk2
sun4i-a10-olinuxino-lime
sun5i-a13-olinuxino-micro
sun7i-a20-cubieboard2
sun7i-a20-olinuxino-lime2
sun8i-a33-olinuxino
sun8i-a83t-bananapi-m3
sun8i-h2-plus-libretech-all-h3-cc
sun8i-h2-plus-orangepi-r1
sun8i-h2-plus-orangepi-zero
sun8i-h3-libretech-all-h3-cc
sun8i-h3-bananapi-m2-plus
sun8i-h3-orangepi-pc
sun8i-r40-bananapi-m2-ultra

They were all booting next-20210203 with gcc-8 but not with
clang-11.  I've run checks on a good share of them with your
patch applied and they're now booting with clang-11, just like
the rk3288 and imx6q platforms that were used for the bisections.


> To be honest, I am slightly annoyed that a change that works fine with
> GCC but does not work with Clang version
> 
> 11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158
> 
> (where exp means experimental, I suppose) is the reason for this

Well it's the standard one from the LLVM Debian package repo:

  deb http://apt.llvm.org/buster/ llvm-toolchain-buster-11 main

There's a slightly newer version, but I doubt it would make any
difference in this respect unless there's a particular fix in
ld.lld:

# apt policy clang-11
clang-11:
  Installed: 1:11.1.0~++20210130110826+3a8282376b6c-1~exp1~20210130221445.158
  Candidate: 1:11.1.0~++20210204120158+1fdec59bffc1-1~exp1~20210203230823.159

> discussion, especially because the change is in asm code. Is it
> possible to build with Clang but use the GNU linker?

As mentioned by Nick, it is using everything from LLVM except the
assembler - so not the GNU linker.  I've now built a new Docker
container with the latest LLVM package version (.159) as well as
gcc-8-arm-linux-gnueabihf to try with the GNU linker and see if
that makes any difference.  More on that shortly...
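
One way to confirm which linker produced a given vmlinux is to look at
its .comment section: LLD normally records a "Linker: LLD <version>"
string there, while GNU ld leaves no equivalent marker as far as I
know.  A small sketch, assuming binutils readelf is available and that
the kernel linker script keeps .comment; linker_from_comment is a
hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -p .comment <file>` output on
# stdin and report whether an LLD marker string is present.
linker_from_comment() {
    if grep -q 'Linker: LLD'; then
        echo "lld"
    else
        echo "no LLD marker found"
    fi
}

# Usage (not run here; needs a built kernel tree):
#   readelf -p .comment vmlinux | linker_from_comment
```

A missing marker does not prove that GNU ld was used, only that LLD
did not leave its usual string behind.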

Guillaume

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
@ 2021-02-04 21:09                     ` Guillaume Tucker
  0 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04 21:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre,
	Linus Walleij, Nick Desaulniers, Russell King - ARM Linux admin,
	Linux Kernel Mailing List, clang-built-linux, Linux ARM

On 04/02/2021 16:01, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>>
>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>> <guillaume.tucker@collabora.com> wrote:
>>>>
>>>> On 04/02/2021 10:33, Guillaume Tucker wrote:
>>>>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>>>>>> <linux@armlinux.org.uk> wrote:
>>>>>>>
>>>>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>>>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Ard,
>>>>>>>>>
>>>>>>>>> Please see the bisection report below about a boot failure on
>>>>>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>>>>>
>>>>>>>>> Reports aren't automatically sent to the public while we're
>>>>>>>>> trialing new bisection features on kernelci.org but this one
>>>>>>>>> looks valid.
>>>>>>>>>
>>>>>>>>> The kernel is most likely crashing very early on, so there's
>>>>>>>>> nothing in the logs.  Please let us know if you need some help
>>>>>>>>> with debugging or trying a fix on these platforms.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for the report.
>>>>>>>
>>>>>>> Ard,
>>>>>>>
>>>>>>> I want to send my fixes branch today which includes your regression
>>>>>>> fix that caused this regression.
>>>>>>>
>>>>>>> As this is proving difficult to fix, I can only drop your fix from
>>>>>>> my fixes branch - and given that this seems to be problematical, I'm
>>>>>>> tempted to revert the original change at this point which should fix
>>>>>>> both of these regressions - and then we have another go at getting rid
>>>>>>> of the set/way instructions during the next cycle.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>
>>>>>> Hi Russell,
>>>>>>
>>>>>> If Guillaume is willing to do the experiment, and it fixes the issue,
>>>>>
>>>>> Yes, I'm running some tests with that fix now and should have
>>>>> some results shortly.
>>>>
>>>> Yes it does fix the issue:
>>>>
>>>>   https://lava.collabora.co.uk/scheduler/job/3173819
>>>>
>>>> with Ard's fix applied to this test branch:
>>>>
>>>>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
>>>>
>>>>
>>>> +clang +Nick
>>>>
>>>> It's worth mentioning that the issue only happens with kernels
>>>> built with Clang.  As you can see there are several other arm
>>>> platforms failing with clang-11 builds but booting fine with
>>>> gcc-8:
>>>>
>>>>   https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
>>>>
>>>> Here's a sample build log:
>>>>
>>>>   https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
>>>>
>>>> Essentially:
>>>>
>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>>>
>>>> I believe it should be using the GNU assembler as LLVM_IAS=1 is
>>>> not defined, but there may be something more subtle about it.
>>>>
>>>
>>>
>>> Do you have a link for a failing zImage built from multi_v7_defconfig?
>>
>> Sure, this one was built from a plain next-20210203:
>>
>>   http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage
>>
>> You can also find the dtbs, modules and other things in that same
>> directory.
>>
>> For the record, here's the test job that used it:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3173792
>>
> 
> Thanks.
> 
> That zImage boots fine locally. Unfortunately, I don't have rk3288
> hardware to reproduce.
> 
> Could you please point me to the list of all the other platforms that
> failed to boot this image?

This is the list of platforms from kernelci.org I've gathered
which appeared to be impacted:

imx6q-sabrelite
imx6q-var-dt6customboard
imx6dl-riotboard
imx6qp-wandboard-revd1
imx7ulp-evk
odroid-xu3
rk3288-rock2-square
rk3288-veyron-jaq
stm32mp157c-dk2
sun4i-a10-olinuxino-lime
sun5i-a13-olinuxino-micro
sun7i-a20-cubieboard2
sun7i-a20-olinuxino-lime2
sun8i-a33-olinuxino
sun8i-a83t-bananapi-m3
sun8i-h2-plus-libretech-all-h3-cc
sun8i-h2-plus-orangepi-r1
sun8i-h2-plus-orangepi-zero
sun8i-h3-libretech-all-h3-cc
sun8i-h3-bananapi-m2-plus
sun8i-h3-orangepi-pc
sun8i-r40-bananapi-m2-ultra

They were all booting next-20210203 with gcc-8 but not with
clang-11.  I've run checks on a good share of them with your
patch applied and they're now booting with clang-11, just like
the rk3288 and imx6q platforms that were used for the bisections.


> To be honest, I am slightly annoyed that a change that works fine with
> GCC but does not work with Clang version
> 
> 11.1.0-++20210130110826+3a8282376b6c-1~exp1~20210130221445.158
> 
> (where exp means experimental, I suppose) is the reason for this

Well it's the standard one from the LLVM Debian package repo:

  deb http://apt.llvm.org/buster/ llvm-toolchain-buster-11 main

There's a slightly newer version, but I doubt it would make any
difference in this respect unless there's a particular fix in
ld.lld:

# apt policy clang-11
clang-11:
  Installed: 1:11.1.0~++20210130110826+3a8282376b6c-1~exp1~20210130221445.158
  Candidate: 1:11.1.0~++20210204120158+1fdec59bffc1-1~exp1~20210203230823.159

> discussion, especially because the change is in asm code. Is it
> possible to build with Clang but use the GNU linker?

As mentioned by Nick, it is using everything from LLVM except the
assembler - so not the GNU linker.  I've now built a new Docker
container with the latest LLVM package version (.159) as well as
gcc-8-arm-linux-gnueabihf to try with the GNU linker and see if
that makes any difference.  More on that shortly...

Guillaume

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 18:23                         ` Nick Desaulniers
@ 2021-02-04 21:31                           ` Guillaume Tucker
  -1 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-04 21:31 UTC (permalink / raw)
  To: Nick Desaulniers, Nathan Chancellor, Ard Biesheuvel,
	Russell King - ARM Linux admin
  Cc: kernelci-results, Geert Uytterhoeven, Nicolas Pitre,
	Linus Walleij, Linux Kernel Mailing List, clang-built-linux,
	Ard Biesheuvel, Linux ARM

On 04/02/2021 18:23, Nick Desaulniers wrote:
> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote:
>>
>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>
>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>
>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>>>
>>>>>>> Essentially:
>>>>>>>
>>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>>
>>> This command should link with BFD (and assemble with GAS); it's only
>>> using clang as the compiler.
>>
>> I think you missed the 'LLVM=1' before CC="ccache clang". That should
>> use all of the LLVM utilities minus the integrated assembler while
>> wrapping clang with ccache.
> 
> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
> permit fallback to BFD.

That was close, except we're cross-compiling with GCC for arm.
So I've now built a plain next-20210203 (without Ard's fix) using
this command line:

    make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage

I'm using a modified Docker image gtucker/kernelci-build-clang-11
with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
packages added to be able to use the GNU linker.  BTW I guess we
should enable this kind of hybrid build setup on kernelci.org as
well.

Full build log + kernel binaries can be found here:

    https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/

And this booted fine, which confirms it's really down to how
ld.lld puts together the kernel image.  Does it actually solve
the debate whether this is an issue to fix in the assembly code
or at link time?

Full test job details for the record:

    https://lava.collabora.co.uk/scheduler/job/3176004

Hope that helps,
Guillaume

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 21:31                           ` Guillaume Tucker
@ 2021-02-04 21:50                             ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 56+ messages in thread
From: Russell King - ARM Linux admin @ 2021-02-04 21:50 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Nick Desaulniers, Nathan Chancellor, Ard Biesheuvel,
	kernelci-results, Geert Uytterhoeven, Nicolas Pitre,
	Linus Walleij, Linux Kernel Mailing List, clang-built-linux,
	Linux ARM

On Thu, Feb 04, 2021 at 09:31:06PM +0000, Guillaume Tucker wrote:
> On 04/02/2021 18:23, Nick Desaulniers wrote:
> > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
> > permit fallback to BFD.
> 
> That was close, except we're cross-compiling with GCC for arm.
> So I've now built a plain next-20210203 (without Ard's fix) using
> this command line:
> 
>     make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> 
> I'm using a modified Docker image gtucker/kernelci-build-clang-11
> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
> packages added to be able to use the GNU linker.  BTW I guess we
> should enable this kind of hybrid build setup on kernelci.org as
> well.

...

> And this booted fine, which confirms it's really down to how
> ld.lld puts together the kernel image.  Does it actually solve
> the debate whether this is an issue to fix in the assembly code
> or at link time?

Well... as I mentioned previously, we don't really understand what
is going on between the decompressor running with the caches on,
turning the caches off, jumping into the decompressed kernel, and
then getting to the v7 setup code.

The results from various attempts at solving the problem which led
to Ard's original patch that caused your breakage were not making a
whole lot of sense (I think I wrote that all up in a previous email
thread, so won't repeat it here.)

So, I was slightly nervous about merging Ard's fix - and your report
suggested that there is indeed more going on here that we don't
understand.

When I was tracking down what was going on, I had this patch applied
(I've had to recreate it, so may not be exactly what I had), with the
DEBUG_LL stuff appropriately enabled. It may be worth applying this
patch, enabling the DEBUG_LL stuff appropriately for one of your
failing boards, and trying to boot it.

You should get two strings of identical hex numbers that look
something like:

ffffffff480000000900040140003000000000004820071d40008090

If they're looking like instructions, for example:

ee060f37e3a00080ee020f10ee020f30ee030f10e3a00903ee050f30

then it's likely that you are seeing a problem very similar to the one
I was seeing without Ard's patch. If you do get instruction-like
content, you will likely find that sequence of instructions in the
decompressor code.
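
As a rough aid for reading the two dumps, the check described above can
be sketched in Python. This is only a heuristic and not part of the
patch: it assumes the output is seven 32-bit words printed back to back
by printhex8, and it relies on the fact that unconditional A32
instructions carry 0xE in bits [31:28]. The function names and the 0.8
threshold are made up for illustration.

```python
# Heuristic sketch: does a printhex8 dump look like A32 code or like data?
# Assumption: most A32 instructions are unconditional, so the top nibble
# (condition field, bits [31:28]) of each word is 0xE ("always").

def split_words(dump):
    """Split a string of hex digits into 32-bit words (8 digits each)."""
    dump = dump.strip()
    if len(dump) % 8 != 0:
        raise ValueError("expected a multiple of 8 hex digits")
    return [int(dump[i:i + 8], 16) for i in range(0, len(dump), 8)]

def looks_like_a32_code(dump, threshold=0.8):
    """Return True if most words have the 'always' condition field."""
    words = split_words(dump)
    if not words:
        return False
    always = sum(1 for w in words if (w >> 28) == 0xE)
    return always / len(words) >= threshold  # threshold is arbitrary

# The two example strings quoted in this message.
data_like = "ffffffff480000000900040140003000000000004820071d40008090"
code_like = "ee060f37e3a00080ee020f10ee020f30ee030f10e3a00903ee050f30"

print("first example looks like code: ", looks_like_a32_code(data_like))
print("second example looks like code:", looks_like_a32_code(code_like))
```

A dump that passes this check would still need to be matched by hand
against the decompressor disassembly; the heuristic only says whether
it is worth looking.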

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 28c9d32fa99a..19fa93ae282c 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -475,7 +475,39 @@ ENDPROC(cpu_pj4b_do_resume)
 	ldr	r12, [r0]
 	add	r12, r12, r0			@ the local stack
 	stmia	r12, {r1-r6, lr}		@ v7_invalidate_l1 touches r0-r6
+	ldr	r0, [r12, #0]
+	bl	printhex8
+	ldr	r0, [r12, #4]
+	bl	printhex8
+	ldr	r0, [r12, #8]
+	bl	printhex8
+	ldr	r0, [r12, #12]
+	bl	printhex8
+	ldr	r0, [r12, #16]
+	bl	printhex8
+	ldr	r0, [r12, #20]
+	bl	printhex8
+	ldr	r0, [r12, #24]
+	bl	printhex8
+	mov	r0, #'\n'
+	bl	printch
 	bl      v7_invalidate_l1
+	ldr	r0, [r12, #0]
+	bl	printhex8
+	ldr	r0, [r12, #4]
+	bl	printhex8
+	ldr	r0, [r12, #8]
+	bl	printhex8
+	ldr	r0, [r12, #12]
+	bl	printhex8
+	ldr	r0, [r12, #16]
+	bl	printhex8
+	ldr	r0, [r12, #20]
+	bl	printhex8
+	ldr	r0, [r12, #24]
+	bl	printhex8
+	mov	r0, #'\n'
+	bl	printch
 	ldmia	r12, {r1-r6, lr}
 
 __v7_setup_cont:

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-04 21:31                           ` Guillaume Tucker
@ 2021-02-05  8:21                             ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-05  8:21 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Nick Desaulniers, Nathan Chancellor,
	Russell King - ARM Linux admin, kernelci-results,
	Geert Uytterhoeven, Nicolas Pitre, Linus Walleij,
	Linux Kernel Mailing List, clang-built-linux, Linux ARM

On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> On 04/02/2021 18:23, Nick Desaulniers wrote:
> > On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote:
> >>
> >> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
> >>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>>>
> >>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> >>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>
> >>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
> >>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> >>>>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>>>
> >>>>>>> Essentially:
> >>>>>>>
> >>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >>>
> >>> This command should link with BFD (and assemble with GAS); it's only
> >>> using clang as the compiler.
> >>
> >> I think you missed the 'LLVM=1' before CC="ccache clang". That should
> >> use all of the LLVM utilities minus the integrated assembler while
> >> wrapping clang with ccache.
> >
> > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
> > permit fallback to BFD.
>
> That was close, except we're cross-compiling with GCC for arm.
> So I've now built a plain next-20210203 (without Ard's fix) using
> this command line:
>
>     make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>
> I'm using a modified Docker image gtucker/kernelci-build-clang-11
> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
> packages added to be able to use the GNU linker.  BTW I guess we
> should enable this kind of hybrid build setup on kernelci.org as
> well.
>
> Full build log + kernel binaries can be found here:
>
>     https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/
>
> And this booted fine, which confirms it's really down to how
> ld.lld puts together the kernel image.  Does it actually solve
> the debate whether this is an issue to fix in the assembly code
> or at link time?
>
> Full test job details for the record:
>
>     https://lava.collabora.co.uk/scheduler/job/3176004
>


So the issue appears to be in the way the linker generates the
_kernel_bss_size symbol, which obviously has an impact, given that the
queued fix takes it into account in the cache_clean operation.

On GNU ld, I see

   479: 00065e14     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size

whereas on LLVM ld.lld, I see

   433: c1c86e98     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size

and adding this value may cause the cache clean to operate on unmapped
addresses, or cause the addition to wrap and not perform a cache clean
at all.
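
To illustrate the two failure modes described above, here is a minimal
Python sketch of the 32-bit addition. Only the two symbol values come
from the symbol table output quoted above. The start addresses are
hypothetical values picked for illustration and are not taken from any
of the failing boards, and the real cache_clean code path is not
reproduced here.

```python
# Sketch of a 32-bit "start + size" computation, as a CPU register would
# hold it.  Start addresses below are hypothetical; symbol values are the
# ones reported for _kernel_bss_size by GNU ld and by ld.lld.

MASK32 = 0xFFFFFFFF

def end_of_range(start, size):
    """Return (end, wrapped) for start + size truncated to 32 bits."""
    total = start + size
    return total & MASK32, total > MASK32

GNU_LD_VALUE = 0x00065E14   # plausible as a size
LLD_VALUE = 0xC1C86E98      # far too large to be a size

HIGH_START = 0x42000000     # hypothetical load address (assumption)
LOW_START = 0x00208000      # hypothetical load address (assumption)

for name, size in (("GNU ld", GNU_LD_VALUE), ("ld.lld", LLD_VALUE)):
    for start in (HIGH_START, LOW_START):
        end, wrapped = end_of_range(start, size)
        # If end < start, a "while addr < end" style loop does nothing.
        print(f"{name}: start={start:#010x} end={end:#010x} wrapped={wrapped}")
```

With the ld.lld value, the higher start address makes the sum wrap so
that the end lands below the start, while the lower start address
yields a range of roughly 3 GiB. Either outcome is consistent with the
two possibilities mentioned above.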

AFAICT, this also breaks the appended DTB case in LLVM, so this needs
a separate fix in any case.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-05  8:21                             ` Ard Biesheuvel
@ 2021-02-05 12:05                               ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-05 12:05 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Nick Desaulniers, Nathan Chancellor,
	Russell King - ARM Linux admin, kernelci-results,
	Geert Uytterhoeven, Nicolas Pitre, Linus Walleij,
	Linux Kernel Mailing List, clang-built-linux, Linux ARM

On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
> >
> > On 04/02/2021 18:23, Nick Desaulniers wrote:
> > > On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote:
> > >>
> > >> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
> > >>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >>>>
> > >>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> > >>>> <guillaume.tucker@collabora.com> wrote:
> > >>>>>
> > >>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
> > >>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> > >>>>>> <guillaume.tucker@collabora.com> wrote:
> > >>>>>>>
> > >>>>>>> Essentially:
> > >>>>>>>
> > >>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> > >>>
> > >>> This command should link with BFD (and assemble with GAS); it's only
> > >>> using clang as the compiler.
> > >>
> > >> I think you missed the 'LLVM=1' before CC="ccache clang". That should
> > >> use all of the LLVM utilities minus the integrated assembler while
> > >> wrapping clang with ccache.
> > >
> > > You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
> > > permit fallback to BFD.
> >
> > That was close, except we're cross-compiling with GCC for arm.
> > So I've now built a plain next-20210203 (without Ard's fix) using
> > this command line:
> >
> >     make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >
> > I'm using a modified Docker image gtucker/kernelci-build-clang-11
> > with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
> > packages added to be able to use the GNU linker.  BTW I guess we
> > should enable this kind of hybrid build setup on kernelci.org as
> > well.
> >
> > Full build log + kernel binaries can be found here:
> >
> >     https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/
> >
> > And this booted fine, which confirms it's really down to how
> > ld.lld puts together the kernel image.  Does it actually solve
> > the debate whether this is an issue to fix in the assembly code
> > or at link time?
> >
> > Full test job details for the record:
> >
> >     https://lava.collabora.co.uk/scheduler/job/3176004
> >
>
>
> So the issue appears to be in the way the linker generates the
> _kernel_bss_size symbol, which obviously has an impact, given that the
> queued fix takes it into account in the cache_clean operation.
>
> On GNU ld, I see
>
>    479: 00065e14     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
>
> whereas on LLVM ld.lld, I see
>
>    433: c1c86e98     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
>
> and adding this value may cause the cache clean to operate on unmapped
> addresses, or cause the addition to wrap and not perform a cache clean
> at all.
>
> AFAICT, this also breaks the appended DTB case in LLVM, so this needs
> a separate fix in any case.

I pushed a combined branch of torvalds/master, rmk/fixes (still
containing my 9052/1 fix) and this patch to my for-kernelci branch

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/

Guillaume,

It seems there is no Clang-11 coverage there, right? Mind giving this
branch a spin? If this fixes the regressions, we can get these queued
up.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-05 12:05                               ` Ard Biesheuvel
@ 2021-02-06 13:10                                 ` Guillaume Tucker
  -1 siblings, 0 replies; 56+ messages in thread
From: Guillaume Tucker @ 2021-02-06 13:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nick Desaulniers, Nathan Chancellor,
	Russell King - ARM Linux admin, kernelci-results,
	Geert Uytterhoeven, Nicolas Pitre, Linus Walleij,
	Linux Kernel Mailing List, clang-built-linux, Linux ARM

On 05/02/2021 12:05, Ard Biesheuvel wrote:
> On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker
>> <guillaume.tucker@collabora.com> wrote:
>>>
>>> On 04/02/2021 18:23, Nick Desaulniers wrote:
>>>> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote:
>>>>>
>>>>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
>>>>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>>>>
>>>>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
>>>>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>>>>
>>>>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>>>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>>>>>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Essentially:
>>>>>>>>>>
>>>>>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>>>>>
>>>>>> This command should link with BFD (and assemble with GAS); it's only
>>>>>> using clang as the compiler.
>>>>>
>>>>> I think you missed the 'LLVM=1' before CC="ccache clang". That should
>>>>> use all of the LLVM utilities minus the integrated assembler while
>>>>> wrapping clang with ccache.
>>>>
>>>> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
>>>> permit fallback to BFD.
>>>
>>> That was close, except we're cross-compiling with GCC for arm.
>>> So I've now built a plain next-20210203 (without Ard's fix) using
>>> this command line:
>>>
>>>     make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>>
>>> I'm using a modified Docker image gtucker/kernelci-build-clang-11
>>> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
>>> packages added to be able to use the GNU linker.  BTW I guess we
>>> should enable this kind of hybrid build setup on kernelci.org as
>>> well.
>>>
>>> Full build log + kernel binaries can be found here:
>>>
>>>     https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/
>>>
>>> And this booted fine, which confirms it's really down to how
>>> ld.lld puts together the kernel image.  Does it actually solve
>>> the debate whether this is an issue to fix in the assembly code
>>> or at link time?
>>>
>>> Full test job details for the record:
>>>
>>>     https://lava.collabora.co.uk/scheduler/job/3176004
>>>
>>
>>
>> So the issue appears to be in the way the linker generates the
>> _kernel_bss_size symbol, which obviously has an impact, given that the
>> queued fix takes it into account in the cache_clean operation.
>>
>> On GNU ld, I see
>>
>>    479: 00065e14     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
>>
>> whereas on LLVM ld.lld, I see
>>
>>    433: c1c86e98     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
>>
>> and adding this value may cause the cache clean to operate on unmapped
>> addresses, or cause the addition to wrap and not perform a cache clean
>> at all.
>>
>> AFAICT, this also breaks the appended DTB case in LLVM, so this needs
>> a separate fix in any case.
> 
> I pushed a combined branch of torvalds/master, rmk/fixes (still
> containing my 9052/1 fix) and this patch to my for-kernelci branch
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> 
> Guillaume,
> 
> It seems there is no Clang-11 coverage there, right? Mind giving this
> branch a spin? If this fixes the regressions, we can get these queued
> up.

That's right, Clang builds are only enabled on linux-next and
mainline at the moment.  We could enable it on any other branch
where it makes sense.  How about just the main defconfig for arm,
arm64 and x86_64 on your ardb/for-kernelci branch?

For now I've run another set of builds with clang-11 and got the
following test results with your branch on staging:

  https://staging.kernelci.org/test/job/ardb/branch/for-kernelci/kernel/v5.11-rc6-146-g923ca344043a/plan/baseline/

which are all passing.

I'll reply to the thread with your patch to confirm.

Guillaume
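
For anyone wanting to check a build locally, a rough sanity check on
the _kernel_bss_size value could look like the sketch below.  It
assumes the symbol lines quoted above are in the usual `readelf -s`
layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name).  The names
parse_symbol_line() and looks_like_size() and the 64 MiB threshold are
hypothetical choices for illustration, not anything from the kernel
tree or from KernelCI.

```python
# The two symbol table lines as quoted in this thread.
GNU_LINE = "   479: 00065e14     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size"
LLD_LINE = "   433: c1c86e98     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size"


def parse_symbol_line(line):
    """Parse one `readelf -s`-style line into (name, value, ndx)."""
    fields = line.split()
    # Expected fields: Num:, Value, Size, Type, Bind, Vis, Ndx, Name
    if len(fields) != 8:
        raise ValueError(f"unexpected symbol line: {line!r}")
    return fields[7], int(fields[1], 16), fields[6]


def looks_like_size(value, limit=64 * 1024 * 1024):
    """Heuristic: a BSS size should be far below `limit` (arbitrary 64 MiB)."""
    return value < limit


if __name__ == "__main__":
    for line in (GNU_LINE, LLD_LINE):
        name, value, ndx = parse_symbol_line(line)
        verdict = "plausible size" if looks_like_size(value) else "suspicious"
        print(f"{name} = 0x{value:08x} ({ndx}): {verdict}")
```

The threshold is only a heuristic: it flags the ld.lld value because it
is far larger than any plausible BSS, not because of any property of
the symbol itself.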

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: next/master bisection: baseline.login on rk3288-rock2-square
  2021-02-06 13:10                                 ` Guillaume Tucker
@ 2021-02-06 13:12                                   ` Ard Biesheuvel
  -1 siblings, 0 replies; 56+ messages in thread
From: Ard Biesheuvel @ 2021-02-06 13:12 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Nick Desaulniers, Nathan Chancellor,
	Russell King - ARM Linux admin, kernelci-results,
	Geert Uytterhoeven, Nicolas Pitre, Linus Walleij,
	Linux Kernel Mailing List, clang-built-linux, Linux ARM

On Sat, 6 Feb 2021 at 14:10, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> On 05/02/2021 12:05, Ard Biesheuvel wrote:
> > On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel <ardb@kernel.org> wrote:
> >>
> >> On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker
> >> <guillaume.tucker@collabora.com> wrote:
> >>>
> >>> On 04/02/2021 18:23, Nick Desaulniers wrote:
> >>>> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor <nathan@kernel.org> wrote:
> >>>>>
> >>>>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built Linux wrote:
> >>>>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>>>>>>
> >>>>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
> >>>>>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>>>>
> >>>>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
> >>>>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
> >>>>>>>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Essentially:
> >>>>>>>>>>
> >>>>>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >>>>>>
> >>>>>> This command should link with BFD (and assemble with GAS); it's only
> >>>>>> using clang as the compiler.
> >>>>>
> >>>>> I think you missed the 'LLVM=1' before CC="ccache clang". That should
> >>>>> use all of the LLVM utilities minus the integrated assembler while
> >>>>> wrapping clang with ccache.
> >>>>
> >>>> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
> >>>> permit fallback to BFD.
> >>>
> >>> That was close, except we're cross-compiling with GCC for arm.
> >>> So I've now built a plain next-20210203 (without Ard's fix) using
> >>> this command line:
> >>>
> >>>     make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
> >>>
> >>> I'm using a modified Docker image gtucker/kernelci-build-clang-11
> >>> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
> >>> packages added to be able to use the GNU linker.  BTW I guess we
> >>> should enable this kind of hybrid build setup on kernelci.org as
> >>> well.
> >>>
> >>> Full build log + kernel binaries can be found here:
> >>>
> >>>     https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/
> >>>
> >>> And this booted fine, which confirms it's really down to how
> >>> ld.lld puts together the kernel image.  Does it actually solve
> >>> the debate whether this is an issue to fix in the assembly code
> >>> or at link time?
> >>>
> >>> Full test job details for the record:
> >>>
> >>>     https://lava.collabora.co.uk/scheduler/job/3176004
> >>>
> >>
> >>
> >> So the issue appears to be in the way the linker generates the
> >> _kernel_bss_size symbol, which obviously has an impact, given that the
> >> queued fix takes it into account in the cache_clean operation.
> >>
> >> On GNU ld, I see
> >>
> >>    479: 00065e14     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
> >>
> >> whereas on LLVM ld.lld, I see
> >>
> >>    433: c1c86e98     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
> >>
> >> and adding this value may cause the cache clean to operate on unmapped
> >> addresses, or cause the addition to wrap and not perform a cache clean
> >> at all.
> >>
> >> AFAICT, this also breaks the appended DTB case in LLVM, so this needs
> >> a separate fix in any case.
> >
> > I pushed a combined branch of torvalds/master, rmk/fixes (still
> > containing my 9052/1 fix) and this patch to my for-kernelci branch
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> >
> > Guillaume,
> >
> > It seems there is no Clang-11 coverage there, right? Mind giving this
> > branch a spin? If this fixes the regressions, we can get these queued
> > up.
>
> That's right, Clang builds are only enabled on linux-next and
> mainline at the moment.  We could enable it on any other branch
> where it makes sense.  How about just the main defconfig for arm,
> arm64 and x86_64 on your ardb/for-kernelci branch?
>

Yes, please.

> For now I've run another set of builds with clang-11 and got the
> following test results with your branch on staging:
>
>   https://staging.kernelci.org/test/job/ardb/branch/for-kernelci/kernel/v5.11-rc6-146-g923ca344043a/plan/baseline/
>
> which are all passing.
>
> I'll reply to the thread with your patch to confirm.
>

Excellent, thanks a lot.

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2021-02-06 13:14 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <601b773a.1c69fb81.9f381.a32a@mx.google.com>
2021-02-04  8:43 ` next/master bisection: baseline.login on rk3288-rock2-square Guillaume Tucker
2021-02-04  8:43   ` Guillaume Tucker
2021-02-04  9:07   ` Ard Biesheuvel
2021-02-04  9:07     ` Ard Biesheuvel
2021-02-04 10:06     ` Russell King - ARM Linux admin
2021-02-04 10:06       ` Russell King - ARM Linux admin
2021-02-04 10:27       ` Ard Biesheuvel
2021-02-04 10:27         ` Ard Biesheuvel
2021-02-04 10:33         ` Guillaume Tucker
2021-02-04 10:33           ` Guillaume Tucker
2021-02-04 11:32           ` Guillaume Tucker
2021-02-04 11:32             ` Guillaume Tucker
2021-02-04 11:44             ` Russell King - ARM Linux admin
2021-02-04 11:44               ` Russell King - ARM Linux admin
2021-02-04 12:09               ` Ard Biesheuvel
2021-02-04 12:09                 ` Ard Biesheuvel
2021-02-04 15:42             ` Ard Biesheuvel
2021-02-04 15:42               ` Ard Biesheuvel
2021-02-04 15:53               ` Guillaume Tucker
2021-02-04 15:53                 ` Guillaume Tucker
2021-02-04 16:01                 ` Ard Biesheuvel
2021-02-04 16:01                   ` Ard Biesheuvel
2021-02-04 18:06                   ` Nick Desaulniers
2021-02-04 18:06                     ` Nick Desaulniers
2021-02-04 18:12                     ` Nathan Chancellor
2021-02-04 18:12                       ` Nathan Chancellor
2021-02-04 18:23                       ` Nick Desaulniers
2021-02-04 18:23                         ` Nick Desaulniers
2021-02-04 21:31                         ` Guillaume Tucker
2021-02-04 21:31                           ` Guillaume Tucker
2021-02-04 21:50                           ` Russell King - ARM Linux admin
2021-02-04 21:50                             ` Russell King - ARM Linux admin
2021-02-05  8:21                           ` Ard Biesheuvel
2021-02-05  8:21                             ` Ard Biesheuvel
2021-02-05 12:05                             ` Ard Biesheuvel
2021-02-05 12:05                               ` Ard Biesheuvel
2021-02-06 13:10                               ` Guillaume Tucker
2021-02-06 13:10                                 ` Guillaume Tucker
2021-02-06 13:12                                 ` Ard Biesheuvel
2021-02-06 13:12                                   ` Ard Biesheuvel
2021-02-04 21:09                   ` Guillaume Tucker
2021-02-04 21:09                     ` Guillaume Tucker
2021-02-04 10:47         ` Russell King - ARM Linux admin
2021-02-04 10:47           ` Russell King - ARM Linux admin
2021-02-04 10:55           ` Ard Biesheuvel
2021-02-04 10:55             ` Ard Biesheuvel
2021-02-04 12:26             ` Marc Zyngier
2021-02-04 12:26               ` Marc Zyngier
2021-02-04 14:09               ` Russell King - ARM Linux admin
2021-02-04 14:09                 ` Russell King - ARM Linux admin
2021-02-04 14:25                 ` Ard Biesheuvel
2021-02-04 14:25                   ` Ard Biesheuvel
2021-02-04 14:36                   ` Russell King - ARM Linux admin
2021-02-04 14:36                     ` Russell King - ARM Linux admin
2021-02-04 15:52                     ` Ard Biesheuvel
2021-02-04 15:52                       ` Ard Biesheuvel
