All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-05-27 12:43 Will Deacon
  2021-05-27 13:11 ` Catalin Marinas
                   ` (5 more replies)
  0 siblings, 6 replies; 73+ messages in thread
From: Will Deacon @ 2021-05-27 12:43 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kernel-team, Will Deacon, Catalin Marinas, Mark Rutland,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch

Back in 97303480753e ("arm64: Increase the max granular size"),
ARCH_DMA_MINALIGN was effectively increased to 128 bytes thanks to an
increase in L1_CACHE_BYTES due to an unsubstantiated performance claim
on the now obsolete ThunderX-1. Although this was reverted in
d93277b9839b, ARCH_DMA_MINALIGN was kept at 128 bytes by ebc7e21e0fa2
("arm64: Increase ARCH_DMA_MINALIGN to 128").

During discussion of the original patch, it was reported that the change
also prevented a warning during boot on (again, now obsolete) Qualcomm
server hardware where the cache writeback granule was larger than 64
bytes. The reason for this warning was because non-coherent DMA could
lead to data corruption due to unexpected writeback from the CPU where a
cacheline is shared with other allocations.

Since then, systems have appeared with larger cachelines still, and so
commit 8f5c9037a55b ("arm64/mm: Correct the cache line size warning with
non coherent device") reworked the warning so that it only appears on
systems where non-coherent DMA is actually required and taints the
kernel with TAINT_CPU_OUT_OF_SPEC. We are not aware of any systems, even
including the aforementioned obsolete machines, which have a CWG larger
than 64 bytes and require non-coherent DMA.

More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
bytes wastes considerable memory (~6% immediately after boot on one
system).

Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
indicate if there are machines that unknowingly rely on this.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/linux-arm-kernel/1442944788-17254-1-git-send-email-rric@kernel.org/
Link: https://lore.kernel.org/linux-arm-kernel/CAOZdJXUiRMAguDV+HEJqPg57MyBNqEcTyaH+ya=U93NHb-pdJA@mail.gmail.com/
Link: https://lore.kernel.org/linux-arm-kernel/20190614131141.4428-1-msys.mizuma@gmail.com/
Link: https://lore.kernel.org/r/20210517074332.28280-1-vincent.whitchurch@axis.com
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cache.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a074459f8f2f..a9c0716e7440 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -47,7 +47,7 @@
  * cache before the transfer is done, causing old data to be seen by
  * the CPU.
  */
-#define ARCH_DMA_MINALIGN	(128)
+#define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
 
 #ifdef CONFIG_KASAN_SW_TAGS
 #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)
-- 
2.31.1.818.g46aad6cb9e-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-05-27 12:43 [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES) Will Deacon
@ 2021-05-27 13:11 ` Catalin Marinas
  2021-05-27 13:19 ` Mark Rutland
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: Catalin Marinas @ 2021-05-27 13:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kernel-team, Mark Rutland, Ard Biesheuvel,
	Arnd Bergmann, Vincent Whitchurch

On Thu, May 27, 2021 at 01:43:56PM +0100, Will Deacon wrote:
> Back in 97303480753e ("arm64: Increase the max granular size"),
> ARCH_DMA_MINALIGN was effectively increased to 128 bytes thanks to an
> increase in L1_CACHE_BYTES due to an unsubstantiated performance claim
> on the now obsolete ThunderX-1. Although this was reverted in
> d93277b9839b, ARCH_DMA_MINALIGN was kept at 128 bytes by ebc7e21e0fa2
> ("arm64: Increase ARCH_DMA_MINALIGN to 128").
> 
> During discussion of the original patch, it was reported that the change
> also prevented a warning during boot on (again, now obsolete) Qualcomm
> server hardware where the cache writeback granule was larger than 64
> bytes. The reason for this warning was because non-coherent DMA could
> lead to data corruption due to unexpected writeback from the CPU where a
> cacheline is shared with other allocations.
> 
> Since then, systems have appeared with larger cachelines still, and so
> commit 8f5c9037a55b ("arm64/mm: Correct the cache line size warning with
> non coherent device") reworked the warning so that it only appears on
> systems where non-coherent DMA is actually required and taints the
> kernel with TAINT_CPU_OUT_OF_SPEC. We are not aware of any systems, even
> including the aforementioned obsolete machines, which have a CWG larger
> than 64 bytes and require non-coherent DMA.
> 
> More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
> bytes wastes considerable memory (~6% immediately after boot on one
> system).
> 
> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> indicate if there are machines that unknowingly rely on this.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
> Link: https://lore.kernel.org/linux-arm-kernel/1442944788-17254-1-git-send-email-rric@kernel.org/
> Link: https://lore.kernel.org/linux-arm-kernel/CAOZdJXUiRMAguDV+HEJqPg57MyBNqEcTyaH+ya=U93NHb-pdJA@mail.gmail.com/
> Link: https://lore.kernel.org/linux-arm-kernel/20190614131141.4428-1-msys.mizuma@gmail.com/
> Link: https://lore.kernel.org/r/20210517074332.28280-1-vincent.whitchurch@axis.com
> Signed-off-by: Will Deacon <will@kernel.org>

Unless we hear from anyone that the warning is triggered, I think we
should change this in mainline.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-05-27 12:43 [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES) Will Deacon
  2021-05-27 13:11 ` Catalin Marinas
@ 2021-05-27 13:19 ` Mark Rutland
  2021-05-28  9:35   ` Arnd Bergmann
  2021-05-31  5:38 ` Ard Biesheuvel
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Mark Rutland @ 2021-05-27 13:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kernel-team, Catalin Marinas, Ard Biesheuvel,
	Arnd Bergmann, Vincent Whitchurch

On Thu, May 27, 2021 at 01:43:56PM +0100, Will Deacon wrote:
> Back in 97303480753e ("arm64: Increase the max granular size"),
> ARCH_DMA_MINALIGN was effectively increased to 128 bytes thanks to an
> increase in L1_CACHE_BYTES due to an unsubstantiated performance claim
> on the now obsolete ThunderX-1. Although this was reverted in
> d93277b9839b, ARCH_DMA_MINALIGN was kept at 128 bytes by ebc7e21e0fa2
> ("arm64: Increase ARCH_DMA_MINALIGN to 128").
> 
> During discussion of the original patch, it was reported that the change
> also prevented a warning during boot on (again, now obsolete) Qualcomm
> server hardware where the cache writeback granule was larger than 64
> bytes. The reason for this warning was because non-coherent DMA could
> lead to data corruption due to unexpected writeback from the CPU where a
> cacheline is shared with other allocations.
> 
> Since then, systems have appeared with larger cachelines still, and so
> commit 8f5c9037a55b ("arm64/mm: Correct the cache line size warning with
> non coherent device") reworked the warning so that it only appears on
> systems where non-coherent DMA is actually required and taints the
> kernel with TAINT_CPU_OUT_OF_SPEC. We are not aware of any systems, even
> including the aforementioned obsolete machines, which have a CWG larger
> than 64 bytes and require non-coherent DMA.
> 
> More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
> bytes wastes considerable memory (~6% immediately after boot on one
> system).
> 
> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> indicate if there are machines that unknowingly rely on this.

The rationale above makes sense to me, so:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
> Link: https://lore.kernel.org/linux-arm-kernel/1442944788-17254-1-git-send-email-rric@kernel.org/
> Link: https://lore.kernel.org/linux-arm-kernel/CAOZdJXUiRMAguDV+HEJqPg57MyBNqEcTyaH+ya=U93NHb-pdJA@mail.gmail.com/
> Link: https://lore.kernel.org/linux-arm-kernel/20190614131141.4428-1-msys.mizuma@gmail.com/
> Link: https://lore.kernel.org/r/20210517074332.28280-1-vincent.whitchurch@axis.com
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/cache.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index a074459f8f2f..a9c0716e7440 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -47,7 +47,7 @@
>   * cache before the transfer is done, causing old data to be seen by
>   * the CPU.
>   */
> -#define ARCH_DMA_MINALIGN	(128)
> +#define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
>  
>  #ifdef CONFIG_KASAN_SW_TAGS
>  #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)
> -- 
> 2.31.1.818.g46aad6cb9e-goog
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-05-27 13:19 ` Mark Rutland
@ 2021-05-28  9:35   ` Arnd Bergmann
  2021-06-01 10:14     ` Catalin Marinas
  0 siblings, 1 reply; 73+ messages in thread
From: Arnd Bergmann @ 2021-05-28  9:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Linux ARM, Android Kernel Team, Catalin Marinas,
	Ard Biesheuvel, Vincent Whitchurch

On Thu, May 27, 2021 at 3:19 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
> > bytes wastes considerable memory (~6% immediately after boot on one
> > system).
> >
> > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> > indicate if there are machines that unknowingly rely on this.
>
> The rationale above makes sense to me, so:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>

I think it would make sense to go even further than this in the
future, and allow
setting a smaller minimum alignment depending what hardware is detected
at boot time. That would clearly require more work and testing to do right,
so for the moment, this approach is the best we can do.

Acked-by: Arnd Bergmann <arnd@arndb.de>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-05-27 12:43 [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES) Will Deacon
  2021-05-27 13:11 ` Catalin Marinas
  2021-05-27 13:19 ` Mark Rutland
@ 2021-05-31  5:38 ` Ard Biesheuvel
  2021-06-01 18:21 ` Will Deacon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: Ard Biesheuvel @ 2021-05-31  5:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linux ARM, Android Kernel Team, Catalin Marinas, Mark Rutland,
	Arnd Bergmann, Vincent Whitchurch

On Thu, 27 May 2021 at 14:44, Will Deacon <will@kernel.org> wrote:
>
> Back in 97303480753e ("arm64: Increase the max granular size"),
> ARCH_DMA_MINALIGN was effectively increased to 128 bytes thanks to an
> increase in L1_CACHE_BYTES due to an unsubstantiated performance claim
> on the now obsolete ThunderX-1. Although this was reverted in
> d93277b9839b, ARCH_DMA_MINALIGN was kept at 128 bytes by ebc7e21e0fa2
> ("arm64: Increase ARCH_DMA_MINALIGN to 128").
>
> During discussion of the original patch, it was reported that the change
> also prevented a warning during boot on (again, now obsolete) Qualcomm
> server hardware where the cache writeback granule was larger than 64
> bytes. The reason for this warning was because non-coherent DMA could
> lead to data corruption due to unexpected writeback from the CPU where a
> cacheline is shared with other allocations.
>
> Since then, systems have appeared with larger cachelines still, and so
> commit 8f5c9037a55b ("arm64/mm: Correct the cache line size warning with
> non coherent device") reworked the warning so that it only appears on
> systems where non-coherent DMA is actually required and taints the
> kernel with TAINT_CPU_OUT_OF_SPEC. We are not aware of any systems, even
> including the aforementioned obsolete machines, which have a CWG larger
> than 64 bytes and require non-coherent DMA.
>
> More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
> bytes wastes considerable memory (~6% immediately after boot on one
> system).
>
> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> indicate if there are machines that unknowingly rely on this.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
> Link: https://lore.kernel.org/linux-arm-kernel/1442944788-17254-1-git-send-email-rric@kernel.org/
> Link: https://lore.kernel.org/linux-arm-kernel/CAOZdJXUiRMAguDV+HEJqPg57MyBNqEcTyaH+ya=U93NHb-pdJA@mail.gmail.com/
> Link: https://lore.kernel.org/linux-arm-kernel/20190614131141.4428-1-msys.mizuma@gmail.com/
> Link: https://lore.kernel.org/r/20210517074332.28280-1-vincent.whitchurch@axis.com
> Signed-off-by: Will Deacon <will@kernel.org>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  arch/arm64/include/asm/cache.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index a074459f8f2f..a9c0716e7440 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -47,7 +47,7 @@
>   * cache before the transfer is done, causing old data to be seen by
>   * the CPU.
>   */
> -#define ARCH_DMA_MINALIGN      (128)
> +#define ARCH_DMA_MINALIGN      L1_CACHE_BYTES
>
>  #ifdef CONFIG_KASAN_SW_TAGS
>  #define ARCH_SLAB_MINALIGN     (1ULL << KASAN_SHADOW_SCALE_SHIFT)
> --
> 2.31.1.818.g46aad6cb9e-goog
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-05-28  9:35   ` Arnd Bergmann
@ 2021-06-01 10:14     ` Catalin Marinas
  0 siblings, 0 replies; 73+ messages in thread
From: Catalin Marinas @ 2021-06-01 10:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Rutland, Will Deacon, Linux ARM, Android Kernel Team,
	Ard Biesheuvel, Vincent Whitchurch

On Fri, May 28, 2021 at 11:35:32AM +0200, Arnd Bergmann wrote:
> On Thu, May 27, 2021 at 3:19 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > > More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
> > > bytes wastes considerable memory (~6% immediately after boot on one
> > > system).
> > >
> > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> > > indicate if there are machines that unknowingly rely on this.
> >
> > The rationale above makes sense to me, so:
> >
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> I think it would make sense to go even further than this in the
> future, and allow
> setting a smaller minimum alignment depending what hardware is detected
> at boot time.

Yeah, we talked about this in the past. The problem is that very early
the kernel doesn't know whether it'll have devices that require
non-coherent DMA. So we'd probably need to start with a 64 byte
ARCH_DMA_MINALIGN and populate the slab caches slightly later once the
kernel learns more about the system it's running on.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-05-27 12:43 [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES) Will Deacon
                   ` (2 preceding siblings ...)
  2021-05-31  5:38 ` Ard Biesheuvel
@ 2021-06-01 18:21 ` Will Deacon
       [not found] ` <CGME20210602132541eucas1p17127696041c26c00d1d2f50bef9cfaf0@eucas1p1.samsung.com>
  2021-07-06  9:26 ` Yassine Oudjana
  5 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-06-01 18:21 UTC (permalink / raw)
  To: linux-arm-kernel, Will Deacon
  Cc: catalin.marinas, kernel-team, Mark Rutland, Ard Biesheuvel,
	Vincent Whitchurch, Arnd Bergmann

On Thu, 27 May 2021 13:43:56 +0100, Will Deacon wrote:
> Back in 97303480753e ("arm64: Increase the max granular size"),
> ARCH_DMA_MINALIGN was effectively increased to 128 bytes thanks to an
> increase in L1_CACHE_BYTES due to an unsubstantiated performance claim
> on the now obsolete ThunderX-1. Although this was reverted in
> d93277b9839b, ARCH_DMA_MINALIGN was kept at 128 bytes by ebc7e21e0fa2
> ("arm64: Increase ARCH_DMA_MINALIGN to 128").
> 
> [...]

Applied to arm64 (for-next/mm), thanks!

[1/1] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
      https://git.kernel.org/arm64/c/65688d2a05de

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
       [not found] ` <CGME20210602132541eucas1p17127696041c26c00d1d2f50bef9cfaf0@eucas1p1.samsung.com>
@ 2021-06-02 13:25   ` Marek Szyprowski
  2021-06-02 13:51     ` Mark Rutland
  0 siblings, 1 reply; 73+ messages in thread
From: Marek Szyprowski @ 2021-06-02 13:25 UTC (permalink / raw)
  To: Will Deacon, linux-arm-kernel
  Cc: kernel-team, Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Arnd Bergmann, Vincent Whitchurch, Bartlomiej Zolnierkiewicz

Hi Will,

On 27.05.2021 14:43, Will Deacon wrote:
> Back in 97303480753e ("arm64: Increase the max granular size"),
> ARCH_DMA_MINALIGN was effectively increased to 128 bytes thanks to an
> increase in L1_CACHE_BYTES due to an unsubstantiated performance claim
> on the now obsolete ThunderX-1. Although this was reverted in
> d93277b9839b, ARCH_DMA_MINALIGN was kept at 128 bytes by ebc7e21e0fa2
> ("arm64: Increase ARCH_DMA_MINALIGN to 128").
>
> During discussion of the original patch, it was reported that the change
> also prevented a warning during boot on (again, now obsolete) Qualcomm
> server hardware where the cache writeback granule was larger than 64
> bytes. The reason for this warning was because non-coherent DMA could
> lead to data corruption due to unexpected writeback from the CPU where a
> cacheline is shared with other allocations.
>
> Since then, systems have appeared with larger cachelines still, and so
> commit 8f5c9037a55b ("arm64/mm: Correct the cache line size warning with
> non coherent device") reworked the warning so that it only appears on
> systems where non-coherent DMA is actually required and taints the
> kernel with TAINT_CPU_OUT_OF_SPEC. We are not aware of any systems, even
> including the aforementioned obsolete machines, which have a CWG larger
> than 64 bytes and require non-coherent DMA.
>
> More recently, it has been reported that a ARCH_DMA_MINALIGN of 128
> bytes wastes considerable memory (~6% immediately after boot on one
> system).
>
> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> indicate if there are machines that unknowingly rely on this.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
> Link: https://lore.kernel.org/linux-arm-kernel/1442944788-17254-1-git-send-email-rric@kernel.org/
> Link: https://lore.kernel.org/linux-arm-kernel/CAOZdJXUiRMAguDV+HEJqPg57MyBNqEcTyaH+ya=U93NHb-pdJA@mail.gmail.com/
> Link: https://lore.kernel.org/linux-arm-kernel/20190614131141.4428-1-msys.mizuma@gmail.com/
> Link: https://lore.kernel.org/r/20210517074332.28280-1-vincent.whitchurch@axis.com
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
This patch landed in todays linux-next as commit 65688d2a05de ("arm64: 
cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an 
issue on Raspberry Pi 3b board. System boots to userspace fine, but then 
it hangs somewhere during the init scripts after loading the modules. I 
didn't manage to track where it hangs yet though.
>   arch/arm64/include/asm/cache.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index a074459f8f2f..a9c0716e7440 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -47,7 +47,7 @@
>    * cache before the transfer is done, causing old data to be seen by
>    * the CPU.
>    */
> -#define ARCH_DMA_MINALIGN	(128)
> +#define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
>   
>   #ifdef CONFIG_KASAN_SW_TAGS
>   #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 13:25   ` Marek Szyprowski
@ 2021-06-02 13:51     ` Mark Rutland
  2021-06-02 14:09       ` Marek Szyprowski
  2021-06-02 14:11       ` Arnd Bergmann
  0 siblings, 2 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-02 13:51 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Will Deacon, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

Hi Marek,

On Wed, Jun 02, 2021 at 03:25:41PM +0200, Marek Szyprowski wrote:
> On 27.05.2021 14:43, Will Deacon wrote:
> This patch landed in todays linux-next as commit 65688d2a05de ("arm64: 
> cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an 
> issue on Raspberry Pi 3b board. System boots to userspace fine, but then 
> it hangs somewhere during the init scripts after loading the modules. I 
> didn't manage to track where it hangs yet though.

Ouch!

I have a 3b in a drawer that I might be able to reproduce the issue
with; can you tell me how you're booting that kernel? e.g. which FW and
DT you're using?

Is your filesystem on the SD card, or some USB storage? I'm guessing
we'll need to stress DMA over one of those.

Thanks,
Mark.

> >   arch/arm64/include/asm/cache.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> > index a074459f8f2f..a9c0716e7440 100644
> > --- a/arch/arm64/include/asm/cache.h
> > +++ b/arch/arm64/include/asm/cache.h
> > @@ -47,7 +47,7 @@
> >    * cache before the transfer is done, causing old data to be seen by
> >    * the CPU.
> >    */
> > -#define ARCH_DMA_MINALIGN	(128)
> > +#define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
> >   
> >   #ifdef CONFIG_KASAN_SW_TAGS
> >   #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 13:51     ` Mark Rutland
@ 2021-06-02 14:09       ` Marek Szyprowski
  2021-06-02 14:14         ` Arnd Bergmann
  2021-06-04 10:01         ` Mark Rutland
  2021-06-02 14:11       ` Arnd Bergmann
  1 sibling, 2 replies; 73+ messages in thread
From: Marek Szyprowski @ 2021-06-02 14:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

Hi

On 02.06.2021 15:51, Mark Rutland wrote:
> Hi Marek,
>
> On Wed, Jun 02, 2021 at 03:25:41PM +0200, Marek Szyprowski wrote:
>> On 27.05.2021 14:43, Will Deacon wrote:
>> This patch landed in todays linux-next as commit 65688d2a05de ("arm64:
>> cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an
>> issue on Raspberry Pi 3b board. System boots to userspace fine, but then
>> it hangs somewhere during the init scripts after loading the modules. I
>> didn't manage to track where it hangs yet though.
> Ouch!
>
> I have a 3b in a drawer that I might be able to reproduce the issue
> with; can you tell me how you're booting that kernel? e.g. which FW and
> DT you're using?

I'm booting the kernel with the mainline dtb 
(arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b.dtb) from the u-boot, 
which downloads it via TFTP. I don't remember which firmware version is 
there, but without raspberry specific tools (which I don't have deployed 
there) it is hard to check that now. The rootfs is on SD card, the 
system is some older Debian release. Here is the last part of the boot 
log if it helps:

[    7.906329] Freeing unused kernel memory: 8512K
[    7.911793] Run /sbin/init as init process
INIT: version 2.88 booting
[info] Using makefile-style concurrent boot in runlevel S.
ERROR: could not open /proc/stat: No such file or directory
[....] Starting the hotplug events dispatcher: systemd-udevdstarting 
version 236
. ok
[....] Synthesizing the initial hotplug events...[   13.641287] 
bcm2835-rng 3f104000.rng: hwrng registered
[   13.896575] i2c-bcm2835 3f805000.i2c: Could not read clock-frequency 
property
done.
[   13.938691] debugfs: Directory '3f902000.hdmi' with parent 'vc4-hdmi' 
already present!
[   13.983716] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
[   13.991816] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
[   14.000945] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
[   14.009644] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
[   14.017547] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
[   14.025780] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
[   14.033995] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
[   14.042196] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
[....] Waiting for /dev to be fully populated...[   14.112812] [drm] 
Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
done.
[ ok ] Activating swap...done.
[....] Checking file systems...fsck from util-linux 2.29.2
done.
[ ok ] Cleaning up temporary files... /tmp.
[ ok ] Mounting local filesystems...done.
[ ok ] Activating swapfile swap...done.
[ ok ] Cleaning up temporary files....
[ ok ] Setting kernel variables...done.
[ ok ] Configuring network interfaces...done.
[ ok ] Starting RPC port mapper daemon: rpcbind.
[ ok ] Cleaning up temporary files....
[ ok ] Setting up ALSA...done.
[ ok ] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 13:51     ` Mark Rutland
  2021-06-02 14:09       ` Marek Szyprowski
@ 2021-06-02 14:11       ` Arnd Bergmann
  2021-06-02 14:15         ` Marek Szyprowski
  1 sibling, 1 reply; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-02 14:11 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marek Szyprowski, Will Deacon, Linux ARM, Android Kernel Team,
	Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On Wed, Jun 2, 2021 at 3:51 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Jun 02, 2021 at 03:25:41PM +0200, Marek Szyprowski wrote:
> > On 27.05.2021 14:43, Will Deacon wrote:
> > This patch landed in todays linux-next as commit 65688d2a05de ("arm64:
> > cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an
> > issue on Raspberry Pi 3b board. System boots to userspace fine, but then
> > it hangs somewhere during the init scripts after loading the modules. I
> > didn't manage to track where it hangs yet though.
>
> Ouch!
>
> I have a 3b in a drawer that I might be able to reproduce the issue
> with; can you tell me how you're booting that kernel? e.g. which FW and
> DT you're using?
>
> Is your filesystem on the SD card, or some USB storage? I'm guessing
> we'll need to stress DMA over one of those.

It could be something other than DMA in this case, possibly an out-of-bounds
access into a dynamically allocated structure that happens to work when there
is extra data at the end of it. Running that kernel with KASAN should help
rule this out.

Running the same kernel on a Pi 4, or on a virtual machine could also help
figure out if there is something hardware specific.

       Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 14:09       ` Marek Szyprowski
@ 2021-06-02 14:14         ` Arnd Bergmann
  2021-06-02 14:28           ` Marek Szyprowski
  2021-06-04 10:01         ` Mark Rutland
  1 sibling, 1 reply; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-02 14:14 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Mark Rutland, Will Deacon, Linux ARM, Android Kernel Team,
	Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On Wed, Jun 2, 2021 at 4:09 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> On 02.06.2021 15:51, Mark Rutland wrote:
> [   13.938691] debugfs: Directory '3f902000.hdmi' with parent 'vc4-hdmi'
> already present!
> [   13.983716] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> [   13.991816] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> [   14.000945] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> [   14.009644] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> [   14.017547] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> [   14.025780] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> [   14.033995] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> [   14.042196] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> [....] Waiting for /dev to be fully populated...[   14.112812] [drm]
> Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> done.

I wonder if the vc4's cache is involved here rather than the CPU cache.
If it's that, then removing the vc4 drivers should make it boot reliably, and
it might be possible to fix it by manually padding any allocations that get
passed to the GPU.

       Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 14:11       ` Arnd Bergmann
@ 2021-06-02 14:15         ` Marek Szyprowski
  0 siblings, 0 replies; 73+ messages in thread
From: Marek Szyprowski @ 2021-06-02 14:15 UTC (permalink / raw)
  To: Arnd Bergmann, Mark Rutland
  Cc: Will Deacon, Linux ARM, Android Kernel Team, Catalin Marinas,
	Ard Biesheuvel, Vincent Whitchurch, Bartlomiej Zolnierkiewicz

Hi Arnd,

On 02.06.2021 16:11, Arnd Bergmann wrote:
> On Wed, Jun 2, 2021 at 3:51 PM Mark Rutland <mark.rutland@arm.com> wrote:
>> On Wed, Jun 02, 2021 at 03:25:41PM +0200, Marek Szyprowski wrote:
>>> On 27.05.2021 14:43, Will Deacon wrote:
>>> This patch landed in todays linux-next as commit 65688d2a05de ("arm64:
>>> cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an
>>> issue on Raspberry Pi 3b board. System boots to userspace fine, but then
>>> it hangs somewhere during the init scripts after loading the modules. I
>>> didn't manage to track where it hangs yet though.
>> Ouch!
>>
>> I have a 3b in a drawer that I might be able to reproduce the issue
>> with; can you tell me how you're booting that kernel? e.g. which FW and
>> DT you're using?
>>
>> Is your filesystem on the SD card, or some USB storage? I'm guessing
>> we'll need to stress DMA over one of those.
> It could be something other than DMA in this case, possibly an out-of-bounds
> access into a dynamically allocated structure that happens to work when there
> is extra data at the end of it. Running that kernel with KASAN should help
> rule this out.

Okay, I will enable KASAN and give it a try.

> Running the same kernel on a Pi 4, or on a virtual machine could also help
> figure out if there is something hardware specific.

Exactly the same kernel binary boots fine on RPi4, virt (qemu) and a few 
other arm64 boards I have on my test farm.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 14:14         ` Arnd Bergmann
@ 2021-06-02 14:28           ` Marek Szyprowski
  2021-06-02 14:52             ` Arnd Bergmann
  0 siblings, 1 reply; 73+ messages in thread
From: Marek Szyprowski @ 2021-06-02 14:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Rutland, Will Deacon, Linux ARM, Android Kernel Team,
	Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On 02.06.2021 16:14, Arnd Bergmann wrote:
> On Wed, Jun 2, 2021 at 4:09 PM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 02.06.2021 15:51, Mark Rutland wrote:
>> [   13.938691] debugfs: Directory '3f902000.hdmi' with parent 'vc4-hdmi'
>> already present!
>> [   13.983716] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
>> [   13.991816] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
>> [   14.000945] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
>> [   14.009644] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
>> [   14.017547] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
>> [   14.025780] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
>> [   14.033995] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
>> [   14.042196] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
>> [....] Waiting for /dev to be fully populated...[   14.112812] [drm]
>> Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
>> done.
> I wonder if the vc4's cache is involved here rather than the CPU cache.
> If it's that, then removing the vc4 drivers should make it boot reliably, and
> it might be possible to fix it by manually padding any allocations that get
> passed to the GPU.

Indeed, after disabling DRM_VC4 in the .config, system boots fine and 
seems to be working correctly.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 14:28           ` Marek Szyprowski
@ 2021-06-02 14:52             ` Arnd Bergmann
  2021-06-07 12:17               ` Arnd Bergmann
  0 siblings, 1 reply; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-02 14:52 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Mark Rutland, Will Deacon, Linux ARM, Android Kernel Team,
	Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On Wed, Jun 2, 2021 at 4:28 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> On 02.06.2021 16:14, Arnd Bergmann wrote:
> > On Wed, Jun 2, 2021 at 4:09 PM Marek Szyprowski
> > <m.szyprowski@samsung.com> wrote:
> >> On 02.06.2021 15:51, Mark Rutland wrote:
> >> [   13.938691] debugfs: Directory '3f902000.hdmi' with parent 'vc4-hdmi'
> >> already present!
> >> [   13.983716] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> >> [   13.991816] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> >> [   14.000945] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> >> [   14.009644] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> >> [   14.017547] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> >> [   14.025780] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> >> [   14.033995] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> >> [   14.042196] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> >> [....] Waiting for /dev to be fully populated...[   14.112812] [drm]
> >> Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> >> done.
> > I wonder if the vc4's cache is involved here rather than the CPU cache.
> > If it's that, then removing the vc4 drivers should make it boot reliably, and
> > it might be possible to fix it by manually padding any allocations that get
> > passed to the GPU.
>
> Indeed, after disabling DRM_VC4 in the .config, system boots fine and
> seems to be working correctly.

Ok, that helps. This means there is a good chance it would be one of these:

$ git grep k[cmz]alloc drivers/gpu/drm/vc4
drivers/gpu/drm/vc4/vc4_bo.c:           new_list =
kmalloc_array(new_size, sizeof(struct list_head),
drivers/gpu/drm/vc4/vc4_bo.c:   bo = kzalloc(sizeof(*bo), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_bo.c:   vc4->bo_labels =
kcalloc(VC4_BO_TYPE_COUNT, sizeof(*vc4->bo_labels),
drivers/gpu/drm/vc4/vc4_crtc.c: flip_state =
kzalloc(sizeof(*flip_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_crtc.c: vc4_state =
kzalloc(sizeof(*vc4_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_crtc.c: vc4_crtc_state =
kzalloc(sizeof(*vc4_crtc_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_crtc.c: vc4_crtc = devm_kzalloc(dev,
sizeof(*vc4_crtc), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_debugfs.c:              devm_kzalloc(dev->dev,
sizeof(*entry), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_dpi.c:  dpi = devm_kzalloc(dev, sizeof(*dpi),
GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_dpi.c:  vc4_dpi_encoder = devm_kzalloc(dev,
sizeof(*vc4_dpi_encoder),
drivers/gpu/drm/vc4/vc4_drv.c:  vc4file = kzalloc(sizeof(*vc4file), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_dsi.c:  dsi->clk_onecell = devm_kzalloc(dev,
drivers/gpu/drm/vc4/vc4_dsi.c:  vc4_dsi_encoder = devm_kzalloc(dev,
sizeof(*vc4_dsi_encoder),
drivers/gpu/drm/vc4/vc4_dsi.c:  dsi = devm_kzalloc(dev, sizeof(*dsi),
GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_gem.c:  bo_state = kcalloc(state->bo_count,
sizeof(*bo_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_gem.c:  kernel_state = kcalloc(1,
sizeof(*kernel_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_gem.c:  kernel_state->bo = kcalloc(state->bo_count,
drivers/gpu/drm/vc4/vc4_gem.c:  fence = kzalloc(sizeof(*fence), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_gem.c:  exec = kcalloc(1, sizeof(*exec), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_hdmi.c:         kzalloc(sizeof(*new_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_hdmi.c: new_state =
kzalloc(sizeof(*new_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_hdmi.c: regs = kcalloc(variant->num_registers,
sizeof(*regs),
drivers/gpu/drm/vc4/vc4_hdmi.c: vc4_hdmi = devm_kzalloc(dev,
sizeof(*vc4_hdmi), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_hvs.c:  hvs = devm_kzalloc(&pdev->dev,
sizeof(*hvs), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_kms.c:  ctm_state =
kzalloc(sizeof(*ctm_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_kms.c:  load_state =
kzalloc(sizeof(*load_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_kms.c:  state = kzalloc(sizeof(*state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_kms.c:  state = kzalloc(sizeof(*state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_perfmon.c:      perfmon =
kzalloc(struct_size(perfmon, counters, req->ncounters),
drivers/gpu/drm/vc4/vc4_plane.c:        vc4_state =
kzalloc(sizeof(*vc4_state), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_plane.c:                u32 *new_dlist =
kmalloc_array(new_size, 4, GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_plane.c:        vc4_plane =
devm_kzalloc(dev->dev, sizeof(*vc4_plane),
drivers/gpu/drm/vc4/vc4_txp.c:  txp = devm_kzalloc(dev, sizeof(*txp),
GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_v3d.c:  v3d = devm_kzalloc(&pdev->dev,
sizeof(*v3d), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_validate_shaders.c:
kcalloc(BITS_TO_LONGS(validation_state.max_ip),
drivers/gpu/drm/vc4/vc4_validate_shaders.c:     validated_shader =
kcalloc(1, sizeof(*validated_shader), GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_vec.c:  vec_connector = devm_kzalloc(dev->dev,
sizeof(*vec_connector),
drivers/gpu/drm/vc4/vc4_vec.c:  vec = devm_kzalloc(dev, sizeof(*vec),
GFP_KERNEL);
drivers/gpu/drm/vc4/vc4_vec.c:  vc4_vec_encoder = devm_kzalloc(dev,
sizeof(*vc4_vec_encoder),

I suppose most can be easily ruled out because they are not shared, or
because they are larger
than 64 bytes. I'll have a quick look if I find any smoking guns.

       Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 14:09       ` Marek Szyprowski
  2021-06-02 14:14         ` Arnd Bergmann
@ 2021-06-04 10:01         ` Mark Rutland
  2021-06-07  9:58           ` Marek Szyprowski
  1 sibling, 1 reply; 73+ messages in thread
From: Mark Rutland @ 2021-06-04 10:01 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Will Deacon, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

Hi Marek,

On Wed, Jun 02, 2021 at 04:09:19PM +0200, Marek Szyprowski wrote:
> On 02.06.2021 15:51, Mark Rutland wrote:
> > On Wed, Jun 02, 2021 at 03:25:41PM +0200, Marek Szyprowski wrote:
> >> On 27.05.2021 14:43, Will Deacon wrote:
> >> This patch landed in todays linux-next as commit 65688d2a05de ("arm64:
> >> cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an
> >> issue on Raspberry Pi 3b board. System boots to userspace fine, but then
> >> it hangs somewhere during the init scripts after loading the modules. I
> >> didn't manage to track where it hangs yet though.
> > Ouch!
> >
> > I have a 3b in a drawer that I might be able to reproduce the issue
> > with; can you tell me how you're booting that kernel? e.g. which FW and
> > DT you're using?
> 
> I'm booting the kernel with the mainline dtb 
> (arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b.dtb) from the u-boot, 
> which downloads it via TFTP. I don't remember which firmware version is 
> there, but without raspberry specific tools (which I don't have deployed 
> there) it is hard to check that now.

Thanks, this was enough info to get started.

For comparison, I have a 3Bv1.2 board.

I grabbed the latest RPI firmware, built myself a v2021.07-rc3
rpi_3_defconfig u-boot, and got a kernel booting (off the SD card rather
than over the network).

For the kernel I'm testing commit 65688d2a05de; defconfig with DRM and
VC4 built-in, since passing modules around is painful in my setup.

> The rootfs is on SD card, the system is some older Debian release.

For comparison, I built myself a buildroot 2021.02.2 filesystem.

So far, booting up an running I'm not seeeing issues (and no complaints
from KASAN or similar), but I don't have a good way to stress the VC4
GPU, so I might not be triggering whatever's going wrong.

From the log below I see the last message is about X sockets -- is the
lockup happening when the display manager starts?

Thanks,
Mark.

> Here is the last part of the boot 
> log if it helps:
> 
> [    7.906329] Freeing unused kernel memory: 8512K
> [    7.911793] Run /sbin/init as init process
> INIT: version 2.88 booting
> [info] Using makefile-style concurrent boot in runlevel S.
> ERROR: could not open /proc/stat: No such file or directory
> [....] Starting the hotplug events dispatcher: systemd-udevdstarting 
> version 236
> . ok
> [....] Synthesizing the initial hotplug events...[   13.641287] 
> bcm2835-rng 3f104000.rng: hwrng registered
> [   13.896575] i2c-bcm2835 3f805000.i2c: Could not read clock-frequency 
> property
> done.
> [   13.938691] debugfs: Directory '3f902000.hdmi' with parent 'vc4-hdmi' 
> already present!
> [   13.983716] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> [   13.991816] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> [   14.000945] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> [   14.009644] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> [   14.017547] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> [   14.025780] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> [   14.033995] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> [   14.042196] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> [....] Waiting for /dev to be fully populated...[   14.112812] [drm] 
> Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> done.
> [ ok ] Activating swap...done.
> [....] Checking file systems...fsck from util-linux 2.29.2
> done.
> [ ok ] Cleaning up temporary files... /tmp.
> [ ok ] Mounting local filesystems...done.
> [ ok ] Activating swapfile swap...done.
> [ ok ] Cleaning up temporary files....
> [ ok ] Setting kernel variables...done.
> [ ok ] Configuring network interfaces...done.
> [ ok ] Starting RPC port mapper daemon: rpcbind.
> [ ok ] Cleaning up temporary files....
> [ ok ] Setting up ALSA...done.
> [ ok ] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix.
> 
> Best regards
> 
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-04 10:01         ` Mark Rutland
@ 2021-06-07  9:58           ` Marek Szyprowski
  2021-06-07 12:01             ` Mark Rutland
  0 siblings, 1 reply; 73+ messages in thread
From: Marek Szyprowski @ 2021-06-07  9:58 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

Hi Mark,

On 04.06.2021 12:01, Mark Rutland wrote:
> On Wed, Jun 02, 2021 at 04:09:19PM +0200, Marek Szyprowski wrote:
>> On 02.06.2021 15:51, Mark Rutland wrote:
>>> On Wed, Jun 02, 2021 at 03:25:41PM +0200, Marek Szyprowski wrote:
>>>> On 27.05.2021 14:43, Will Deacon wrote:
>>>> This patch landed in todays linux-next as commit 65688d2a05de ("arm64:
>>>> cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)"). It causes an
>>>> issue on Raspberry Pi 3b board. System boots to userspace fine, but then
>>>> it hangs somewhere during the init scripts after loading the modules. I
>>>> didn't manage to track where it hangs yet though.
>>> Ouch!
>>>
>>> I have a 3b in a drawer that I might be able to reproduce the issue
>>> with; can you tell me how you're booting that kernel? e.g. which FW and
>>> DT you're using?
>> I'm booting the kernel with the mainline dtb
>> (arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b.dtb) from the u-boot,
>> which downloads it via TFTP. I don't remember which firmware version is
>> there, but without raspberry specific tools (which I don't have deployed
>> there) it is hard to check that now.
> Thanks, this was enough info to get started.
>
> For comparison, I have a 3Bv1.2 board.
>
> I grabbed the latest RPI firmware, built myself a v2021.07-rc3
> rpi_3_defconfig u-boot, and got a kernel booting (off the SD card rather
> than over the network).
>
> For the kernel I'm testing commit 65688d2a05de; defconfig with DRM and
> VC4 built-in, since passing modules around is painful in my setup.
>
>> The rootfs is on SD card, the system is some older Debian release.
> For comparison, I built myself a buildroot 2021.02.2 filesystem.
>
> So far, booting up an running I'm not seeeing issues (and no complaints
> from KASAN or similar), but I don't have a good way to stress the VC4
> GPU, so I might not be triggering whatever's going wrong.
>
> >From the log below I see the last message is about X sockets -- is the
> lockup happening when the display manager starts?

I've just checked with the latest firmware from 
https://github.com/raspberrypi/firmware (master branch, just copied 
everything to /boot) and the issue is still there.

If you start from arm64/defconfig without modules, please make sure you 
have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
managed to reproduce the issue without the modules with the following 
changes to arm64's defconfig:

./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
CONFIG_ARM_RASPBERRYPI_CPUFREQ

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07  9:58           ` Marek Szyprowski
@ 2021-06-07 12:01             ` Mark Rutland
  2021-06-07 13:08               ` Mark Rutland
  0 siblings, 1 reply; 73+ messages in thread
From: Mark Rutland @ 2021-06-07 12:01 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Will Deacon, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> Hi Mark,

> I've just checked with the latest firmware from 
> https://github.com/raspberrypi/firmware (master branch, just copied 
> everything to /boot) and the issue is still there.
> 
> If you start from arm64/defconfig without modules, please make sure you 
> have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
> managed to reproduce the issue without the modules with the following 
> changes to arm64's defconfig:
> 
> ./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
> CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
> CONFIG_ARM_RASPBERRYPI_CPUFREQ

Thanks for this!

With that config on commit 65688d2a05deb9f0 I also see a hang at the end
of boot, but before reaching userspace, with the last messages in dmesg
as below.

I'll go check that the ARCH_DMA_MINALIGN affects this, then I'll go play
with debug options.

| [    1.397102] 3f201000.serial: ttyAMA0 at MMIO 0x3f201000 (irq = 99, base_baud = 0) is a PL011 rev2
| [    1.406437] serial serial0: tty port ttyAMA0 registered
| [    1.408914] sdhost-bcm2835 3f202000.mmc: loaded - DMA enabled (>1)
| [    1.430510] raspberrypi-firmware soc:firmware: Attached to firmware from 2021-05-24T19:52:58
| [    1.504953] mmc0: host does not support reading read-only switch, assuming write-enable
| [    1.514908] mmc0: new high speed SDXC card at address 0001
| [    1.521626] mmcblk0: mmc0:0001 00000 59.6 GiB
| [    1.528406]  mmcblk0: p1 p2
| [    1.644694] debugfs: Directory '3f902000.hdmi' with parent 'vc4-hdmi' already present!
| [    1.654544] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
| [    1.661078] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
| [    1.667401] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
| [    1.673760] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
| [    1.680096] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
| [    1.687157] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
| [    1.694172] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
| [    1.701170] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
| [    1.709277] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-02 14:52             ` Arnd Bergmann
@ 2021-06-07 12:17               ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-07 12:17 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Mark Rutland, Will Deacon, Linux ARM, Android Kernel Team,
	Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On Wed, Jun 2, 2021 at 4:52 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Jun 2, 2021 at 4:28 PM Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> I suppose most can be easily ruled out because they are not shared, or
> because they are larger than 64 bytes. I'll have a quick look if I find any smoking guns.

Sorry for the late follow-up. I did check these last week, but
unfortunately none of
the allocations in the vc4 driver itself  looked suspicious to me in
the end. It might
still be an indirect allocation though, where the vc4 driver calls
into a drivers/drm/
API or something else that allocates a small DMA buffer.

        Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 12:01             ` Mark Rutland
@ 2021-06-07 13:08               ` Mark Rutland
  2021-06-07 13:39                   ` Will Deacon
  0 siblings, 1 reply; 73+ messages in thread
From: Mark Rutland @ 2021-06-07 13:08 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Will Deacon, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz

On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > Hi Mark,
> 
> > I've just checked with the latest firmware from 
> > https://github.com/raspberrypi/firmware (master branch, just copied 
> > everything to /boot) and the issue is still there.
> > 
> > If you start from arm64/defconfig without modules, please make sure you 
> > have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
> > managed to reproduce the issue without the modules with the following 
> > changes to arm64's defconfig:
> > 
> > ./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
> > CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
> > CONFIG_ARM_RASPBERRYPI_CPUFREQ
> 
> Thanks for this!
> 
> With that config on commit 65688d2a05deb9f0 I also see a hang at the end
> of boot, but before reaching userspace, with the last messages in dmesg
> as below.
> 
> I'll go check that the ARCH_DMA_MINALIGN affects this, then I'll go play
> with debug options.

I can confirm that with the ARCH_DMA_MINALIGN change reverted, the hang
goes away. Running with that reverted andwith KASAN, I get the
slab-out-of-bounds splat below, which occurs at the time the hang would
otherwise occur, and is possibly the problem:

[    3.609515] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
[    3.621451] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
[    3.628344] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
[    3.635904] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
[    3.643351] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
[    3.651238] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
[    3.659167] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
[    3.666499] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
[    3.688560] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[    3.728010] ==================================================================
[    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
[    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
[    3.728153]
[    3.728169] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.13.0-rc3-00009-g694c523e7267 #3
[    3.728203] Hardware name: Raspberry Pi 3 Model B (DT)
[    3.728225] Workqueue: events_unbound deferred_probe_work_func
[    3.728290] Call trace:
[    3.728301]  dump_backtrace+0x0/0x2b4
[    3.728358]  show_stack+0x1c/0x30
[    3.728407]  dump_stack+0xfc/0x168
[    3.728445]  print_address_description.constprop.0+0x2c/0x2c0
[    3.728495]  kasan_report+0x1dc/0x240
[    3.728529]  __asan_load8+0x98/0xd4
[    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
[    3.728621]  commit_tail+0x100/0x210
[    3.728675]  drm_atomic_helper_commit+0x1c4/0x3dc
[    3.728730]  drm_atomic_commit+0x80/0x94
[    3.728768]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
[    3.728821]  drm_client_modeset_commit_locked+0x8c/0x230
[    3.728872]  drm_fb_helper_pan_display+0x164/0x3a0
[    3.728924]  fb_pan_display+0x12c/0x1fc
[    3.728963]  bit_update_start+0x34/0xa0
[    3.729013]  fbcon_switch+0x678/0x920
[    3.729058]  redraw_screen+0x17c/0x35c
[    3.729095]  fbcon_prepare_logo+0x484/0x5bc
[    3.729143]  fbcon_init+0x77c/0x970
[    3.729187]  visual_init+0x14c/0x1e4
[    3.729239]  do_bind_con_driver.isra.0+0x2c4/0x530
[    3.729279]  do_take_over_console+0x200/0x2e0
[    3.729317]  do_fbcon_takeover+0x90/0x120
[    3.729363]  fbcon_fb_registered+0x14c/0x164
[    3.729412]  register_framebuffer+0x308/0x4e0
[    3.729451]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
[    3.729506]  drm_fbdev_client_hotplug+0x204/0x374
[    3.729556]  drm_fbdev_generic_setup+0xf4/0x24c
[    3.729604]  vc4_drm_bind+0x1d4/0x1f0
[    3.729654]  try_to_bring_up_master+0x254/0x2dc
[    3.729709]  __component_add+0x10c/0x240
[    3.729759]  component_add+0x18/0x24
[    3.729807]  vc4_v3d_dev_probe+0x20/0x30
[    3.729854]  platform_probe+0x90/0x110
[    3.729907]  really_probe+0x148/0x744
[    3.729952]  driver_probe_device+0x8c/0xfc
[    3.729998]  __device_attach_driver+0x120/0x180
[    3.730048]  bus_for_each_drv+0xf4/0x15c
[    3.730091]  __device_attach+0x168/0x250
[    3.730137]  device_initial_probe+0x18/0x24
[    3.730186]  bus_probe_device+0xec/0x100
[    3.730230]  deferred_probe_work_func+0xe8/0x130
[    3.730279]  process_one_work+0x3b8/0x650
[    3.730319]  worker_thread+0x3cc/0x72c
[    3.730356]  kthread+0x21c/0x224
[    3.730402]  ret_from_fork+0x10/0x38
[    3.730442]
[    3.730453] Allocated by task 7:
[    3.730470]  kasan_save_stack+0x2c/0x60
[    3.730526]  __kasan_kmalloc+0x90/0xb4
[    3.730577]  vc4_hvs_channels_duplicate_state+0x60/0x1a0
[    3.730637]  drm_atomic_get_private_obj_state+0x144/0x230
[    3.730680]  vc4_atomic_check+0x40/0x73c
[    3.730732]  drm_atomic_check_only+0x998/0xe60
[    3.730769]  drm_atomic_commit+0x34/0x94
[    3.730804]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
[    3.730854]  drm_client_modeset_commit_locked+0x8c/0x230
[    3.730904]  drm_client_modeset_commit+0x38/0x60
[    3.730951]  drm_fb_helper_set_par+0x104/0x17c
[    3.730998]  fbcon_init+0x43c/0x970
[    3.731041]  visual_init+0x14c/0x1e4
[    3.731090]  do_bind_con_driver.isra.0+0x2c4/0x530
[    3.731128]  do_take_over_console+0x200/0x2e0
[    3.731165]  do_fbcon_takeover+0x90/0x120
[    3.731210]  fbcon_fb_registered+0x14c/0x164
[    3.731258]  register_framebuffer+0x308/0x4e0
[    3.731296]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
[    3.731349]  drm_fbdev_client_hotplug+0x204/0x374
[    3.731398]  drm_fbdev_generic_setup+0xf4/0x24c
[    3.731446]  vc4_drm_bind+0x1d4/0x1f0
[    3.731493]  try_to_bring_up_master+0x254/0x2dc
[    3.731546]  __component_add+0x10c/0x240
[    3.731594]  component_add+0x18/0x24
[    3.731642]  vc4_v3d_dev_probe+0x20/0x30
[    3.731686]  platform_probe+0x90/0x110
[    3.731737]  really_probe+0x148/0x744
[    3.731781]  driver_probe_device+0x8c/0xfc
[    3.731827]  __device_attach_driver+0x120/0x180
[    3.731875]  bus_for_each_drv+0xf4/0x15c
[    3.731916]  __device_attach+0x168/0x250
[    3.731962]  device_initial_probe+0x18/0x24
[    3.732009]  bus_probe_device+0xec/0x100
[    3.732052]  deferred_probe_work_func+0xe8/0x130
[    3.732100]  process_one_work+0x3b8/0x650
[    3.732137]  worker_thread+0x3cc/0x72c
[    3.732172]  kthread+0x21c/0x224
[    3.732215]  ret_from_fork+0x10/0x38
[    3.732253]
[    3.732262] The buggy address belongs to the object at ffff000007360400
[    3.732262]  which belongs to the cache kmalloc-128 of size 128
[    3.732293] The buggy address is located 64 bytes inside of
[    3.732293]  128-byte region [ffff000007360400, ffff000007360480)
[    3.732329] The buggy address belongs to the page:
[    3.732344] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7360
[    3.732380] flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
[    3.732442] raw: 03fffc0000000200 dead000000000100 dead000000000122 ffff000004c02300
[    3.732478] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[    3.732501] page dumped because: kasan: bad access detected
[    3.732518]
[    3.732527] Memory state around the buggy address:
[    3.732549]  ffff000007360300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.732579]  ffff000007360380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    3.732608] >ffff000007360400: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
[    3.732629]                                            ^
[    3.732652]  ffff000007360480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    3.732682]  ffff000007360500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    3.732703] ==================================================================
[    3.732718] Disabling lock debugging due to kernel taint
[    3.769129] Console: switching to colour frame buffer device 90x30
[    5.148699] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 13:08               ` Mark Rutland
@ 2021-06-07 13:39                   ` Will Deacon
  0 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-06-07 13:39 UTC (permalink / raw)
  To: Mark Rutland, emma, mripard
  Cc: Marek Szyprowski, linux-arm-kernel, kernel-team, Catalin Marinas,
	Ard Biesheuvel, Arnd Bergmann, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz, dri-devel

[Adding VC4 folks -- please see the KASAN splat below!]

Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
-next) is causing vc4 to hang on Rpi3b due to a probable driver bug.

Will

On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > > I've just checked with the latest firmware from 
> > > https://github.com/raspberrypi/firmware (master branch, just copied 
> > > everything to /boot) and the issue is still there.
> > > 
> > > If you start from arm64/defconfig without modules, please make sure you 
> > > have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
> > > managed to reproduce the issue without the modules with the following 
> > > changes to arm64's defconfig:
> > > 
> > > ./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
> > > CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
> > > CONFIG_ARM_RASPBERRYPI_CPUFREQ
> > 
> > Thanks for this!
> > 
> > With that config on commit 65688d2a05deb9f0 I also see a hang at the end
> > of boot, but before reaching userspace, with the last messages in dmesg
> > as below.
> > 
> > I'll go check that the ARCH_DMA_MINALIGN affects this, then I'll go play
> > with debug options.
> 
> I can confirm that with the ARCH_DMA_MINALIGN change reverted, the hang
> goes away. Running with that reverted andwith KASAN, I get the
> slab-out-of-bounds splat below, which occurs at the time the hang would
> otherwise occur, and is possibly the problem:
> 
> [    3.609515] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> [    3.621451] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> [    3.628344] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> [    3.635904] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> [    3.643351] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> [    3.651238] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> [    3.659167] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> [    3.666499] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> [    3.688560] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> [    3.728010] ==================================================================
> [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
> [    3.728153]
> [    3.728169] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.13.0-rc3-00009-g694c523e7267 #3
> [    3.728203] Hardware name: Raspberry Pi 3 Model B (DT)
> [    3.728225] Workqueue: events_unbound deferred_probe_work_func
> [    3.728290] Call trace:
> [    3.728301]  dump_backtrace+0x0/0x2b4
> [    3.728358]  show_stack+0x1c/0x30
> [    3.728407]  dump_stack+0xfc/0x168
> [    3.728445]  print_address_description.constprop.0+0x2c/0x2c0
> [    3.728495]  kasan_report+0x1dc/0x240
> [    3.728529]  __asan_load8+0x98/0xd4
> [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> [    3.728621]  commit_tail+0x100/0x210
> [    3.728675]  drm_atomic_helper_commit+0x1c4/0x3dc
> [    3.728730]  drm_atomic_commit+0x80/0x94
> [    3.728768]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> [    3.728821]  drm_client_modeset_commit_locked+0x8c/0x230
> [    3.728872]  drm_fb_helper_pan_display+0x164/0x3a0
> [    3.728924]  fb_pan_display+0x12c/0x1fc
> [    3.728963]  bit_update_start+0x34/0xa0
> [    3.729013]  fbcon_switch+0x678/0x920
> [    3.729058]  redraw_screen+0x17c/0x35c
> [    3.729095]  fbcon_prepare_logo+0x484/0x5bc
> [    3.729143]  fbcon_init+0x77c/0x970
> [    3.729187]  visual_init+0x14c/0x1e4
> [    3.729239]  do_bind_con_driver.isra.0+0x2c4/0x530
> [    3.729279]  do_take_over_console+0x200/0x2e0
> [    3.729317]  do_fbcon_takeover+0x90/0x120
> [    3.729363]  fbcon_fb_registered+0x14c/0x164
> [    3.729412]  register_framebuffer+0x308/0x4e0
> [    3.729451]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> [    3.729506]  drm_fbdev_client_hotplug+0x204/0x374
> [    3.729556]  drm_fbdev_generic_setup+0xf4/0x24c
> [    3.729604]  vc4_drm_bind+0x1d4/0x1f0
> [    3.729654]  try_to_bring_up_master+0x254/0x2dc
> [    3.729709]  __component_add+0x10c/0x240
> [    3.729759]  component_add+0x18/0x24
> [    3.729807]  vc4_v3d_dev_probe+0x20/0x30
> [    3.729854]  platform_probe+0x90/0x110
> [    3.729907]  really_probe+0x148/0x744
> [    3.729952]  driver_probe_device+0x8c/0xfc
> [    3.729998]  __device_attach_driver+0x120/0x180
> [    3.730048]  bus_for_each_drv+0xf4/0x15c
> [    3.730091]  __device_attach+0x168/0x250
> [    3.730137]  device_initial_probe+0x18/0x24
> [    3.730186]  bus_probe_device+0xec/0x100
> [    3.730230]  deferred_probe_work_func+0xe8/0x130
> [    3.730279]  process_one_work+0x3b8/0x650
> [    3.730319]  worker_thread+0x3cc/0x72c
> [    3.730356]  kthread+0x21c/0x224
> [    3.730402]  ret_from_fork+0x10/0x38
> [    3.730442]
> [    3.730453] Allocated by task 7:
> [    3.730470]  kasan_save_stack+0x2c/0x60
> [    3.730526]  __kasan_kmalloc+0x90/0xb4
> [    3.730577]  vc4_hvs_channels_duplicate_state+0x60/0x1a0
> [    3.730637]  drm_atomic_get_private_obj_state+0x144/0x230
> [    3.730680]  vc4_atomic_check+0x40/0x73c
> [    3.730732]  drm_atomic_check_only+0x998/0xe60
> [    3.730769]  drm_atomic_commit+0x34/0x94
> [    3.730804]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> [    3.730854]  drm_client_modeset_commit_locked+0x8c/0x230
> [    3.730904]  drm_client_modeset_commit+0x38/0x60
> [    3.730951]  drm_fb_helper_set_par+0x104/0x17c
> [    3.730998]  fbcon_init+0x43c/0x970
> [    3.731041]  visual_init+0x14c/0x1e4
> [    3.731090]  do_bind_con_driver.isra.0+0x2c4/0x530
> [    3.731128]  do_take_over_console+0x200/0x2e0
> [    3.731165]  do_fbcon_takeover+0x90/0x120
> [    3.731210]  fbcon_fb_registered+0x14c/0x164
> [    3.731258]  register_framebuffer+0x308/0x4e0
> [    3.731296]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> [    3.731349]  drm_fbdev_client_hotplug+0x204/0x374
> [    3.731398]  drm_fbdev_generic_setup+0xf4/0x24c
> [    3.731446]  vc4_drm_bind+0x1d4/0x1f0
> [    3.731493]  try_to_bring_up_master+0x254/0x2dc
> [    3.731546]  __component_add+0x10c/0x240
> [    3.731594]  component_add+0x18/0x24
> [    3.731642]  vc4_v3d_dev_probe+0x20/0x30
> [    3.731686]  platform_probe+0x90/0x110
> [    3.731737]  really_probe+0x148/0x744
> [    3.731781]  driver_probe_device+0x8c/0xfc
> [    3.731827]  __device_attach_driver+0x120/0x180
> [    3.731875]  bus_for_each_drv+0xf4/0x15c
> [    3.731916]  __device_attach+0x168/0x250
> [    3.731962]  device_initial_probe+0x18/0x24
> [    3.732009]  bus_probe_device+0xec/0x100
> [    3.732052]  deferred_probe_work_func+0xe8/0x130
> [    3.732100]  process_one_work+0x3b8/0x650
> [    3.732137]  worker_thread+0x3cc/0x72c
> [    3.732172]  kthread+0x21c/0x224
> [    3.732215]  ret_from_fork+0x10/0x38
> [    3.732253]
> [    3.732262] The buggy address belongs to the object at ffff000007360400
> [    3.732262]  which belongs to the cache kmalloc-128 of size 128
> [    3.732293] The buggy address is located 64 bytes inside of
> [    3.732293]  128-byte region [ffff000007360400, ffff000007360480)
> [    3.732329] The buggy address belongs to the page:
> [    3.732344] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7360
> [    3.732380] flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
> [    3.732442] raw: 03fffc0000000200 dead000000000100 dead000000000122 ffff000004c02300
> [    3.732478] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> [    3.732501] page dumped because: kasan: bad access detected
> [    3.732518]
> [    3.732527] Memory state around the buggy address:
> [    3.732549]  ffff000007360300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    3.732579]  ffff000007360380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    3.732608] >ffff000007360400: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
> [    3.732629]                                            ^
> [    3.732652]  ffff000007360480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    3.732682]  ffff000007360500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    3.732703] ==================================================================
> [    3.732718] Disabling lock debugging due to kernel taint
> [    3.769129] Console: switching to colour frame buffer device 90x30
> [    5.148699] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-07 13:39                   ` Will Deacon
  0 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-06-07 13:39 UTC (permalink / raw)
  To: Mark Rutland, emma, mripard
  Cc: Arnd Bergmann, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Vincent Whitchurch, dri-devel, kernel-team, Ard Biesheuvel,
	linux-arm-kernel, Marek Szyprowski

[Adding VC4 folks -- please see the KASAN splat below!]

Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
-next) is causing vc4 to hang on Rpi3b due to a probable driver bug.

Will

On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > > I've just checked with the latest firmware from 
> > > https://github.com/raspberrypi/firmware (master branch, just copied 
> > > everything to /boot) and the issue is still there.
> > > 
> > > If you start from arm64/defconfig without modules, please make sure you 
> > > have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
> > > managed to reproduce the issue without the modules with the following 
> > > changes to arm64's defconfig:
> > > 
> > > ./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
> > > CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
> > > CONFIG_ARM_RASPBERRYPI_CPUFREQ
> > 
> > Thanks for this!
> > 
> > With that config on commit 65688d2a05deb9f0 I also see a hang at the end
> > of boot, but before reaching userspace, with the last messages in dmesg
> > as below.
> > 
> > I'll go check that the ARCH_DMA_MINALIGN affects this, then I'll go play
> > with debug options.
> 
> I can confirm that with the ARCH_DMA_MINALIGN change reverted, the hang
> goes away. Running with that reverted andwith KASAN, I get the
> slab-out-of-bounds splat below, which occurs at the time the hang would
> otherwise occur, and is possibly the problem:
> 
> [    3.609515] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> [    3.621451] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> [    3.628344] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> [    3.635904] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> [    3.643351] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> [    3.651238] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> [    3.659167] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> [    3.666499] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> [    3.688560] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> [    3.728010] ==================================================================
> [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
> [    3.728153]
> [    3.728169] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.13.0-rc3-00009-g694c523e7267 #3
> [    3.728203] Hardware name: Raspberry Pi 3 Model B (DT)
> [    3.728225] Workqueue: events_unbound deferred_probe_work_func
> [    3.728290] Call trace:
> [    3.728301]  dump_backtrace+0x0/0x2b4
> [    3.728358]  show_stack+0x1c/0x30
> [    3.728407]  dump_stack+0xfc/0x168
> [    3.728445]  print_address_description.constprop.0+0x2c/0x2c0
> [    3.728495]  kasan_report+0x1dc/0x240
> [    3.728529]  __asan_load8+0x98/0xd4
> [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> [    3.728621]  commit_tail+0x100/0x210
> [    3.728675]  drm_atomic_helper_commit+0x1c4/0x3dc
> [    3.728730]  drm_atomic_commit+0x80/0x94
> [    3.728768]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> [    3.728821]  drm_client_modeset_commit_locked+0x8c/0x230
> [    3.728872]  drm_fb_helper_pan_display+0x164/0x3a0
> [    3.728924]  fb_pan_display+0x12c/0x1fc
> [    3.728963]  bit_update_start+0x34/0xa0
> [    3.729013]  fbcon_switch+0x678/0x920
> [    3.729058]  redraw_screen+0x17c/0x35c
> [    3.729095]  fbcon_prepare_logo+0x484/0x5bc
> [    3.729143]  fbcon_init+0x77c/0x970
> [    3.729187]  visual_init+0x14c/0x1e4
> [    3.729239]  do_bind_con_driver.isra.0+0x2c4/0x530
> [    3.729279]  do_take_over_console+0x200/0x2e0
> [    3.729317]  do_fbcon_takeover+0x90/0x120
> [    3.729363]  fbcon_fb_registered+0x14c/0x164
> [    3.729412]  register_framebuffer+0x308/0x4e0
> [    3.729451]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> [    3.729506]  drm_fbdev_client_hotplug+0x204/0x374
> [    3.729556]  drm_fbdev_generic_setup+0xf4/0x24c
> [    3.729604]  vc4_drm_bind+0x1d4/0x1f0
> [    3.729654]  try_to_bring_up_master+0x254/0x2dc
> [    3.729709]  __component_add+0x10c/0x240
> [    3.729759]  component_add+0x18/0x24
> [    3.729807]  vc4_v3d_dev_probe+0x20/0x30
> [    3.729854]  platform_probe+0x90/0x110
> [    3.729907]  really_probe+0x148/0x744
> [    3.729952]  driver_probe_device+0x8c/0xfc
> [    3.729998]  __device_attach_driver+0x120/0x180
> [    3.730048]  bus_for_each_drv+0xf4/0x15c
> [    3.730091]  __device_attach+0x168/0x250
> [    3.730137]  device_initial_probe+0x18/0x24
> [    3.730186]  bus_probe_device+0xec/0x100
> [    3.730230]  deferred_probe_work_func+0xe8/0x130
> [    3.730279]  process_one_work+0x3b8/0x650
> [    3.730319]  worker_thread+0x3cc/0x72c
> [    3.730356]  kthread+0x21c/0x224
> [    3.730402]  ret_from_fork+0x10/0x38
> [    3.730442]
> [    3.730453] Allocated by task 7:
> [    3.730470]  kasan_save_stack+0x2c/0x60
> [    3.730526]  __kasan_kmalloc+0x90/0xb4
> [    3.730577]  vc4_hvs_channels_duplicate_state+0x60/0x1a0
> [    3.730637]  drm_atomic_get_private_obj_state+0x144/0x230
> [    3.730680]  vc4_atomic_check+0x40/0x73c
> [    3.730732]  drm_atomic_check_only+0x998/0xe60
> [    3.730769]  drm_atomic_commit+0x34/0x94
> [    3.730804]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> [    3.730854]  drm_client_modeset_commit_locked+0x8c/0x230
> [    3.730904]  drm_client_modeset_commit+0x38/0x60
> [    3.730951]  drm_fb_helper_set_par+0x104/0x17c
> [    3.730998]  fbcon_init+0x43c/0x970
> [    3.731041]  visual_init+0x14c/0x1e4
> [    3.731090]  do_bind_con_driver.isra.0+0x2c4/0x530
> [    3.731128]  do_take_over_console+0x200/0x2e0
> [    3.731165]  do_fbcon_takeover+0x90/0x120
> [    3.731210]  fbcon_fb_registered+0x14c/0x164
> [    3.731258]  register_framebuffer+0x308/0x4e0
> [    3.731296]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> [    3.731349]  drm_fbdev_client_hotplug+0x204/0x374
> [    3.731398]  drm_fbdev_generic_setup+0xf4/0x24c
> [    3.731446]  vc4_drm_bind+0x1d4/0x1f0
> [    3.731493]  try_to_bring_up_master+0x254/0x2dc
> [    3.731546]  __component_add+0x10c/0x240
> [    3.731594]  component_add+0x18/0x24
> [    3.731642]  vc4_v3d_dev_probe+0x20/0x30
> [    3.731686]  platform_probe+0x90/0x110
> [    3.731737]  really_probe+0x148/0x744
> [    3.731781]  driver_probe_device+0x8c/0xfc
> [    3.731827]  __device_attach_driver+0x120/0x180
> [    3.731875]  bus_for_each_drv+0xf4/0x15c
> [    3.731916]  __device_attach+0x168/0x250
> [    3.731962]  device_initial_probe+0x18/0x24
> [    3.732009]  bus_probe_device+0xec/0x100
> [    3.732052]  deferred_probe_work_func+0xe8/0x130
> [    3.732100]  process_one_work+0x3b8/0x650
> [    3.732137]  worker_thread+0x3cc/0x72c
> [    3.732172]  kthread+0x21c/0x224
> [    3.732215]  ret_from_fork+0x10/0x38
> [    3.732253]
> [    3.732262] The buggy address belongs to the object at ffff000007360400
> [    3.732262]  which belongs to the cache kmalloc-128 of size 128
> [    3.732293] The buggy address is located 64 bytes inside of
> [    3.732293]  128-byte region [ffff000007360400, ffff000007360480)
> [    3.732329] The buggy address belongs to the page:
> [    3.732344] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7360
> [    3.732380] flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
> [    3.732442] raw: 03fffc0000000200 dead000000000100 dead000000000122 ffff000004c02300
> [    3.732478] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> [    3.732501] page dumped because: kasan: bad access detected
> [    3.732518]
> [    3.732527] Memory state around the buggy address:
> [    3.732549]  ffff000007360300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    3.732579]  ffff000007360380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    3.732608] >ffff000007360400: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
> [    3.732629]                                            ^
> [    3.732652]  ffff000007360480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    3.732682]  ffff000007360500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    3.732703] ==================================================================
> [    3.732718] Disabling lock debugging due to kernel taint
> [    3.769129] Console: switching to colour frame buffer device 90x30
> [    5.148699] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 13:39                   ` Will Deacon
@ 2021-06-07 13:56                     ` Mark Rutland
  -1 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-07 13:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: emma, mripard, Marek Szyprowski, linux-arm-kernel, kernel-team,
	Catalin Marinas, Ard Biesheuvel, Arnd Bergmann,
	Vincent Whitchurch, Bartlomiej Zolnierkiewicz, dri-devel

On Mon, Jun 07, 2021 at 02:39:54PM +0100, Will Deacon wrote:
> [Adding VC4 folks -- please see the KASAN splat below!]
> 
> Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.
> 
> Will
> 
> On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > > > I've just checked with the latest firmware from 
> > > > https://github.com/raspberrypi/firmware (master branch, just copied 
> > > > everything to /boot) and the issue is still there.
> > > > 
> > > > If you start from arm64/defconfig without modules, please make sure you 
> > > > have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
> > > > managed to reproduce the issue without the modules with the following 
> > > > changes to arm64's defconfig:
> > > > 
> > > > ./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
> > > > CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
> > > > CONFIG_ARM_RASPBERRYPI_CPUFREQ
> > > 
> > > Thanks for this!
> > > 
> > > With that config on commit 65688d2a05deb9f0 I also see a hang at the end
> > > of boot, but before reaching userspace, with the last messages in dmesg
> > > as below.
> > > 
> > > I'll go check that the ARCH_DMA_MINALIGN affects this, then I'll go play
> > > with debug options.
> > 
> > I can confirm that with the ARCH_DMA_MINALIGN change reverted, the hang
> > goes away. Running with that reverted andwith KASAN, I get the
> > slab-out-of-bounds splat below, which occurs at the time the hang would
> > otherwise occur, and is possibly the problem:
> > 
> > [    3.609515] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> > [    3.621451] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> > [    3.628344] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> > [    3.635904] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> > [    3.643351] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> > [    3.651238] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> > [    3.659167] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> > [    3.666499] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> > [    3.688560] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> > [    3.728010] ==================================================================
> > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7

FWIW, faddr2line tells me this is:

[mark@lakrids:~/src/linux]% ./scripts/faddr2line vmlinux vc4_atomic_commit_tail+0x1cc/0x910
vc4_atomic_commit_tail+0x1cc/0x910:
vc4_atomic_commit_tail at drivers/gpu/drm/vc4/vc4_kms.c:375

... which is:

| ret = drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);

Thanks,
Mark.

> > [    3.728153]
> > [    3.728169] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.13.0-rc3-00009-g694c523e7267 #3
> > [    3.728203] Hardware name: Raspberry Pi 3 Model B (DT)
> > [    3.728225] Workqueue: events_unbound deferred_probe_work_func
> > [    3.728290] Call trace:
> > [    3.728301]  dump_backtrace+0x0/0x2b4
> > [    3.728358]  show_stack+0x1c/0x30
> > [    3.728407]  dump_stack+0xfc/0x168
> > [    3.728445]  print_address_description.constprop.0+0x2c/0x2c0
> > [    3.728495]  kasan_report+0x1dc/0x240
> > [    3.728529]  __asan_load8+0x98/0xd4
> > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> > [    3.728621]  commit_tail+0x100/0x210
> > [    3.728675]  drm_atomic_helper_commit+0x1c4/0x3dc
> > [    3.728730]  drm_atomic_commit+0x80/0x94
> > [    3.728768]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> > [    3.728821]  drm_client_modeset_commit_locked+0x8c/0x230
> > [    3.728872]  drm_fb_helper_pan_display+0x164/0x3a0
> > [    3.728924]  fb_pan_display+0x12c/0x1fc
> > [    3.728963]  bit_update_start+0x34/0xa0
> > [    3.729013]  fbcon_switch+0x678/0x920
> > [    3.729058]  redraw_screen+0x17c/0x35c
> > [    3.729095]  fbcon_prepare_logo+0x484/0x5bc
> > [    3.729143]  fbcon_init+0x77c/0x970
> > [    3.729187]  visual_init+0x14c/0x1e4
> > [    3.729239]  do_bind_con_driver.isra.0+0x2c4/0x530
> > [    3.729279]  do_take_over_console+0x200/0x2e0
> > [    3.729317]  do_fbcon_takeover+0x90/0x120
> > [    3.729363]  fbcon_fb_registered+0x14c/0x164
> > [    3.729412]  register_framebuffer+0x308/0x4e0
> > [    3.729451]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> > [    3.729506]  drm_fbdev_client_hotplug+0x204/0x374
> > [    3.729556]  drm_fbdev_generic_setup+0xf4/0x24c
> > [    3.729604]  vc4_drm_bind+0x1d4/0x1f0
> > [    3.729654]  try_to_bring_up_master+0x254/0x2dc
> > [    3.729709]  __component_add+0x10c/0x240
> > [    3.729759]  component_add+0x18/0x24
> > [    3.729807]  vc4_v3d_dev_probe+0x20/0x30
> > [    3.729854]  platform_probe+0x90/0x110
> > [    3.729907]  really_probe+0x148/0x744
> > [    3.729952]  driver_probe_device+0x8c/0xfc
> > [    3.729998]  __device_attach_driver+0x120/0x180
> > [    3.730048]  bus_for_each_drv+0xf4/0x15c
> > [    3.730091]  __device_attach+0x168/0x250
> > [    3.730137]  device_initial_probe+0x18/0x24
> > [    3.730186]  bus_probe_device+0xec/0x100
> > [    3.730230]  deferred_probe_work_func+0xe8/0x130
> > [    3.730279]  process_one_work+0x3b8/0x650
> > [    3.730319]  worker_thread+0x3cc/0x72c
> > [    3.730356]  kthread+0x21c/0x224
> > [    3.730402]  ret_from_fork+0x10/0x38
> > [    3.730442]
> > [    3.730453] Allocated by task 7:
> > [    3.730470]  kasan_save_stack+0x2c/0x60
> > [    3.730526]  __kasan_kmalloc+0x90/0xb4
> > [    3.730577]  vc4_hvs_channels_duplicate_state+0x60/0x1a0
> > [    3.730637]  drm_atomic_get_private_obj_state+0x144/0x230
> > [    3.730680]  vc4_atomic_check+0x40/0x73c
> > [    3.730732]  drm_atomic_check_only+0x998/0xe60
> > [    3.730769]  drm_atomic_commit+0x34/0x94
> > [    3.730804]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> > [    3.730854]  drm_client_modeset_commit_locked+0x8c/0x230
> > [    3.730904]  drm_client_modeset_commit+0x38/0x60
> > [    3.730951]  drm_fb_helper_set_par+0x104/0x17c
> > [    3.730998]  fbcon_init+0x43c/0x970
> > [    3.731041]  visual_init+0x14c/0x1e4
> > [    3.731090]  do_bind_con_driver.isra.0+0x2c4/0x530
> > [    3.731128]  do_take_over_console+0x200/0x2e0
> > [    3.731165]  do_fbcon_takeover+0x90/0x120
> > [    3.731210]  fbcon_fb_registered+0x14c/0x164
> > [    3.731258]  register_framebuffer+0x308/0x4e0
> > [    3.731296]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> > [    3.731349]  drm_fbdev_client_hotplug+0x204/0x374
> > [    3.731398]  drm_fbdev_generic_setup+0xf4/0x24c
> > [    3.731446]  vc4_drm_bind+0x1d4/0x1f0
> > [    3.731493]  try_to_bring_up_master+0x254/0x2dc
> > [    3.731546]  __component_add+0x10c/0x240
> > [    3.731594]  component_add+0x18/0x24
> > [    3.731642]  vc4_v3d_dev_probe+0x20/0x30
> > [    3.731686]  platform_probe+0x90/0x110
> > [    3.731737]  really_probe+0x148/0x744
> > [    3.731781]  driver_probe_device+0x8c/0xfc
> > [    3.731827]  __device_attach_driver+0x120/0x180
> > [    3.731875]  bus_for_each_drv+0xf4/0x15c
> > [    3.731916]  __device_attach+0x168/0x250
> > [    3.731962]  device_initial_probe+0x18/0x24
> > [    3.732009]  bus_probe_device+0xec/0x100
> > [    3.732052]  deferred_probe_work_func+0xe8/0x130
> > [    3.732100]  process_one_work+0x3b8/0x650
> > [    3.732137]  worker_thread+0x3cc/0x72c
> > [    3.732172]  kthread+0x21c/0x224
> > [    3.732215]  ret_from_fork+0x10/0x38
> > [    3.732253]
> > [    3.732262] The buggy address belongs to the object at ffff000007360400
> > [    3.732262]  which belongs to the cache kmalloc-128 of size 128
> > [    3.732293] The buggy address is located 64 bytes inside of
> > [    3.732293]  128-byte region [ffff000007360400, ffff000007360480)
> > [    3.732329] The buggy address belongs to the page:
> > [    3.732344] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7360
> > [    3.732380] flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
> > [    3.732442] raw: 03fffc0000000200 dead000000000100 dead000000000122 ffff000004c02300
> > [    3.732478] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> > [    3.732501] page dumped because: kasan: bad access detected
> > [    3.732518]
> > [    3.732527] Memory state around the buggy address:
> > [    3.732549]  ffff000007360300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [    3.732579]  ffff000007360380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [    3.732608] >ffff000007360400: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
> > [    3.732629]                                            ^
> > [    3.732652]  ffff000007360480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [    3.732682]  ffff000007360500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [    3.732703] ==================================================================
> > [    3.732718] Disabling lock debugging due to kernel taint
> > [    3.769129] Console: switching to colour frame buffer device 90x30
> > [    5.148699] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-07 13:56                     ` Mark Rutland
  0 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-07 13:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnd Bergmann, emma, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Vincent Whitchurch, dri-devel, kernel-team, Ard Biesheuvel,
	linux-arm-kernel, Marek Szyprowski

On Mon, Jun 07, 2021 at 02:39:54PM +0100, Will Deacon wrote:
> [Adding VC4 folks -- please see the KASAN splat below!]
> 
> Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.
> 
> Will
> 
> On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > > > I've just checked with the latest firmware from 
> > > > https://github.com/raspberrypi/firmware (master branch, just copied 
> > > > everything to /boot) and the issue is still there.
> > > > 
> > > > If you start from arm64/defconfig without modules, please make sure you 
> > > > have enabled all RPi drivers, otherwise VC4 DRM won't come up. I've 
> > > > managed to reproduce the issue without the modules with the following 
> > > > changes to arm64's defconfig:
> > > > 
> > > > ./scripts/config -e DRM -e DRM_VC4 -e CONFIG_CLK_RASPBERRYPI -e 
> > > > CONFIG_SENSORS_RASPBERRYPI_HWMON -e CONFIG_I2C_BCM2835 -e 
> > > > CONFIG_ARM_RASPBERRYPI_CPUFREQ
> > > 
> > > Thanks for this!
> > > 
> > > With that config on commit 65688d2a05deb9f0 I also see a hang at the end
> > > of boot, but before reaching userspace, with the last messages in dmesg
> > > as below.
> > > 
> > > I'll go check that the ARCH_DMA_MINALIGN affects this, then I'll go play
> > > with debug options.
> > 
> > I can confirm that with the ARCH_DMA_MINALIGN change reverted, the hang
> > goes away. Running with that reverted andwith KASAN, I get the
> > slab-out-of-bounds splat below, which occurs at the time the hang would
> > otherwise occur, and is possibly the problem:
> > 
> > [    3.609515] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
> > [    3.621451] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
> > [    3.628344] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
> > [    3.635904] vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
> > [    3.643351] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
> > [    3.651238] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
> > [    3.659167] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
> > [    3.666499] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
> > [    3.688560] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> > [    3.728010] ==================================================================
> > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7

FWIW, faddr2line tells me this is:

[mark@lakrids:~/src/linux]% ./scripts/faddr2line vmlinux vc4_atomic_commit_tail+0x1cc/0x910
vc4_atomic_commit_tail+0x1cc/0x910:
vc4_atomic_commit_tail at drivers/gpu/drm/vc4/vc4_kms.c:375

... which is:

| ret = drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);

Thanks,
Mark.

> > [    3.728153]
> > [    3.728169] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.13.0-rc3-00009-g694c523e7267 #3
> > [    3.728203] Hardware name: Raspberry Pi 3 Model B (DT)
> > [    3.728225] Workqueue: events_unbound deferred_probe_work_func
> > [    3.728290] Call trace:
> > [    3.728301]  dump_backtrace+0x0/0x2b4
> > [    3.728358]  show_stack+0x1c/0x30
> > [    3.728407]  dump_stack+0xfc/0x168
> > [    3.728445]  print_address_description.constprop.0+0x2c/0x2c0
> > [    3.728495]  kasan_report+0x1dc/0x240
> > [    3.728529]  __asan_load8+0x98/0xd4
> > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> > [    3.728621]  commit_tail+0x100/0x210
> > [    3.728675]  drm_atomic_helper_commit+0x1c4/0x3dc
> > [    3.728730]  drm_atomic_commit+0x80/0x94
> > [    3.728768]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> > [    3.728821]  drm_client_modeset_commit_locked+0x8c/0x230
> > [    3.728872]  drm_fb_helper_pan_display+0x164/0x3a0
> > [    3.728924]  fb_pan_display+0x12c/0x1fc
> > [    3.728963]  bit_update_start+0x34/0xa0
> > [    3.729013]  fbcon_switch+0x678/0x920
> > [    3.729058]  redraw_screen+0x17c/0x35c
> > [    3.729095]  fbcon_prepare_logo+0x484/0x5bc
> > [    3.729143]  fbcon_init+0x77c/0x970
> > [    3.729187]  visual_init+0x14c/0x1e4
> > [    3.729239]  do_bind_con_driver.isra.0+0x2c4/0x530
> > [    3.729279]  do_take_over_console+0x200/0x2e0
> > [    3.729317]  do_fbcon_takeover+0x90/0x120
> > [    3.729363]  fbcon_fb_registered+0x14c/0x164
> > [    3.729412]  register_framebuffer+0x308/0x4e0
> > [    3.729451]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> > [    3.729506]  drm_fbdev_client_hotplug+0x204/0x374
> > [    3.729556]  drm_fbdev_generic_setup+0xf4/0x24c
> > [    3.729604]  vc4_drm_bind+0x1d4/0x1f0
> > [    3.729654]  try_to_bring_up_master+0x254/0x2dc
> > [    3.729709]  __component_add+0x10c/0x240
> > [    3.729759]  component_add+0x18/0x24
> > [    3.729807]  vc4_v3d_dev_probe+0x20/0x30
> > [    3.729854]  platform_probe+0x90/0x110
> > [    3.729907]  really_probe+0x148/0x744
> > [    3.729952]  driver_probe_device+0x8c/0xfc
> > [    3.729998]  __device_attach_driver+0x120/0x180
> > [    3.730048]  bus_for_each_drv+0xf4/0x15c
> > [    3.730091]  __device_attach+0x168/0x250
> > [    3.730137]  device_initial_probe+0x18/0x24
> > [    3.730186]  bus_probe_device+0xec/0x100
> > [    3.730230]  deferred_probe_work_func+0xe8/0x130
> > [    3.730279]  process_one_work+0x3b8/0x650
> > [    3.730319]  worker_thread+0x3cc/0x72c
> > [    3.730356]  kthread+0x21c/0x224
> > [    3.730402]  ret_from_fork+0x10/0x38
> > [    3.730442]
> > [    3.730453] Allocated by task 7:
> > [    3.730470]  kasan_save_stack+0x2c/0x60
> > [    3.730526]  __kasan_kmalloc+0x90/0xb4
> > [    3.730577]  vc4_hvs_channels_duplicate_state+0x60/0x1a0
> > [    3.730637]  drm_atomic_get_private_obj_state+0x144/0x230
> > [    3.730680]  vc4_atomic_check+0x40/0x73c
> > [    3.730732]  drm_atomic_check_only+0x998/0xe60
> > [    3.730769]  drm_atomic_commit+0x34/0x94
> > [    3.730804]  drm_client_modeset_commit_atomic+0x2f4/0x3a0
> > [    3.730854]  drm_client_modeset_commit_locked+0x8c/0x230
> > [    3.730904]  drm_client_modeset_commit+0x38/0x60
> > [    3.730951]  drm_fb_helper_set_par+0x104/0x17c
> > [    3.730998]  fbcon_init+0x43c/0x970
> > [    3.731041]  visual_init+0x14c/0x1e4
> > [    3.731090]  do_bind_con_driver.isra.0+0x2c4/0x530
> > [    3.731128]  do_take_over_console+0x200/0x2e0
> > [    3.731165]  do_fbcon_takeover+0x90/0x120
> > [    3.731210]  fbcon_fb_registered+0x14c/0x164
> > [    3.731258]  register_framebuffer+0x308/0x4e0
> > [    3.731296]  __drm_fb_helper_initial_config_and_unlock+0x538/0x7d0
> > [    3.731349]  drm_fbdev_client_hotplug+0x204/0x374
> > [    3.731398]  drm_fbdev_generic_setup+0xf4/0x24c
> > [    3.731446]  vc4_drm_bind+0x1d4/0x1f0
> > [    3.731493]  try_to_bring_up_master+0x254/0x2dc
> > [    3.731546]  __component_add+0x10c/0x240
> > [    3.731594]  component_add+0x18/0x24
> > [    3.731642]  vc4_v3d_dev_probe+0x20/0x30
> > [    3.731686]  platform_probe+0x90/0x110
> > [    3.731737]  really_probe+0x148/0x744
> > [    3.731781]  driver_probe_device+0x8c/0xfc
> > [    3.731827]  __device_attach_driver+0x120/0x180
> > [    3.731875]  bus_for_each_drv+0xf4/0x15c
> > [    3.731916]  __device_attach+0x168/0x250
> > [    3.731962]  device_initial_probe+0x18/0x24
> > [    3.732009]  bus_probe_device+0xec/0x100
> > [    3.732052]  deferred_probe_work_func+0xe8/0x130
> > [    3.732100]  process_one_work+0x3b8/0x650
> > [    3.732137]  worker_thread+0x3cc/0x72c
> > [    3.732172]  kthread+0x21c/0x224
> > [    3.732215]  ret_from_fork+0x10/0x38
> > [    3.732253]
> > [    3.732262] The buggy address belongs to the object at ffff000007360400
> > [    3.732262]  which belongs to the cache kmalloc-128 of size 128
> > [    3.732293] The buggy address is located 64 bytes inside of
> > [    3.732293]  128-byte region [ffff000007360400, ffff000007360480)
> > [    3.732329] The buggy address belongs to the page:
> > [    3.732344] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7360
> > [    3.732380] flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
> > [    3.732442] raw: 03fffc0000000200 dead000000000100 dead000000000122 ffff000004c02300
> > [    3.732478] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> > [    3.732501] page dumped because: kasan: bad access detected
> > [    3.732518]
> > [    3.732527] Memory state around the buggy address:
> > [    3.732549]  ffff000007360300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [    3.732579]  ffff000007360380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [    3.732608] >ffff000007360400: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
> > [    3.732629]                                            ^
> > [    3.732652]  ffff000007360480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [    3.732682]  ffff000007360500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [    3.732703] ==================================================================
> > [    3.732718] Disabling lock debugging due to kernel taint
> > [    3.769129] Console: switching to colour frame buffer device 90x30
> > [    5.148699] vc4-drm soc:gpu: [drm] fb0: vc4drmfb frame buffer device

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 13:39                   ` Will Deacon
@ 2021-06-07 13:57                     ` Arnd Bergmann
  -1 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-07 13:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, emma, mripard, Marek Szyprowski, linux-arm-kernel,
	kernel-team, Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz, dri-devel

On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
>
> [Adding VC4 folks -- please see the KASAN splat below!]
>
> Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.

The great news for the patch that caused it is that this has nothing to
do with DMA alignment.

> On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:

> > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7

This is offset 0x40 into struct vc4_hvs_state, which is the
'pending_commit' pointer
for the array index 4, i.e. one after the end of the structure.

> > [    3.728495]  kasan_report+0x1dc/0x240
> > [    3.728529]  __asan_load8+0x98/0xd4
> > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910

It seems to be this loop:

        for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
                struct vc4_crtc_state *vc4_crtc_state =
                        to_vc4_crtc_state(old_crtc_state);
                unsigned int channel = vc4_crtc_state->assigned_channel;
                int ret;

                if (channel == VC4_HVS_CHANNEL_DISABLED)
                        continue;

                if (!old_hvs_state->fifo_state[channel].in_use)
                        continue;

                ret =
drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);
                if (ret)
                        drm_err(dev, "Timed out waiting for commit\n");
        }

I notice that it checks index 'fifos_state[channel].in_use', but then
uses a different index 'i' for looking at the 'pending_commit' field
beyond the end of the array.

This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
 ("drm/vc4: kms: Wait on previous FIFO users before a commit").

    Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-07 13:57                     ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-07 13:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, emma, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Vincent Whitchurch, dri-devel, kernel-team, Ard Biesheuvel,
	linux-arm-kernel, Marek Szyprowski

On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
>
> [Adding VC4 folks -- please see the KASAN splat below!]
>
> Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.

The great news for the patch that caused it is that this has nothing to
do with DMA alignment.

> On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:

> > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7

This is offset 0x40 into struct vc4_hvs_state, which is the
'pending_commit' pointer
for the array index 4, i.e. one after the end of the structure.

> > [    3.728495]  kasan_report+0x1dc/0x240
> > [    3.728529]  __asan_load8+0x98/0xd4
> > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910

It seems to be this loop:

        for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
                struct vc4_crtc_state *vc4_crtc_state =
                        to_vc4_crtc_state(old_crtc_state);
                unsigned int channel = vc4_crtc_state->assigned_channel;
                int ret;

                if (channel == VC4_HVS_CHANNEL_DISABLED)
                        continue;

                if (!old_hvs_state->fifo_state[channel].in_use)
                        continue;

                ret =
drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);
                if (ret)
                        drm_err(dev, "Timed out waiting for commit\n");
        }

I notice that it checks index 'fifos_state[channel].in_use', but then
uses a different index 'i' for looking at the 'pending_commit' field
beyond the end of the array.

This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
 ("drm/vc4: kms: Wait on previous FIFO users before a commit").

    Arnd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 13:57                     ` Arnd Bergmann
@ 2021-06-07 15:17                       ` Maxime Ripard
  -1 siblings, 0 replies; 73+ messages in thread
From: Maxime Ripard @ 2021-06-07 15:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, Mark Rutland, emma, Marek Szyprowski,
	linux-arm-kernel, kernel-team, Catalin Marinas, Ard Biesheuvel,
	Vincent Whitchurch, Bartlomiej Zolnierkiewicz, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2482 bytes --]

On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> >
> > [Adding VC4 folks -- please see the KASAN splat below!]
> >
> > Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> > -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.
> 
> The great news for the patch that caused it is that this has nothing to
> do with DMA alignment.
> 
> > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> 
> > > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
> 
> This is offset 0x40 into struct vc4_hvs_state, which is the
> 'pending_commit' pointer
> for the array index 4, i.e. one after the end of the structure.
> 
> > > [    3.728495]  kasan_report+0x1dc/0x240
> > > [    3.728529]  __asan_load8+0x98/0xd4
> > > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> 
> It seems to be this loop:
> 
>         for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
>                 struct vc4_crtc_state *vc4_crtc_state =
>                         to_vc4_crtc_state(old_crtc_state);
>                 unsigned int channel = vc4_crtc_state->assigned_channel;
>                 int ret;
> 
>                 if (channel == VC4_HVS_CHANNEL_DISABLED)
>                         continue;
> 
>                 if (!old_hvs_state->fifo_state[channel].in_use)
>                         continue;
> 
>                 ret =
> drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);
>                 if (ret)
>                         drm_err(dev, "Timed out waiting for commit\n");
>         }
> 
> I notice that it checks index 'fifos_state[channel].in_use', but then
> uses a different index 'i' for looking at the 'pending_commit' field
> beyond the end of the array.
> 
> This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
>  ("drm/vc4: kms: Wait on previous FIFO users before a commit").

Awesome, I tried to find out that bug a few weeks ago but couldn't
reproduce the KASAN spat. You're right, it should be channel here
instead of i. Since you did the whole work, do you want to send the
patch?

maxime

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-07 15:17                       ` Maxime Ripard
  0 siblings, 0 replies; 73+ messages in thread
From: Maxime Ripard @ 2021-06-07 15:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Rutland, emma, Bartlomiej Zolnierkiewicz, Will Deacon,
	Vincent Whitchurch, dri-devel, Catalin Marinas, kernel-team,
	Ard Biesheuvel, linux-arm-kernel, Marek Szyprowski

[-- Attachment #1: Type: text/plain, Size: 2482 bytes --]

On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> >
> > [Adding VC4 folks -- please see the KASAN splat below!]
> >
> > Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> > -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.
> 
> The great news for the patch that caused it is that this has nothing to
> do with DMA alignment.
> 
> > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> 
> > > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
> 
> This is offset 0x40 into struct vc4_hvs_state, which is the
> 'pending_commit' pointer
> for the array index 4, i.e. one after the end of the structure.
> 
> > > [    3.728495]  kasan_report+0x1dc/0x240
> > > [    3.728529]  __asan_load8+0x98/0xd4
> > > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> 
> It seems to be this loop:
> 
>         for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
>                 struct vc4_crtc_state *vc4_crtc_state =
>                         to_vc4_crtc_state(old_crtc_state);
>                 unsigned int channel = vc4_crtc_state->assigned_channel;
>                 int ret;
> 
>                 if (channel == VC4_HVS_CHANNEL_DISABLED)
>                         continue;
> 
>                 if (!old_hvs_state->fifo_state[channel].in_use)
>                         continue;
> 
>                 ret =
> drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);
>                 if (ret)
>                         drm_err(dev, "Timed out waiting for commit\n");
>         }
> 
> I notice that it checks index 'fifos_state[channel].in_use', but then
> uses a different index 'i' for looking at the 'pending_commit' field
> beyond the end of the array.
> 
> This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
>  ("drm/vc4: kms: Wait on previous FIFO users before a commit").

Awesome, I tried to find out that bug a few weeks ago but couldn't
reproduce the KASAN spat. You're right, it should be channel here
instead of i. Since you did the whole work, do you want to send the
patch?

maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 13:57                     ` Arnd Bergmann
@ 2021-06-07 15:32                       ` Mark Rutland
  -1 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-07 15:32 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, emma, mripard, Marek Szyprowski, linux-arm-kernel,
	kernel-team, Catalin Marinas, Ard Biesheuvel, Vincent Whitchurch,
	Bartlomiej Zolnierkiewicz, dri-devel

On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> >
> > [Adding VC4 folks -- please see the KASAN splat below!]
> >
> > Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> > -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.
> 
> The great news for the patch that caused it is that this has nothing to
> do with DMA alignment.
> 
> > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> 
> > > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
> 
> This is offset 0x40 into struct vc4_hvs_state, which is the
> 'pending_commit' pointer
> for the array index 4, i.e. one after the end of the structure.
> 
> > > [    3.728495]  kasan_report+0x1dc/0x240
> > > [    3.728529]  __asan_load8+0x98/0xd4
> > > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> 
> It seems to be this loop:
> 
>         for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
>                 struct vc4_crtc_state *vc4_crtc_state =
>                         to_vc4_crtc_state(old_crtc_state);
>                 unsigned int channel = vc4_crtc_state->assigned_channel;
>                 int ret;
> 
>                 if (channel == VC4_HVS_CHANNEL_DISABLED)
>                         continue;
> 
>                 if (!old_hvs_state->fifo_state[channel].in_use)
>                         continue;
> 
>                 ret =
> drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);
>                 if (ret)
>                         drm_err(dev, "Timed out waiting for commit\n");
>         }
> 
> I notice that it checks index 'fifos_state[channel].in_use', but then
> uses a different index 'i' for looking at the 'pending_commit' field
> beyond the end of the array.

FWIW, with that drm_crtc_commit_wait() call changed to:

| ret = drm_crtc_commit_wait(old_hvs_state->fifo_state[channel].pending_commit);

... I no longer see a KASAN splat, and I no longer see a hang with
ARCH_DMA_MINALIGN reduced to 64.

Thanks,
Mark.

> 
> This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
>  ("drm/vc4: kms: Wait on previous FIFO users before a commit").
> 
>     Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-07 15:32                       ` Mark Rutland
  0 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-07 15:32 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: emma, Bartlomiej Zolnierkiewicz, Will Deacon, Vincent Whitchurch,
	dri-devel, Catalin Marinas, kernel-team, Ard Biesheuvel,
	linux-arm-kernel, Marek Szyprowski

On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> >
> > [Adding VC4 folks -- please see the KASAN splat below!]
> >
> > Background here is that reducing ARCH_DMA_MINALIGN to 64 on arm64 (queued in
> > -next) is causing vc4 to hang on Rpi3b due to a probable driver bug.
> 
> The great news for the patch that caused it is that this has nothing to
> do with DMA alignment.
> 
> > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> 
> > > [    3.728042] BUG: KASAN: slab-out-of-bounds in vc4_atomic_commit_tail+0x1cc/0x910
> > > [    3.728123] Read of size 8 at addr ffff000007360440 by task kworker/u8:0/7
> 
> This is offset 0x40 into struct vc4_hvs_state, which is the
> 'pending_commit' pointer
> for the array index 4, i.e. one after the end of the structure.
> 
> > > [    3.728495]  kasan_report+0x1dc/0x240
> > > [    3.728529]  __asan_load8+0x98/0xd4
> > > [    3.728565]  vc4_atomic_commit_tail+0x1cc/0x910
> 
> It seems to be this loop:
> 
>         for_each_old_crtc_in_state(state, crtc, old_crtc_state, i) {
>                 struct vc4_crtc_state *vc4_crtc_state =
>                         to_vc4_crtc_state(old_crtc_state);
>                 unsigned int channel = vc4_crtc_state->assigned_channel;
>                 int ret;
> 
>                 if (channel == VC4_HVS_CHANNEL_DISABLED)
>                         continue;
> 
>                 if (!old_hvs_state->fifo_state[channel].in_use)
>                         continue;
> 
>                 ret =
> drm_crtc_commit_wait(old_hvs_state->fifo_state[i].pending_commit);
>                 if (ret)
>                         drm_err(dev, "Timed out waiting for commit\n");
>         }
> 
> I notice that it checks index 'fifos_state[channel].in_use', but then
> uses a different index 'i' for looking at the 'pending_commit' field
> beyond the end of the array.

FWIW, with that drm_crtc_commit_wait() call changed to:

| ret = drm_crtc_commit_wait(old_hvs_state->fifo_state[channel].pending_commit);

... I no longer see a KASAN splat, and I no longer see a hang with
ARCH_DMA_MINALIGN reduced to 64.

Thanks,
Mark.

> 
> This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
>  ("drm/vc4: kms: Wait on previous FIFO users before a commit").
> 
>     Arnd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 15:17                       ` Maxime Ripard
@ 2021-06-07 15:50                         ` Arnd Bergmann
  -1 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-07 15:50 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Will Deacon, Mark Rutland, emma, Marek Szyprowski,
	linux-arm-kernel, kernel-team, Catalin Marinas, Ard Biesheuvel,
	Vincent Whitchurch, Bartlomiej Zolnierkiewicz, dri-devel

On Mon, Jun 7, 2021 at 5:17 PM Maxime Ripard <maxime@cerno.tech> wrote:
> On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> > On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > I notice that it checks index 'fifos_state[channel].in_use', but then
> > uses a different index 'i' for looking at the 'pending_commit' field
> > beyond the end of the array.
> >
> > This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
> >  ("drm/vc4: kms: Wait on previous FIFO users before a commit").
>
> Awesome, I tried to find out that bug a few weeks ago but couldn't
> reproduce the KASAN spat. You're right, it should be channel here
> instead of i. Since you did the whole work, do you want to send the
> patch?

Marek and Mark did most of the work finding the problem, I just looked
in the right place a few times (and a bit in the wrong place). I'd suggest
you send that patch with the corresponding Reported-by/Analyzed-by/
Tested-by tags.

        Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-07 15:50                         ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-06-07 15:50 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Mark Rutland, emma, Bartlomiej Zolnierkiewicz, Will Deacon,
	Vincent Whitchurch, dri-devel, Catalin Marinas, kernel-team,
	Ard Biesheuvel, linux-arm-kernel, Marek Szyprowski

On Mon, Jun 7, 2021 at 5:17 PM Maxime Ripard <maxime@cerno.tech> wrote:
> On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> > On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > I notice that it checks index 'fifos_state[channel].in_use', but then
> > uses a different index 'i' for looking at the 'pending_commit' field
> > beyond the end of the array.
> >
> > This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
> >  ("drm/vc4: kms: Wait on previous FIFO users before a commit").
>
> Awesome, I tried to find out that bug a few weeks ago but couldn't
> reproduce the KASAN spat. You're right, it should be channel here
> instead of i. Since you did the whole work, do you want to send the
> patch?

Marek and Mark did most of the work finding the problem, I just looked
in the right place a few times (and a bit in the wrong place). I'd suggest
you send that patch with the corresponding Reported-by/Analyzed-by/
Tested-by tags.

        Arnd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-06-07 15:50                         ` Arnd Bergmann
@ 2021-06-08  8:57                           ` Mark Rutland
  -1 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-08  8:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Maxime Ripard, Will Deacon, emma, Marek Szyprowski,
	linux-arm-kernel, kernel-team, Catalin Marinas, Ard Biesheuvel,
	Vincent Whitchurch, Bartlomiej Zolnierkiewicz, dri-devel

On Mon, Jun 07, 2021 at 05:50:57PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 7, 2021 at 5:17 PM Maxime Ripard <maxime@cerno.tech> wrote:
> > On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> > > On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > > I notice that it checks index 'fifos_state[channel].in_use', but then
> > > uses a different index 'i' for looking at the 'pending_commit' field
> > > beyond the end of the array.
> > >
> > > This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
> > >  ("drm/vc4: kms: Wait on previous FIFO users before a commit").
> >
> > Awesome, I tried to find out that bug a few weeks ago but couldn't
> > reproduce the KASAN spat. You're right, it should be channel here
> > instead of i. Since you did the whole work, do you want to send the
> > patch?
> 
> Marek and Mark did most of the work finding the problem, I just looked
> in the right place a few times (and a bit in the wrong place). I'd suggest
> you send that patch with the corresponding Reported-by/Analyzed-by/
> Tested-by tags.

I've sent:

https://lore.kernel.org/r/20210608085513.2069-1-mark.rutland@arm.com

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-06-08  8:57                           ` Mark Rutland
  0 siblings, 0 replies; 73+ messages in thread
From: Mark Rutland @ 2021-06-08  8:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: emma, Bartlomiej Zolnierkiewicz, Will Deacon, Vincent Whitchurch,
	dri-devel, Maxime Ripard, Catalin Marinas, kernel-team,
	Ard Biesheuvel, linux-arm-kernel, Marek Szyprowski

On Mon, Jun 07, 2021 at 05:50:57PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 7, 2021 at 5:17 PM Maxime Ripard <maxime@cerno.tech> wrote:
> > On Mon, Jun 07, 2021 at 03:57:41PM +0200, Arnd Bergmann wrote:
> > > On Mon, Jun 7, 2021 at 3:39 PM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Jun 07, 2021 at 02:08:59PM +0100, Mark Rutland wrote:
> > > > > On Mon, Jun 07, 2021 at 01:01:18PM +0100, Mark Rutland wrote:
> > > > > > On Mon, Jun 07, 2021 at 11:58:32AM +0200, Marek Szyprowski wrote:
> > > I notice that it checks index 'fifos_state[channel].in_use', but then
> > > uses a different index 'i' for looking at the 'pending_commit' field
> > > beyond the end of the array.
> > >
> > > This code was introduced by Maxime Ripard in commit 9ec03d7f1ed3
> > >  ("drm/vc4: kms: Wait on previous FIFO users before a commit").
> >
> > Awesome, I tried to find out that bug a few weeks ago but couldn't
> > reproduce the KASAN spat. You're right, it should be channel here
> > instead of i. Since you did the whole work, do you want to send the
> > patch?
> 
> Marek and Mark did most of the work finding the problem, I just looked
> in the right place a few times (and a bit in the wrong place). I'd suggest
> you send that patch with the corresponding Reported-by/Analyzed-by/
> Tested-by tags.

I've sent:

https://lore.kernel.org/r/20210608085513.2069-1-mark.rutland@arm.com

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06  9:26 ` Yassine Oudjana
  2021-07-06 10:26     ` Catalin Marinas
  0 siblings, 1 reply; 73+ messages in thread
From: Yassine Oudjana @ 2021-07-06  9:26 UTC (permalink / raw)
  To: will
  Cc: ardb, arnd, catalin.marinas, kernel-team, linux-arm-kernel,
	mark.rutland, vincent.whitchurch, linux-arm-msm

In-Reply-To: <20210527124356.22367-1-will@kernel.org>

> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> indicate if there are machines that unknowingly rely on this.

The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:

------------[ cut here ]------------
rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G S                5.13.0+ #43
Hardware name: Xiaomi Mi Note 2 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
pc : arch_setup_dma_ops+0xf8/0x10c
lr : arch_setup_dma_ops+0xf8/0x10c
sp : ffff80001003bb50
x29: ffff80001003bb50 x28: 0000000000000000 x27: ffff800010ee04c0
x26: ffff800010f41060 x25: 00000000ffffffff x24: 0000000000000000
x23: 0000000000000080 x22: 0000000000000000 x21: 0000000000000000
x20: 0000000100000000 x19: ffff00008143e010 x18: ffffffffffffffff
x17: 696c206f74205d72 x16: 656874655f675b20 x15: ffff800011212d34
x14: 0000000000000000 x13: ffff8000110d4620 x12: 0000000000001047
x11: 000000000000056d x10: ffff8000110d4620 x9 : ffff8000110d4620
x8 : 00000000ffffefff x7 : ffff80001112c620 x6 : ffff80001112c620
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000080098000
Call trace:
arch_setup_dma_ops+0xf8/0x10c
of_dma_configure_id+0x120/0x2a4
platform_dma_configure+0x24/0x40
really_probe.part.0+0x50/0x2fc
__driver_probe_device+0x98/0x144
driver_probe_device+0xc8/0x15c
__driver_attach+0xf8/0x190
bus_for_each_dev+0x70/0xd0
driver_attach+0x24/0x30
bus_add_driver+0x104/0x1ec
driver_register+0x78/0x130
__platform_driver_register+0x28/0x34
pm8xxx_rtc_driver_init+0x1c/0x28
do_one_initcall+0x50/0x1b0
kernel_init_freeable+0x214/0x298
kernel_init+0x28/0x12c
ret_from_fork+0x10/0x18
---[ end trace 09b7616a3af9b190 ]---

This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06  9:26 ` Yassine Oudjana
@ 2021-07-06 10:26     ` Catalin Marinas
  0 siblings, 0 replies; 73+ messages in thread
From: Catalin Marinas @ 2021-07-06 10:26 UTC (permalink / raw)
  To: Yassine Oudjana
  Cc: will, ardb, arnd, kernel-team, linux-arm-kernel, mark.rutland,
	vincent.whitchurch, linux-arm-msm

On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
> In-Reply-To: <20210527124356.22367-1-will@kernel.org>
> > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> > indicate if there are machines that unknowingly rely on this.
> 
> The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:

Is this booting with ACPI or DT?

> ------------[ cut here ]------------
> rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
> WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
[...]
> This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.

I have a suspicion none of the reported devices actually do any DMA, so
in practice it should be safe but we need to figure out why
arch_setup_dma_ops() gets called.

-- 
Catalin

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 10:26     ` Catalin Marinas
  0 siblings, 0 replies; 73+ messages in thread
From: Catalin Marinas @ 2021-07-06 10:26 UTC (permalink / raw)
  To: Yassine Oudjana
  Cc: will, ardb, arnd, kernel-team, linux-arm-kernel, mark.rutland,
	vincent.whitchurch, linux-arm-msm

On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
> In-Reply-To: <20210527124356.22367-1-will@kernel.org>
> > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> > indicate if there are machines that unknowingly rely on this.
> 
> The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:

Is this booting with ACPI or DT?

> ------------[ cut here ]------------
> rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
> WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
[...]
> This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.

I have a suspicion none of the reported devices actually do any DMA, so
in practice it should be safe but we need to figure out why
arch_setup_dma_ops() gets called.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 10:26     ` Catalin Marinas
@ 2021-07-06 13:29       ` Robin Murphy
  -1 siblings, 0 replies; 73+ messages in thread
From: Robin Murphy @ 2021-07-06 13:29 UTC (permalink / raw)
  To: Catalin Marinas, Yassine Oudjana
  Cc: will, ardb, arnd, kernel-team, linux-arm-kernel, mark.rutland,
	vincent.whitchurch, linux-arm-msm

On 2021-07-06 11:26, Catalin Marinas wrote:
> On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
>> In-Reply-To: <20210527124356.22367-1-will@kernel.org>
>>> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
>>> indicate if there are machines that unknowingly rely on this.
>>
>> The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:
> 
> Is this booting with ACPI or DT?
> 
>> ------------[ cut here ]------------
>> rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
>> WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
> [...]
>> This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.
> 
> I have a suspicion none of the reported devices actually do any DMA, so
> in practice it should be safe but we need to figure out why
> arch_setup_dma_ops() gets called.

It gets called because there's no straightforward way to know that a 
platform device *isn't* DMA-capable, so we have to assume they are.

I would also assume that in a Qcom SoC there really are at least some 
things doing non-coherent DMA :(

Robin.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 13:29       ` Robin Murphy
  0 siblings, 0 replies; 73+ messages in thread
From: Robin Murphy @ 2021-07-06 13:29 UTC (permalink / raw)
  To: Catalin Marinas, Yassine Oudjana
  Cc: will, ardb, arnd, kernel-team, linux-arm-kernel, mark.rutland,
	vincent.whitchurch, linux-arm-msm

On 2021-07-06 11:26, Catalin Marinas wrote:
> On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
>> In-Reply-To: <20210527124356.22367-1-will@kernel.org>
>>> Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
>>> indicate if there are machines that unknowingly rely on this.
>>
>> The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:
> 
> Is this booting with ACPI or DT?
> 
>> ------------[ cut here ]------------
>> rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
>> WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
> [...]
>> This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.
> 
> I have a suspicion none of the reported devices actually do any DMA, so
> in practice it should be safe but we need to figure out why
> arch_setup_dma_ops() gets called.

It gets called because there's no straightforward way to know that a 
platform device *isn't* DMA-capable, so we have to assume they are.

I would also assume that in a Qcom SoC there really are at least some 
things doing non-coherent DMA :(

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 13:29       ` Robin Murphy
@ 2021-07-06 13:33         ` Will Deacon
  -1 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-06 13:33 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Catalin Marinas, Yassine Oudjana, ardb, arnd, kernel-team,
	linux-arm-kernel, mark.rutland, vincent.whitchurch,
	linux-arm-msm

On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> On 2021-07-06 11:26, Catalin Marinas wrote:
> > On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
> > > In-Reply-To: <20210527124356.22367-1-will@kernel.org>
> > > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> > > > indicate if there are machines that unknowingly rely on this.
> > > 
> > > The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:
> > 
> > Is this booting with ACPI or DT?
> > 
> > > ------------[ cut here ]------------
> > > rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
> > > WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
> > [...]
> > > This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.
> > 
> > I have a suspicion none of the reported devices actually do any DMA, so
> > in practice it should be safe but we need to figure out why
> > arch_setup_dma_ops() gets called.
> 
> It gets called because there's no straightforward way to know that a
> platform device *isn't* DMA-capable, so we have to assume they are.
> 
> I would also assume that in a Qcom SoC there really are at least some things
> doing non-coherent DMA :(

Agreed, unless this is a CPU erratum and the line size is being reported for
a cache beyond the PoC, then I think we're going to have to revert the patch
reducing ARCH_DMA_MINALIGN after all.

I can't find much information about the original Kryo core at all...

Will

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 13:33         ` Will Deacon
  0 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-06 13:33 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Catalin Marinas, Yassine Oudjana, ardb, arnd, kernel-team,
	linux-arm-kernel, mark.rutland, vincent.whitchurch,
	linux-arm-msm

On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> On 2021-07-06 11:26, Catalin Marinas wrote:
> > On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
> > > In-Reply-To: <20210527124356.22367-1-will@kernel.org>
> > > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
> > > > indicate if there are machines that unknowingly rely on this.
> > > 
> > > The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:
> > 
> > Is this booting with ACPI or DT?
> > 
> > > ------------[ cut here ]------------
> > > rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
> > > WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
> > [...]
> > > This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.
> > 
> > I have a suspicion none of the reported devices actually do any DMA, so
> > in practice it should be safe but we need to figure out why
> > arch_setup_dma_ops() gets called.
> 
> It gets called because there's no straightforward way to know that a
> platform device *isn't* DMA-capable, so we have to assume they are.
> 
> I would also assume that in a Qcom SoC there really are at least some things
> doing non-coherent DMA :(

Agreed, unless this is a CPU erratum and the line size is being reported for
a cache beyond the PoC, then I think we're going to have to revert the patch
reducing ARCH_DMA_MINALIGN after all.

I can't find much information about the original Kryo core at all...

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 13:33         ` Will Deacon
@ 2021-07-06 13:44           ` Marc Zyngier
  -1 siblings, 0 replies; 73+ messages in thread
From: Marc Zyngier @ 2021-07-06 13:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robin Murphy, Catalin Marinas, Yassine Oudjana, ardb, arnd,
	kernel-team, linux-arm-kernel, mark.rutland, vincent.whitchurch,
	linux-arm-msm

On 2021-07-06 14:33, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
>> On 2021-07-06 11:26, Catalin Marinas wrote:
>> > On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
>> > > In-Reply-To: <20210527124356.22367-1-will@kernel.org>
>> > > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
>> > > > indicate if there are machines that unknowingly rely on this.
>> > >
>> > > The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:
>> >
>> > Is this booting with ACPI or DT?
>> >
>> > > ------------[ cut here ]------------
>> > > rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
>> > > WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
>> > [...]
>> > > This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.
>> >
>> > I have a suspicion none of the reported devices actually do any DMA, so
>> > in practice it should be safe but we need to figure out why
>> > arch_setup_dma_ops() gets called.
>> 
>> It gets called because there's no straightforward way to know that a
>> platform device *isn't* DMA-capable, so we have to assume they are.
>> 
>> I would also assume that in a Qcom SoC there really are at least some 
>> things
>> doing non-coherent DMA :(
> 
> Agreed, unless this is a CPU erratum and the line size is being 
> reported for
> a cache beyond the PoC, then I think we're going to have to revert the 
> patch
> reducing ARCH_DMA_MINALIGN after all.
> 
> I can't find much information about the original Kryo core at all...

I have similar issues with my QDF2400. The UART, RTC and DMA controllers
are all screaming at me. I'm confident that the UART doesn't do any
DMA (it is handled by the SBSA driver), but the DMA controllers are
probably doing what it says on the tin.

Do we know whether Falkor and Kryo share any part of their design?

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 13:44           ` Marc Zyngier
  0 siblings, 0 replies; 73+ messages in thread
From: Marc Zyngier @ 2021-07-06 13:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robin Murphy, Catalin Marinas, Yassine Oudjana, ardb, arnd,
	kernel-team, linux-arm-kernel, mark.rutland, vincent.whitchurch,
	linux-arm-msm

On 2021-07-06 14:33, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
>> On 2021-07-06 11:26, Catalin Marinas wrote:
>> > On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
>> > > In-Reply-To: <20210527124356.22367-1-will@kernel.org>
>> > > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the warning/taint to
>> > > > indicate if there are machines that unknowingly rely on this.
>> > >
>> > > The warning is being triggered on Qualcomm MSM8996, as well as the out-of-spec taint:
>> >
>> > Is this booting with ACPI or DT?
>> >
>> > > ------------[ cut here ]------------
>> > > rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (64 < 128)
>> > > WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 arch_setup_dma_ops+0xf8/0x10c
>> > [...]
>> > > This warning is triggered with nearly every driver probe, not only rtc-pm8xxx.
>> >
>> > I have a suspicion none of the reported devices actually do any DMA, so
>> > in practice it should be safe but we need to figure out why
>> > arch_setup_dma_ops() gets called.
>> 
>> It gets called because there's no straightforward way to know that a
>> platform device *isn't* DMA-capable, so we have to assume they are.
>> 
>> I would also assume that in a Qcom SoC there really are at least some 
>> things
>> doing non-coherent DMA :(
> 
> Agreed, unless this is a CPU erratum and the line size is being 
> reported for
> a cache beyond the PoC, then I think we're going to have to revert the 
> patch
> reducing ARCH_DMA_MINALIGN after all.
> 
> I can't find much information about the original Kryo core at all...

I have similar issues with my QDF2400. The UART, RTC and DMA controllers
are all screaming at me. I'm confident that the UART doesn't do any
DMA (it is handled by the SBSA driver), but the DMA controllers are
probably doing what it says on the tin.

Do we know whether Falkor and Kryo share any part of their design?

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 13:44           ` Marc Zyngier
@ 2021-07-06 14:21             ` Robin Murphy
  -1 siblings, 0 replies; 73+ messages in thread
From: Robin Murphy @ 2021-07-06 14:21 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Catalin Marinas, Yassine Oudjana, ardb, arnd, kernel-team,
	linux-arm-kernel, mark.rutland, vincent.whitchurch,
	linux-arm-msm

On 2021-07-06 14:44, Marc Zyngier wrote:
> On 2021-07-06 14:33, Will Deacon wrote:
>> On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
>>> On 2021-07-06 11:26, Catalin Marinas wrote:
>>> > On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
>>> > > In-Reply-To: <20210527124356.22367-1-will@kernel.org>
>>> > > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the 
>>> warning/taint to
>>> > > > indicate if there are machines that unknowingly rely on this.
>>> > >
>>> > > The warning is being triggered on Qualcomm MSM8996, as well as 
>>> the out-of-spec taint:
>>> >
>>> > Is this booting with ACPI or DT?
>>> >
>>> > > ------------[ cut here ]------------
>>> > > rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN 
>>> smaller than CTR_EL0.CWG (64 < 128)
>>> > > WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 
>>> arch_setup_dma_ops+0xf8/0x10c
>>> > [...]
>>> > > This warning is triggered with nearly every driver probe, not 
>>> only rtc-pm8xxx.
>>> >
>>> > I have a suspicion none of the reported devices actually do any 
>>> DMA, so
>>> > in practice it should be safe but we need to figure out why
>>> > arch_setup_dma_ops() gets called.
>>>
>>> It gets called because there's no straightforward way to know that a
>>> platform device *isn't* DMA-capable, so we have to assume they are.
>>>
>>> I would also assume that in a Qcom SoC there really are at least some 
>>> things
>>> doing non-coherent DMA :(
>>
>> Agreed, unless this is a CPU erratum and the line size is being 
>> reported for
>> a cache beyond the PoC, then I think we're going to have to revert the 
>> patch
>> reducing ARCH_DMA_MINALIGN after all.
>>
>> I can't find much information about the original Kryo core at all...
> 
> I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> are all screaming at me. I'm confident that the UART doesn't do any
> DMA (it is handled by the SBSA driver), but the DMA controllers are
> probably doing what it says on the tin.
> 
> Do we know whether Falkor and Kryo share any part of their design?

According to [1], not literally, but some of the same people being 
involved in both could plausibly imply that some basic design decisions 
might have carried over.

Robin.

[1] 
https://www.anandtech.com/show/11737/analyzing-falkors-microarchitecture-a-deep-dive-into-qualcomms-centriq-2400-for-windows-server-and-linux

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 14:21             ` Robin Murphy
  0 siblings, 0 replies; 73+ messages in thread
From: Robin Murphy @ 2021-07-06 14:21 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Catalin Marinas, Yassine Oudjana, ardb, arnd, kernel-team,
	linux-arm-kernel, mark.rutland, vincent.whitchurch,
	linux-arm-msm

On 2021-07-06 14:44, Marc Zyngier wrote:
> On 2021-07-06 14:33, Will Deacon wrote:
>> On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
>>> On 2021-07-06 11:26, Catalin Marinas wrote:
>>> > On Tue, Jul 06, 2021 at 09:26:59AM +0000, Yassine Oudjana wrote:
>>> > > In-Reply-To: <20210527124356.22367-1-will@kernel.org>
>>> > > > Reduce ARCH_DMA_MINALIGN to 64 bytes and allow the 
>>> warning/taint to
>>> > > > indicate if there are machines that unknowingly rely on this.
>>> > >
>>> > > The warning is being triggered on Qualcomm MSM8996, as well as 
>>> the out-of-spec taint:
>>> >
>>> > Is this booting with ACPI or DT?
>>> >
>>> > > ------------[ cut here ]------------
>>> > > rtc-pm8xxx 400f000.qcom,spmi:pmic@0:rtc@6000: ARCH_DMA_MINALIGN 
>>> smaller than CTR_EL0.CWG (64 < 128)
>>> > > WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:45 
>>> arch_setup_dma_ops+0xf8/0x10c
>>> > [...]
>>> > > This warning is triggered with nearly every driver probe, not 
>>> only rtc-pm8xxx.
>>> >
>>> > I have a suspicion none of the reported devices actually do any 
>>> DMA, so
>>> > in practice it should be safe but we need to figure out why
>>> > arch_setup_dma_ops() gets called.
>>>
>>> It gets called because there's no straightforward way to know that a
>>> platform device *isn't* DMA-capable, so we have to assume they are.
>>>
>>> I would also assume that in a Qcom SoC there really are at least some 
>>> things
>>> doing non-coherent DMA :(
>>
>> Agreed, unless this is a CPU erratum and the line size is being 
>> reported for
>> a cache beyond the PoC, then I think we're going to have to revert the 
>> patch
>> reducing ARCH_DMA_MINALIGN after all.
>>
>> I can't find much information about the original Kryo core at all...
> 
> I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> are all screaming at me. I'm confident that the UART doesn't do any
> DMA (it is handled by the SBSA driver), but the DMA controllers are
> probably doing what it says on the tin.
> 
> Do we know whether Falkor and Kryo share any part of their design?

According to [1], not literally, but some of the same people being 
involved in both could plausibly imply that some basic design decisions 
might have carried over.

Robin.

[1] 
https://www.anandtech.com/show/11737/analyzing-falkors-microarchitecture-a-deep-dive-into-qualcomms-centriq-2400-for-windows-server-and-linux

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 13:44           ` Marc Zyngier
@ 2021-07-06 14:30             ` Arnd Bergmann
  -1 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-06 14:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 6, 2021 at 3:44 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On 2021-07-06 14:33, Will Deacon wrote:
> > On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> >
> > I can't find much information about the original Kryo core at all...
>
> I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> are all screaming at me. I'm confident that the UART doesn't do any
> DMA (it is handled by the SBSA driver), but the DMA controllers are
> probably doing what it says on the tin.

But that's a server chip, surely the DMA controller is fully cache coherent
as required by SBSA? (please?)

Maybe just a misannotation on the device node?

> Do we know whether Falkor and Kryo share any part of their design?

I'm fairly sure the Snapdragon 821 / msm8996 is not cache coherent.

I can only speculate on how much got reused between the two, but
as Falkor was released only after they had already given up on
the full-custom Kryo core, it's plausible that it incorporates bits from
that one. In particular the cache controller is probably easy to reuse
even if the rest of it was a new design.

      Arnd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 14:30             ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-06 14:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 6, 2021 at 3:44 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On 2021-07-06 14:33, Will Deacon wrote:
> > On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> >
> > I can't find much information about the original Kryo core at all...
>
> I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> are all screaming at me. I'm confident that the UART doesn't do any
> DMA (it is handled by the SBSA driver), but the DMA controllers are
> probably doing what it says on the tin.

But that's a server chip, surely the DMA controller is fully cache coherent
as required by SBSA? (please?)

Maybe just a misannotation on the device node?

> Do we know whether Falkor and Kryo share any part of their design?

I'm fairly sure the Snapdragon 821 / msm8996 is not cache coherent.

I can only speculate on how much got reused between the two, but
as Falkor was released only after they had already given up on
the full-custom Kryo core, it's plausible that it incorporates bits from
that one. In particular the cache controller is probably easy to reuse
even if the rest of it was a new design.

      Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 14:30             ` Arnd Bergmann
@ 2021-07-06 14:46               ` Marc Zyngier
  -1 siblings, 0 replies; 73+ messages in thread
From: Marc Zyngier @ 2021-07-06 14:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, 06 Jul 2021 15:30:34 +0100,
Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Tue, Jul 6, 2021 at 3:44 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On 2021-07-06 14:33, Will Deacon wrote:
> > > On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> > >
> > > I can't find much information about the original Kryo core at all...
> >
> > I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> > are all screaming at me. I'm confident that the UART doesn't do any
> > DMA (it is handled by the SBSA driver), but the DMA controllers are
> > probably doing what it says on the tin.
> 
> But that's a server chip, surely the DMA controller is fully cache
> coherent as required by SBSA? (please?)

Remember that there is a SBSA level for each broken implementation
(even my XGene is SBSA compliant!), so that doesn't mean much.

> Maybe just a misannotation on the device node?

Maybe. But given that whoever wrote the ACPI tables made sure that
everything else was annotated as coherent, I doubt the DMA controllers
being advertised as non-coherent is an accident.

> > Do we know whether Falkor and Kryo share any part of their design?
> 
> I'm fairly sure the Snapdragon 821 / msm8996 is not cache coherent.
> 
> I can only speculate on how much got reused between the two, but
> as Falkor was released only after they had already given up on
> the full-custom Kryo core, it's plausible that it incorporates bits from
> that one. In particular the cache controller is probably easy to reuse
> even if the rest of it was a new design.

I guess we'll never find out, and I'm probably one of the few still
having some access to this HW (not even sure for how long anyway).

I won't cry if we decide to pull the plug on it.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 14:46               ` Marc Zyngier
  0 siblings, 0 replies; 73+ messages in thread
From: Marc Zyngier @ 2021-07-06 14:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, 06 Jul 2021 15:30:34 +0100,
Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Tue, Jul 6, 2021 at 3:44 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On 2021-07-06 14:33, Will Deacon wrote:
> > > On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> > >
> > > I can't find much information about the original Kryo core at all...
> >
> > I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> > are all screaming at me. I'm confident that the UART doesn't do any
> > DMA (it is handled by the SBSA driver), but the DMA controllers are
> > probably doing what it says on the tin.
> 
> But that's a server chip, surely the DMA controller is fully cache
> coherent as required by SBSA? (please?)

Remember that there is a SBSA level for each broken implementation
(even my XGene is SBSA compliant!), so that doesn't mean much.

> Maybe just a misannotation on the device node?

Maybe. But given that whoever wrote the ACPI tables made sure that
everything else was annotated as coherent, I doubt the DMA controllers
being advertised as non-coherent is an accident.

> > Do we know whether Falkor and Kryo share any part of their design?
> 
> I'm fairly sure the Snapdragon 821 / msm8996 is not cache coherent.
> 
> I can only speculate on how much got reused between the two, but
> as Falkor was released only after they had already given up on
> the full-custom Kryo core, it's plausible that it incorporates bits from
> that one. In particular the cache controller is probably easy to reuse
> even if the rest of it was a new design.

I guess we'll never find out, and I'm probably one of the few still
having some access to this HW (not even sure for how long anyway).

I won't cry if we decide to pull the plug on it.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 14:46               ` Marc Zyngier
@ 2021-07-06 15:43                 ` Arnd Bergmann
  -1 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-06 15:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 6, 2021 at 4:46 PM Marc Zyngier <maz@kernel.org> wrote:
> On Tue, 06 Jul 2021 15:30:34 +0100, Arnd Bergmann <arnd@arndb.de> wrote:
> > I can only speculate on how much got reused between the two, but
> > as Falkor was released only after they had already given up on
> > the full-custom Kryo core, it's plausible that it incorporates bits from
> > that one. In particular the cache controller is probably easy to reuse
> > even if the rest of it was a new design.
>
> I guess we'll never find out, and I'm probably one of the few still
> having some access to this HW (not even sure for how long anyway).
>
> I won't cry if we decide to pull the plug on it.

Sure, but the Snapdragon 820E is one we do need to worry about.

While the internet pretty much agrees on Falkor having 128 bytes
L1 cache line, it might be good to rule out that Kryo just misreports
it before we revert the patch.

Yassine, could you run the 'line' and 'cache' helper from lmbench
to determine what the cache topology appears to be and if that
matches the CTR_EL0 contents?

Something like

numactl -C 0 line -M 1M
numactl -C 3 line -M 1M
numactl -C 0 cache
numactl -C 3 cache

(the numactl command helps run this both on the 'big' and 'little'
cores without running into migration)

      Arnd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 15:43                 ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-06 15:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 6, 2021 at 4:46 PM Marc Zyngier <maz@kernel.org> wrote:
> On Tue, 06 Jul 2021 15:30:34 +0100, Arnd Bergmann <arnd@arndb.de> wrote:
> > I can only speculate on how much got reused between the two, but
> > as Falkor was released only after they had already given up on
> > the full-custom Kryo core, it's plausible that it incorporates bits from
> > that one. In particular the cache controller is probably easy to reuse
> > even if the rest of it was a new design.
>
> I guess we'll never find out, and I'm probably one of the few still
> having some access to this HW (not even sure for how long anyway).
>
> I won't cry if we decide to pull the plug on it.

Sure, but the Snapdragon 820E is one we do need to worry about.

While the internet pretty much agrees on Falkor having 128 bytes
L1 cache line, it might be good to rule out that Kryo just misreports
it before we revert the patch.

Yassine, could you run the 'line' and 'cache' helper from lmbench
to determine what the cache topology appears to be and if that
matches the CTR_EL0 contents?

Something like

numactl -C 0 line -M 1M
numactl -C 3 line -M 1M
numactl -C 0 cache
numactl -C 3 cache

(the numactl command helps run this both on the 'big' and 'little'
cores without running into migration)

      Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 14:30             ` Arnd Bergmann
@ 2021-07-06 16:20               ` Will Deacon
  -1 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-06 16:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 06, 2021 at 04:30:34PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 6, 2021 at 3:44 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On 2021-07-06 14:33, Will Deacon wrote:
> > > On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> > >
> > > I can't find much information about the original Kryo core at all...
> >
> > I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> > are all screaming at me. I'm confident that the UART doesn't do any
> > DMA (it is handled by the SBSA driver), but the DMA controllers are
> > probably doing what it says on the tin.
> 
> But that's a server chip, surely the DMA controller is fully cache coherent
> as required by SBSA? (please?)
> 
> Maybe just a misannotation on the device node?
> 
> > Do we know whether Falkor and Kryo share any part of their design?
> 
> I'm fairly sure the Snapdragon 821 / msm8996 is not cache coherent.
> 
> I can only speculate on how much got reused between the two, but
> as Falkor was released only after they had already given up on
> the full-custom Kryo core, it's plausible that it incorporates bits from
> that one. In particular the cache controller is probably easy to reuse
> even if the rest of it was a new design.

I think the million dollar question is whether the 128-byte cache-lines
live in a cache above the PoC or not. If it's just a system level cache
through which all DMA is "coherent", then it doesn't matter.

Digging around on the web, it's unclear whether msm8996 has an L3 or not.

Will

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 16:20               ` Will Deacon
  0 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-06 16:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Robin Murphy, Catalin Marinas, Yassine Oudjana,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 06, 2021 at 04:30:34PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 6, 2021 at 3:44 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On 2021-07-06 14:33, Will Deacon wrote:
> > > On Tue, Jul 06, 2021 at 02:29:07PM +0100, Robin Murphy wrote:
> > >
> > > I can't find much information about the original Kryo core at all...
> >
> > I have similar issues with my QDF2400. The UART, RTC and DMA controllers
> > are all screaming at me. I'm confident that the UART doesn't do any
> > DMA (it is handled by the SBSA driver), but the DMA controllers are
> > probably doing what it says on the tin.
> 
> But that's a server chip, surely the DMA controller is fully cache coherent
> as required by SBSA? (please?)
> 
> Maybe just a misannotation on the device node?
> 
> > Do we know whether Falkor and Kryo share any part of their design?
> 
> I'm fairly sure the Snapdragon 821 / msm8996 is not cache coherent.
> 
> I can only speculate on how much got reused between the two, but
> as Falkor was released only after they had already given up on
> the full-custom Kryo core, it's plausible that it incorporates bits from
> that one. In particular the cache controller is probably easy to reuse
> even if the rest of it was a new design.

I think the million dollar question is whether the 128-byte cache-lines
live in a cache above the PoC or not. If it's just a system level cache
through which all DMA is "coherent", then it doesn't matter.

Digging around on the web, it's unclear whether msm8996 has an L3 or not.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 15:43                 ` Arnd Bergmann
@ 2021-07-06 17:15                   ` Yassine Oudjana
  -1 siblings, 0 replies; 73+ messages in thread
From: Yassine Oudjana @ 2021-07-06 17:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tuesday, July 6th, 2021 at 7:43 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jul 6, 2021 at 4:46 PM Marc Zyngier maz@kernel.org wrote:
> > On Tue, 06 Jul 2021 15:30:34 +0100, Arnd Bergmann arnd@arndb.de wrote:
> > > I can only speculate on how much got reused between the two, but
> > > as Falkor was released only after they had already given up on
> > > the full-custom Kryo core, it's plausible that it incorporates bits from
> > > that one. In particular the cache controller is probably easy to reuse
> > > even if the rest of it was a new design.
> >
> > I guess we'll never find out, and I'm probably one of the few still
> > having some access to this HW (not even sure for how long anyway).
> >
> > I won't cry if we decide to pull the plug on it.
>
> Sure, but the Snapdragon 820E is one we do need to worry about.
> While the internet pretty much agrees on Falkor having 128 bytes
> L1 cache line, it might be good to rule out that Kryo just misreports
> it before we revert the patch.
>
> Yassine, could you run the 'line' and 'cache' helper from lmbench
> to determine what the cache topology appears to be and if that
> matches the CTR_EL0 contents?
>
> Something like
>
> numactl -C 0 line -M 1M
> numactl -C 3 line -M 1M
> numactl -C 0 cache
> numactl -C 3 cache
>
> (the numactl command helps run this both on the 'big' and 'little'
> cores without running into migration)
>
> Arnd

Here are the results:

$ numactl -C 0 line -M 1M
128
$ numactl -C 3 line -M 1M
128
$ numactl -C 0 cache
L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
Memory latency: 145.93 nanoseconds 4.88 parallelism
$ numactl -C 3 cache
L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
Memory latency: 143.29 nanoseconds 5.37 parallelism


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 17:15                   ` Yassine Oudjana
  0 siblings, 0 replies; 73+ messages in thread
From: Yassine Oudjana @ 2021-07-06 17:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tuesday, July 6th, 2021 at 7:43 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jul 6, 2021 at 4:46 PM Marc Zyngier maz@kernel.org wrote:
> > On Tue, 06 Jul 2021 15:30:34 +0100, Arnd Bergmann arnd@arndb.de wrote:
> > > I can only speculate on how much got reused between the two, but
> > > as Falkor was released only after they had already given up on
> > > the full-custom Kryo core, it's plausible that it incorporates bits from
> > > that one. In particular the cache controller is probably easy to reuse
> > > even if the rest of it was a new design.
> >
> > I guess we'll never find out, and I'm probably one of the few still
> > having some access to this HW (not even sure for how long anyway).
> >
> > I won't cry if we decide to pull the plug on it.
>
> Sure, but the Snapdragon 820E is one we do need to worry about.
> While the internet pretty much agrees on Falkor having 128 bytes
> L1 cache line, it might be good to rule out that Kryo just misreports
> it before we revert the patch.
>
> Yassine, could you run the 'line' and 'cache' helper from lmbench
> to determine what the cache topology appears to be and if that
> matches the CTR_EL0 contents?
>
> Something like
>
> numactl -C 0 line -M 1M
> numactl -C 3 line -M 1M
> numactl -C 0 cache
> numactl -C 3 cache
>
> (the numactl command helps run this both on the 'big' and 'little'
> cores without running into migration)
>
> Arnd

Here are the results:

$ numactl -C 0 line -M 1M
128
$ numactl -C 3 line -M 1M
128
$ numactl -C 0 cache
L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
Memory latency: 145.93 nanoseconds 4.88 parallelism
$ numactl -C 3 cache
L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
Memory latency: 143.29 nanoseconds 5.37 parallelism


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 17:15                   ` Yassine Oudjana
@ 2021-07-06 20:33                     ` Arnd Bergmann
  -1 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-06 20:33 UTC (permalink / raw)
  To: Yassine Oudjana
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana <y.oudjana@protonmail.com> wrote:
> > (the numactl command helps run this both on the 'big' and 'little'
> > cores without running into migration)
> >
> > Arnd
>
> Here are the results:

Thanks, that was quick

> $ numactl -C 0 line -M 1M
> 128
> $ numactl -C 3 line -M 1M
> 128
> $ numactl -C 0 cache
> L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
> L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
> L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
> L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
> Memory latency: 145.93 nanoseconds 4.88 parallelism
> $ numactl -C 3 cache
> L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
> L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
> Memory latency: 143.29 nanoseconds 5.37 parallelism

This is still somewhat inconclusive, but it does give some hope. The data that
I found on random web sites was

- 32KB L1, 2MB/1MB L2 [1][2]
- 16KB L1, 1.5MB L2 [3]
- 32KB L1, 1MB/512KB L2 [4]

so none of the sizes really line up. My best guess is that the actual hierarchy

1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
cache, which could explain the 512-byte L1 output.

Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
byte L1 line size that the 'cache' test reported?
      Arnd

[1] https://en.wikipedia.org/wiki/List_of_Qualcomm_Snapdragon_processors#Snapdragon_820_and_821_(2016)
[2] https://en.wikipedia.org/wiki/Kryo
[3] https://www.geektopia.es/es/product/qualcomm/snapdragon-820/
[4] https://www.anandtech.com/show/9837/snapdragon-820-preview/2

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 20:33                     ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-06 20:33 UTC (permalink / raw)
  To: Yassine Oudjana
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm

On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana <y.oudjana@protonmail.com> wrote:
> > (the numactl command helps run this both on the 'big' and 'little'
> > cores without running into migration)
> >
> > Arnd
>
> Here are the results:

Thanks, that was quick

> $ numactl -C 0 line -M 1M
> 128
> $ numactl -C 3 line -M 1M
> 128
> $ numactl -C 0 cache
> L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
> L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
> L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
> L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
> Memory latency: 145.93 nanoseconds 4.88 parallelism
> $ numactl -C 3 cache
> L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
> L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
> Memory latency: 143.29 nanoseconds 5.37 parallelism

This is still somewhat inconclusive, but it does give some hope. The data that
I found on random web sites was

- 32KB L1, 2MB/1MB L2 [1][2]
- 16KB L1, 1.5MB L2 [3]
- 32KB L1, 1MB/512KB L2 [4]

so none of the sizes really line up. My best guess is that the actual hierarchy

1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
cache, which could explain the 512-byte L1 output.

Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
byte L1 line size that the 'cache' test reported?
      Arnd

[1] https://en.wikipedia.org/wiki/List_of_Qualcomm_Snapdragon_processors#Snapdragon_820_and_821_(2016)
[2] https://en.wikipedia.org/wiki/Kryo
[3] https://www.geektopia.es/es/product/qualcomm/snapdragon-820/
[4] https://www.anandtech.com/show/9837/snapdragon-820-preview/2

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 20:33                     ` Arnd Bergmann
@ 2021-07-06 22:27                       ` Bjorn Andersson
  -1 siblings, 0 replies; 73+ messages in thread
From: Bjorn Andersson @ 2021-07-06 22:27 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yassine Oudjana, Marc Zyngier, Will Deacon, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm

On Tue 06 Jul 15:33 CDT 2021, Arnd Bergmann wrote:

> On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana <y.oudjana@protonmail.com> wrote:
> > > (the numactl command helps run this both on the 'big' and 'little'
> > > cores without running into migration)
> > >
> > > Arnd
> >
> > Here are the results:
> 
> Thanks, that was quick
> 
> > $ numactl -C 0 line -M 1M
> > 128
> > $ numactl -C 3 line -M 1M
> > 128
> > $ numactl -C 0 cache
> > L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
> > L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
> > L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
> > L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
> > Memory latency: 145.93 nanoseconds 4.88 parallelism
> > $ numactl -C 3 cache
> > L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
> > L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
> > Memory latency: 143.29 nanoseconds 5.37 parallelism
> 
> This is still somewhat inconclusive, but it does give some hope. The data that
> I found on random web sites was
> 
> - 32KB L1, 2MB/1MB L2 [1][2]
> - 16KB L1, 1.5MB L2 [3]
> - 32KB L1, 1MB/512KB L2 [4]
> 
> so none of the sizes really line up. My best guess is that the actual hierarchy
> 
> 1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
> the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
> cache, which could explain the 512-byte L1 output.
> 
> Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> byte L1 line size that the 'cache' test reported?

I can confirm that MSM8996, and a few derivatives, has 128 byte cache
lines.

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-06 22:27                       ` Bjorn Andersson
  0 siblings, 0 replies; 73+ messages in thread
From: Bjorn Andersson @ 2021-07-06 22:27 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yassine Oudjana, Marc Zyngier, Will Deacon, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm

On Tue 06 Jul 15:33 CDT 2021, Arnd Bergmann wrote:

> On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana <y.oudjana@protonmail.com> wrote:
> > > (the numactl command helps run this both on the 'big' and 'little'
> > > cores without running into migration)
> > >
> > > Arnd
> >
> > Here are the results:
> 
> Thanks, that was quick
> 
> > $ numactl -C 0 line -M 1M
> > 128
> > $ numactl -C 3 line -M 1M
> > 128
> > $ numactl -C 0 cache
> > L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
> > L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
> > L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
> > L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
> > Memory latency: 145.93 nanoseconds 4.88 parallelism
> > $ numactl -C 3 cache
> > L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
> > L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
> > Memory latency: 143.29 nanoseconds 5.37 parallelism
> 
> This is still somewhat inconclusive, but it does give some hope. The data that
> I found on random web sites was
> 
> - 32KB L1, 2MB/1MB L2 [1][2]
> - 16KB L1, 1.5MB L2 [3]
> - 32KB L1, 1MB/512KB L2 [4]
> 
> so none of the sizes really line up. My best guess is that the actual hierarchy
> 
> 1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
> the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
> cache, which could explain the 512-byte L1 output.
> 
> Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> byte L1 line size that the 'cache' test reported?

I can confirm that MSM8996, and a few derivatives, has 128 byte cache
lines.

Regards,
Bjorn

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 20:33                     ` Arnd Bergmann
@ 2021-07-07  8:24                       ` Yassine Oudjana
  -1 siblings, 0 replies; 73+ messages in thread
From: Yassine Oudjana @ 2021-07-07  8:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > ...
>
> This is still somewhat inconclusive, but it does give some hope. The data that
>
> I found on random web sites was
>
> -   32KB L1, 2MB/1MB L2 [1][2]
> -   16KB L1, 1.5MB L2 [3]
> -   32KB L1, 1MB/512KB L2 [4]
>
>     so none of the sizes really line up. My best guess is that the actual hierarchy
>     1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
>     the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
>     cache, which could explain the 512-byte L1 output.
>
>     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
>     byte L1 line size that the 'cache' test reported?
>
>     Arnd
>
>     [1] https://en.wikipedia.org/wiki/List_of_Qualcomm_Snapdragon_processors#Snapdragon_820_and_821_(2016)
>     [2] https://en.wikipedia.org/wiki/Kryo
>     [3] https://www.geektopia.es/es/product/qualcomm/snapdragon-820/
>     [4] https://www.anandtech.com/show/9837/snapdragon-820-preview/2

$ numactl -C 0 line -M 128K
64
$ numactl -C 3 line -M 128K
64


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-07  8:24                       ` Yassine Oudjana
  0 siblings, 0 replies; 73+ messages in thread
From: Yassine Oudjana @ 2021-07-07  8:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > ...
>
> This is still somewhat inconclusive, but it does give some hope. The data that
>
> I found on random web sites was
>
> -   32KB L1, 2MB/1MB L2 [1][2]
> -   16KB L1, 1.5MB L2 [3]
> -   32KB L1, 1MB/512KB L2 [4]
>
>     so none of the sizes really line up. My best guess is that the actual hierarchy
>     1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
>     the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
>     cache, which could explain the 512-byte L1 output.
>
>     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
>     byte L1 line size that the 'cache' test reported?
>
>     Arnd
>
>     [1] https://en.wikipedia.org/wiki/List_of_Qualcomm_Snapdragon_processors#Snapdragon_820_and_821_(2016)
>     [2] https://en.wikipedia.org/wiki/Kryo
>     [3] https://www.geektopia.es/es/product/qualcomm/snapdragon-820/
>     [4] https://www.anandtech.com/show/9837/snapdragon-820-preview/2

$ numactl -C 0 line -M 128K
64
$ numactl -C 3 line -M 128K
64


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-06 22:27                       ` Bjorn Andersson
@ 2021-07-07  9:27                         ` Will Deacon
  -1 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-07  9:27 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Arnd Bergmann, Yassine Oudjana, Marc Zyngier, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm

On Tue, Jul 06, 2021 at 05:27:53PM -0500, Bjorn Andersson wrote:
> On Tue 06 Jul 15:33 CDT 2021, Arnd Bergmann wrote:
> 
> > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana <y.oudjana@protonmail.com> wrote:
> > > > (the numactl command helps run this both on the 'big' and 'little'
> > > > cores without running into migration)
> > > >
> > > > Arnd
> > >
> > > Here are the results:
> > 
> > Thanks, that was quick
> > 
> > > $ numactl -C 0 line -M 1M
> > > 128
> > > $ numactl -C 3 line -M 1M
> > > 128
> > > $ numactl -C 0 cache
> > > L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
> > > L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
> > > L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
> > > L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
> > > Memory latency: 145.93 nanoseconds 4.88 parallelism
> > > $ numactl -C 3 cache
> > > L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
> > > L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
> > > Memory latency: 143.29 nanoseconds 5.37 parallelism
> > 
> > This is still somewhat inconclusive, but it does give some hope. The data that
> > I found on random web sites was
> > 
> > - 32KB L1, 2MB/1MB L2 [1][2]
> > - 16KB L1, 1.5MB L2 [3]
> > - 32KB L1, 1MB/512KB L2 [4]
> > 
> > so none of the sizes really line up. My best guess is that the actual hierarchy
> > 
> > 1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
> > the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
> > cache, which could explain the 512-byte L1 output.
> > 
> > Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > byte L1 line size that the 'cache' test reported?
> 
> I can confirm that MSM8996, and a few derivatives, has 128 byte cache
> lines.

Do you know if the caches with 128-byte lines sit above the PoC? i.e. is
non-coherent DMA coherent with this cache or not?

Will

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-07  9:27                         ` Will Deacon
  0 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-07  9:27 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Arnd Bergmann, Yassine Oudjana, Marc Zyngier, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm

On Tue, Jul 06, 2021 at 05:27:53PM -0500, Bjorn Andersson wrote:
> On Tue 06 Jul 15:33 CDT 2021, Arnd Bergmann wrote:
> 
> > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana <y.oudjana@protonmail.com> wrote:
> > > > (the numactl command helps run this both on the 'big' and 'little'
> > > > cores without running into migration)
> > > >
> > > > Arnd
> > >
> > > Here are the results:
> > 
> > Thanks, that was quick
> > 
> > > $ numactl -C 0 line -M 1M
> > > 128
> > > $ numactl -C 3 line -M 1M
> > > 128
> > > $ numactl -C 0 cache
> > > L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
> > > L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
> > > L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
> > > L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
> > > Memory latency: 145.93 nanoseconds 4.88 parallelism
> > > $ numactl -C 3 cache
> > > L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
> > > L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
> > > Memory latency: 143.29 nanoseconds 5.37 parallelism
> > 
> > This is still somewhat inconclusive, but it does give some hope. The data that
> > I found on random web sites was
> > 
> > - 32KB L1, 2MB/1MB L2 [1][2]
> > - 16KB L1, 1.5MB L2 [3]
> > - 32KB L1, 1MB/512KB L2 [4]
> > 
> > so none of the sizes really line up. My best guess is that the actual hierarchy
> > 
> > 1MB per-core L2 cache on the two big CPU, 512KB per-core L2 cache on
> > the two little ones, but no shared L2 or L3. The older Krait had a 4KB L0
> > cache, which could explain the 512-byte L1 output.
> > 
> > Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > byte L1 line size that the 'cache' test reported?
> 
> I can confirm that MSM8996, and a few derivatives, has 128 byte cache
> lines.

Do you know if the caches with 128-byte lines sit above the PoC? i.e. is
non-coherent DMA coherent with this cache or not?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-07  8:24                       ` Yassine Oudjana
@ 2021-07-07  9:29                         ` Arnd Bergmann
  -1 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-07  9:29 UTC (permalink / raw)
  To: Yassine Oudjana
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
>
> I think the million dollar question is whether the 128-byte cache-lines
> live in a cache above the PoC or not. If it's just a system level cache
> through which all DMA is "coherent", then it doesn't matter.

On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
<y.oudjana@protonmail.com> wrote:
>
> On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > >
> > > $ numactl -C 0 line -M 1M
> > > 128
> > > $ numactl -C 3 line -M 1M
> > > 128
> >
> >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> >     byte L1 line size that the 'cache' test reported?
>
> $ numactl -C 0 line -M 128K
> 64
> $ numactl -C 3 line -M 128K
> 64

Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
possibly L3) uses 128 byte lines.

On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
>
> I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.

Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
size, can you also confirm that this 128 byte lines are north of the point of
coherency?

In other words, does the CTR_EL0.DminLine field also show 128 bytes
(in which case
it seems we already lost)? And if not, does a CPU store to the second half of a
128 byte L2 line cause DMA data in the first half to be clobbered?

       Arnd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-07  9:29                         ` Arnd Bergmann
  0 siblings, 0 replies; 73+ messages in thread
From: Arnd Bergmann @ 2021-07-07  9:29 UTC (permalink / raw)
  To: Yassine Oudjana
  Cc: Marc Zyngier, Will Deacon, Robin Murphy, Catalin Marinas,
	Ard Biesheuvel, Android Kernel Team, Linux ARM, Mark Rutland,
	Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
>
> I think the million dollar question is whether the 128-byte cache-lines
> live in a cache above the PoC or not. If it's just a system level cache
> through which all DMA is "coherent", then it doesn't matter.

On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
<y.oudjana@protonmail.com> wrote:
>
> On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > >
> > > $ numactl -C 0 line -M 1M
> > > 128
> > > $ numactl -C 3 line -M 1M
> > > 128
> >
> >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> >     byte L1 line size that the 'cache' test reported?
>
> $ numactl -C 0 line -M 128K
> 64
> $ numactl -C 3 line -M 128K
> 64

Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
possibly L3) uses 128 byte lines.

On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
>
> I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.

Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
size, can you also confirm that this 128 byte lines are north of the point of
coherency?

In other words, does the CTR_EL0.DminLine field also show 128 bytes
(in which case
it seems we already lost)? And if not, does a CPU store to the second half of a
128 byte L2 line cause DMA data in the first half to be clobbered?

       Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-07  9:29                         ` Arnd Bergmann
@ 2021-07-07 14:41                           ` Jeffrey Hugo
  -1 siblings, 0 replies; 73+ messages in thread
From: Jeffrey Hugo @ 2021-07-07 14:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yassine Oudjana, Marc Zyngier, Will Deacon, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Wed, Jul 7, 2021 at 3:30 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
> >
> > I think the million dollar question is whether the 128-byte cache-lines
> > live in a cache above the PoC or not. If it's just a system level cache
> > through which all DMA is "coherent", then it doesn't matter.
>
> On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
> <y.oudjana@protonmail.com> wrote:
> >
> > On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > > >
> > > > $ numactl -C 0 line -M 1M
> > > > 128
> > > > $ numactl -C 3 line -M 1M
> > > > 128
> > >
> > >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > >     byte L1 line size that the 'cache' test reported?
> >
> > $ numactl -C 0 line -M 128K
> > 64
> > $ numactl -C 3 line -M 128K
> > 64
>
> Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
> possibly L3) uses 128 byte lines.
>
> On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
> <bjorn.andersson@linaro.org> wrote:
> >
> > I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.
>
> Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
> size, can you also confirm that this 128 byte lines are north of the point of
> coherency?

Finding this old documentation has been painful  :)

L0 I 64 byte cacheline
L1 I 64
L1 D 64
L2 unified 128 (shared between the CPUs of a duplex)

I believe L2 is within the POC, but I'm trying to dig up the old
documentation to confirm.

>
> In other words, does the CTR_EL0.DminLine field also show 128 bytes
> (in which case
> it seems we already lost)? And if not, does a CPU store to the second half of a
> 128 byte L2 line cause DMA data in the first half to be clobbered?

Per the documentation I'm seeing, CTR_EL0.DminLine should show 128
bytes.  I don't have hardware handy to confirm.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-07 14:41                           ` Jeffrey Hugo
  0 siblings, 0 replies; 73+ messages in thread
From: Jeffrey Hugo @ 2021-07-07 14:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yassine Oudjana, Marc Zyngier, Will Deacon, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Wed, Jul 7, 2021 at 3:30 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
> >
> > I think the million dollar question is whether the 128-byte cache-lines
> > live in a cache above the PoC or not. If it's just a system level cache
> > through which all DMA is "coherent", then it doesn't matter.
>
> On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
> <y.oudjana@protonmail.com> wrote:
> >
> > On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > > >
> > > > $ numactl -C 0 line -M 1M
> > > > 128
> > > > $ numactl -C 3 line -M 1M
> > > > 128
> > >
> > >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > >     byte L1 line size that the 'cache' test reported?
> >
> > $ numactl -C 0 line -M 128K
> > 64
> > $ numactl -C 3 line -M 128K
> > 64
>
> Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
> possibly L3) uses 128 byte lines.
>
> On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
> <bjorn.andersson@linaro.org> wrote:
> >
> > I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.
>
> Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
> size, can you also confirm that this 128 byte lines are north of the point of
> coherency?

Finding this old documentation has been painful  :)

L0 I 64 byte cacheline
L1 I 64
L1 D 64
L2 unified 128 (shared between the CPUs of a duplex)

I believe L2 is within the POC, but I'm trying to dig up the old
documentation to confirm.

>
> In other words, does the CTR_EL0.DminLine field also show 128 bytes
> (in which case
> it seems we already lost)? And if not, does a CPU store to the second half of a
> 128 byte L2 line cause DMA data in the first half to be clobbered?

Per the documentation I'm seeing, CTR_EL0.DminLine should show 128
bytes.  I don't have hardware handy to confirm.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-07 14:41                           ` Jeffrey Hugo
@ 2021-07-08 20:59                             ` Jeffrey Hugo
  -1 siblings, 0 replies; 73+ messages in thread
From: Jeffrey Hugo @ 2021-07-08 20:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yassine Oudjana, Marc Zyngier, Will Deacon, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
>
> On Wed, Jul 7, 2021 at 3:30 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > I think the million dollar question is whether the 128-byte cache-lines
> > > live in a cache above the PoC or not. If it's just a system level cache
> > > through which all DMA is "coherent", then it doesn't matter.
> >
> > On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
> > <y.oudjana@protonmail.com> wrote:
> > >
> > > On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > > > >
> > > > > $ numactl -C 0 line -M 1M
> > > > > 128
> > > > > $ numactl -C 3 line -M 1M
> > > > > 128
> > > >
> > > >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > > >     byte L1 line size that the 'cache' test reported?
> > >
> > > $ numactl -C 0 line -M 128K
> > > 64
> > > $ numactl -C 3 line -M 128K
> > > 64
> >
> > Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
> > possibly L3) uses 128 byte lines.
> >
> > On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
> > <bjorn.andersson@linaro.org> wrote:
> > >
> > > I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.
> >
> > Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
> > size, can you also confirm that this 128 byte lines are north of the point of
> > coherency?
>
> Finding this old documentation has been painful  :)
>
> L0 I 64 byte cacheline
> L1 I 64
> L1 D 64
> L2 unified 128 (shared between the CPUs of a duplex)
>
> I believe L2 is within the POC, but I'm trying to dig up the old
> documentation to confirm.

Was able to track down a friendly hardware designer.  The POC lies
between L2 and L3.  Hope this helps.

> > In other words, does the CTR_EL0.DminLine field also show 128 bytes
> > (in which case
> > it seems we already lost)? And if not, does a CPU store to the second half of a
> > 128 byte L2 line cause DMA data in the first half to be clobbered?
>
> Per the documentation I'm seeing, CTR_EL0.DminLine should show 128
> bytes.  I don't have hardware handy to confirm.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-08 20:59                             ` Jeffrey Hugo
  0 siblings, 0 replies; 73+ messages in thread
From: Jeffrey Hugo @ 2021-07-08 20:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Yassine Oudjana, Marc Zyngier, Will Deacon, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
>
> On Wed, Jul 7, 2021 at 3:30 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > I think the million dollar question is whether the 128-byte cache-lines
> > > live in a cache above the PoC or not. If it's just a system level cache
> > > through which all DMA is "coherent", then it doesn't matter.
> >
> > On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
> > <y.oudjana@protonmail.com> wrote:
> > >
> > > On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > > > >
> > > > > $ numactl -C 0 line -M 1M
> > > > > 128
> > > > > $ numactl -C 3 line -M 1M
> > > > > 128
> > > >
> > > >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > > >     byte L1 line size that the 'cache' test reported?
> > >
> > > $ numactl -C 0 line -M 128K
> > > 64
> > > $ numactl -C 3 line -M 128K
> > > 64
> >
> > Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
> > possibly L3) uses 128 byte lines.
> >
> > On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
> > <bjorn.andersson@linaro.org> wrote:
> > >
> > > I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.
> >
> > Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
> > size, can you also confirm that this 128 byte lines are north of the point of
> > coherency?
>
> Finding this old documentation has been painful  :)
>
> L0 I 64 byte cacheline
> L1 I 64
> L1 D 64
> L2 unified 128 (shared between the CPUs of a duplex)
>
> I believe L2 is within the POC, but I'm trying to dig up the old
> documentation to confirm.

Was able to track down a friendly hardware designer.  The POC lies
between L2 and L3.  Hope this helps.

> > In other words, does the CTR_EL0.DminLine field also show 128 bytes
> > (in which case
> > it seems we already lost)? And if not, does a CPU store to the second half of a
> > 128 byte L2 line cause DMA data in the first half to be clobbered?
>
> Per the documentation I'm seeing, CTR_EL0.DminLine should show 128
> bytes.  I don't have hardware handy to confirm.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-08 20:59                             ` Jeffrey Hugo
@ 2021-07-09  8:48                               ` Will Deacon
  -1 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-09  8:48 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Arnd Bergmann, Yassine Oudjana, Marc Zyngier, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Thu, Jul 08, 2021 at 02:59:28PM -0600, Jeffrey Hugo wrote:
> On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
> >
> > On Wed, Jul 7, 2021 at 3:30 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > I think the million dollar question is whether the 128-byte cache-lines
> > > > live in a cache above the PoC or not. If it's just a system level cache
> > > > through which all DMA is "coherent", then it doesn't matter.
> > >
> > > On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
> > > <y.oudjana@protonmail.com> wrote:
> > > >
> > > > On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > > > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > > > > >
> > > > > > $ numactl -C 0 line -M 1M
> > > > > > 128
> > > > > > $ numactl -C 3 line -M 1M
> > > > > > 128
> > > > >
> > > > >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > > > >     byte L1 line size that the 'cache' test reported?
> > > >
> > > > $ numactl -C 0 line -M 128K
> > > > 64
> > > > $ numactl -C 3 line -M 128K
> > > > 64
> > >
> > > Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
> > > possibly L3) uses 128 byte lines.
> > >
> > > On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
> > > <bjorn.andersson@linaro.org> wrote:
> > > >
> > > > I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.
> > >
> > > Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
> > > size, can you also confirm that this 128 byte lines are north of the point of
> > > coherency?
> >
> > Finding this old documentation has been painful  :)
> >
> > L0 I 64 byte cacheline
> > L1 I 64
> > L1 D 64
> > L2 unified 128 (shared between the CPUs of a duplex)
> >
> > I believe L2 is within the POC, but I'm trying to dig up the old
> > documentation to confirm.
> 
> Was able to track down a friendly hardware designer.  The POC lies
> between L2 and L3.  Hope this helps.

Damn, yes, it's bad news but thanks for chasing it up. I'll revert the patch
at -rc1 and add a comment about MSM8996.

Will

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-09  8:48                               ` Will Deacon
  0 siblings, 0 replies; 73+ messages in thread
From: Will Deacon @ 2021-07-09  8:48 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Arnd Bergmann, Yassine Oudjana, Marc Zyngier, Robin Murphy,
	Catalin Marinas, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Thu, Jul 08, 2021 at 02:59:28PM -0600, Jeffrey Hugo wrote:
> On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
> >
> > On Wed, Jul 7, 2021 at 3:30 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Tue, Jul 6, 2021 at 6:20 PM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > I think the million dollar question is whether the 128-byte cache-lines
> > > > live in a cache above the PoC or not. If it's just a system level cache
> > > > through which all DMA is "coherent", then it doesn't matter.
> > >
> > > On Wed, Jul 7, 2021 at 10:24 AM Yassine Oudjana
> > > <y.oudjana@protonmail.com> wrote:
> > > >
> > > > On Wednesday, July 7th, 2021 at 12:33 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > > > On Tue, Jul 6, 2021 at 7:15 PM Yassine Oudjana y.oudjana@protonmail.com wrote:
> > > > > >
> > > > > > $ numactl -C 0 line -M 1M
> > > > > > 128
> > > > > > $ numactl -C 3 line -M 1M
> > > > > > 128
> > > > >
> > > > >     Can you rerun the the 'line' test with '-M 128K' to see if that confirms the 64
> > > > >     byte L1 line size that the 'cache' test reported?
> > > >
> > > > $ numactl -C 0 line -M 128K
> > > > 64
> > > > $ numactl -C 3 line -M 128K
> > > > 64
> > >
> > > Ok, so this seems to confirm that the L1 uses 64 byte lines, while the L2 (or
> > > possibly L3) uses 128 byte lines.
> > >
> > > On Wed, Jul 7, 2021 at 12:27 AM Bjorn Andersson
> > > <bjorn.andersson@linaro.org> wrote:
> > > >
> > > > I can confirm that MSM8996, and a few derivatives, has 128 byte cache lines.
> > >
> > > Ok, thanks. Assuming this is an outer cache and the L1 indeed has a smaller line
> > > size, can you also confirm that this 128 byte lines are north of the point of
> > > coherency?
> >
> > Finding this old documentation has been painful  :)
> >
> > L0 I 64 byte cacheline
> > L1 I 64
> > L1 D 64
> > L2 unified 128 (shared between the CPUs of a duplex)
> >
> > I believe L2 is within the POC, but I'm trying to dig up the old
> > documentation to confirm.
> 
> Was able to track down a friendly hardware designer.  The POC lies
> between L2 and L3.  Hope this helps.

Damn, yes, it's bad news but thanks for chasing it up. I'll revert the patch
at -rc1 and add a comment about MSM8996.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
  2021-07-09  8:48                               ` Will Deacon
@ 2021-07-09 17:10                                 ` Catalin Marinas
  -1 siblings, 0 replies; 73+ messages in thread
From: Catalin Marinas @ 2021-07-09 17:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jeffrey Hugo, Arnd Bergmann, Yassine Oudjana, Marc Zyngier,
	Robin Murphy, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Fri, Jul 09, 2021 at 09:48:42AM +0100, Will Deacon wrote:
> On Thu, Jul 08, 2021 at 02:59:28PM -0600, Jeffrey Hugo wrote:
> > On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
> > > L0 I 64 byte cacheline
> > > L1 I 64
> > > L1 D 64
> > > L2 unified 128 (shared between the CPUs of a duplex)
> > >
> > > I believe L2 is within the POC, but I'm trying to dig up the old
> > > documentation to confirm.
> > 
> > Was able to track down a friendly hardware designer.  The POC lies
> > between L2 and L3.  Hope this helps.
> 
> Damn, yes, it's bad news but thanks for chasing it up. I'll revert the patch
> at -rc1 and add a comment about MSM8996.

It's a shame but we can't do much for this platform.

Longer term, we should look at making kmalloc() cache selection more
dynamic. Probably still starting with a 128 byte minimum size but, after
initialising all the devices during boot, if we can't find any
non-coherent one just relax the kmalloc() allocations. We still have the
issue with platform devices with DT assumed to be non-coherent and any
late call (after boot) to arch_setup_dma_ops().

Some bodge below to get an idea, not a final patch (not even the
beginning of one). It initialises the kmalloc caches to size 8 but
limits the allocation size to a kmalloc_dyn_min_size, initially set to
128 on arm64. In a device_initcall_sync(), if we didn't find any
non-coherent device, we lower this to KMALLOC_MIN_SIZE (8 with slub).

----------------8<----------------------------
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a074459f8f2f..bed65db3c42e 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -40,15 +40,6 @@
 #define CLIDR_LOC(clidr)	(((clidr) >> CLIDR_LOC_SHIFT) & 0x7)
 #define CLIDR_LOUIS(clidr)	(((clidr) >> CLIDR_LOUIS_SHIFT) & 0x7)
 
-/*
- * Memory returned by kmalloc() may be used for DMA, so we must make
- * sure that all such allocations are cache aligned. Otherwise,
- * unrelated code may cause parts of the buffer to be read into the
- * cache before the transfer is done, causing old data to be seen by
- * the CPU.
- */
-#define ARCH_DMA_MINALIGN	(128)
-
 #ifdef CONFIG_KASAN_SW_TAGS
 #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)
 #elif defined(CONFIG_KASAN_HW_TAGS)
@@ -59,6 +50,9 @@
 
 #include <linux/bitops.h>
 
+extern int kmalloc_dyn_min_size;
+#define __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+
 #define ICACHEF_ALIASING	0
 #define ICACHEF_VPIPT		1
 extern unsigned long __icache_flags;
@@ -88,7 +82,7 @@ static inline int cache_line_size_of_cpu(void)
 {
 	u32 cwg = cache_type_cwg();
 
-	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
+	return cwg ? 4 << cwg : __alignof__(unsigned long long);
 }
 
 int cache_line_size(void);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index efed2830d141..a25813377187 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2808,8 +2808,8 @@ void __init setup_cpu_features(void)
 	 */
 	cwg = cache_type_cwg();
 	if (!cwg)
-		pr_warn("No Cache Writeback Granule information, assuming %d\n",
-			ARCH_DMA_MINALIGN);
+		pr_warn("No Cache Writeback Granule information, assuming %ld\n",
+			__alignof__(unsigned long long));
 }
 
 static void __maybe_unused cpu_enable_cnp(struct arm64_cpu_capabilities const *cap)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bf1dd3eb041..9a30d1beb3ea 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -13,6 +13,18 @@
 
 #include <asm/cacheflush.h>
 
+/*
+ * Memory returned by kmalloc() may be used for DMA, so we must make
+ * sure that all such allocations are cache aligned. Otherwise,
+ * unrelated code may cause parts of the buffer to be read into the
+ * cache before the transfer is done, causing old data to be seen by
+ * the CPU.
+ */
+int kmalloc_dyn_min_size = 128;
+EXPORT_SYMBOL(kmalloc_dyn_min_size);
+
+static bool non_coherent_devices;
+
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
@@ -42,11 +54,14 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 {
 	int cls = cache_line_size_of_cpu();
 
-	WARN_TAINT(!coherent && cls > ARCH_DMA_MINALIGN,
+	WARN_TAINT(!coherent && cls > kmalloc_dyn_min_size,
 		   TAINT_CPU_OUT_OF_SPEC,
-		   "%s %s: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
+		   "%s %s: kmalloc() minimum size smaller than CTR_EL0.CWG (%d < %d)",
 		   dev_driver_string(dev), dev_name(dev),
-		   ARCH_DMA_MINALIGN, cls);
+		   kmalloc_dyn_min_size, cls);
+
+	if (!coherent)
+		non_coherent_devices = true;
 
 	dev->dma_coherent = coherent;
 	if (iommu)
@@ -57,3 +72,12 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 }
+
+static int __init adjust_kmalloc_dyn_min_size(void)
+{
+	if (!non_coherent_devices)
+		kmalloc_dyn_min_size = KMALLOC_MIN_SIZE;
+
+	return 0;
+}
+device_initcall_sync(adjust_kmalloc_dyn_min_size);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c97d788762c..e40c7899cb07 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -349,15 +349,21 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
  */
 static __always_inline unsigned int kmalloc_index(size_t size)
 {
+	int min_size = KMALLOC_MIN_SIZE;
+
 	if (!size)
 		return 0;
 
-	if (size <= KMALLOC_MIN_SIZE)
-		return KMALLOC_SHIFT_LOW;
+#ifdef __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+	min_size = kmalloc_dyn_min_size;
+#endif
+
+	if (size <= min_size)
+		return ilog2(min_size);
 
-	if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
+	if (min_size <= 32 && size > 64 && size <= 96)
 		return 1;
-	if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
+	if (min_size <= 64 && size > 128 && size <= 192)
 		return 2;
 	if (size <=          8) return 3;
 	if (size <=         16) return 4;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7cab77655f11..2666237c84c4 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -725,6 +725,10 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 		if (!size)
 			return ZERO_SIZE_PTR;
 
+#ifdef __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+		if (size < kmalloc_dyn_min_size)
+			size = kmalloc_dyn_min_size;
+#endif
 		index = size_index[size_index_elem(size)];
 	} else {
 		if (WARN_ON_ONCE(size > KMALLOC_MAX_CACHE_SIZE))

-- 
Catalin

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)
@ 2021-07-09 17:10                                 ` Catalin Marinas
  0 siblings, 0 replies; 73+ messages in thread
From: Catalin Marinas @ 2021-07-09 17:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jeffrey Hugo, Arnd Bergmann, Yassine Oudjana, Marc Zyngier,
	Robin Murphy, Ard Biesheuvel, Android Kernel Team, Linux ARM,
	Mark Rutland, Vincent Whitchurch, linux-arm-msm, Bjorn Andersson

On Fri, Jul 09, 2021 at 09:48:42AM +0100, Will Deacon wrote:
> On Thu, Jul 08, 2021 at 02:59:28PM -0600, Jeffrey Hugo wrote:
> > On Wed, Jul 7, 2021 at 8:41 AM Jeffrey Hugo <jeffrey.l.hugo@gmail.com> wrote:
> > > L0 I 64 byte cacheline
> > > L1 I 64
> > > L1 D 64
> > > L2 unified 128 (shared between the CPUs of a duplex)
> > >
> > > I believe L2 is within the POC, but I'm trying to dig up the old
> > > documentation to confirm.
> > 
> > Was able to track down a friendly hardware designer.  The POC lies
> > between L2 and L3.  Hope this helps.
> 
> Damn, yes, it's bad news but thanks for chasing it up. I'll revert the patch
> at -rc1 and add a comment about MSM8996.

It's a shame but we can't do much for this platform.

Longer term, we should look at making kmalloc() cache selection more
dynamic. Probably still starting with a 128 byte minimum size but, after
initialising all the devices during boot, if we can't find any
non-coherent one just relax the kmalloc() allocations. We still have the
issue with platform devices with DT assumed to be non-coherent and any
late call (after boot) to arch_setup_dma_ops().

Some bodge below to get an idea, not a final patch (not even the
beginning of one). It initialises the kmalloc caches to size 8 but
limits the allocation size to a kmalloc_dyn_min_size, initially set to
128 on arm64. In a device_initcall_sync(), if we didn't find any
non-coherent device, we lower this to KMALLOC_MIN_SIZE (8 with slub).

----------------8<----------------------------
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a074459f8f2f..bed65db3c42e 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -40,15 +40,6 @@
 #define CLIDR_LOC(clidr)	(((clidr) >> CLIDR_LOC_SHIFT) & 0x7)
 #define CLIDR_LOUIS(clidr)	(((clidr) >> CLIDR_LOUIS_SHIFT) & 0x7)
 
-/*
- * Memory returned by kmalloc() may be used for DMA, so we must make
- * sure that all such allocations are cache aligned. Otherwise,
- * unrelated code may cause parts of the buffer to be read into the
- * cache before the transfer is done, causing old data to be seen by
- * the CPU.
- */
-#define ARCH_DMA_MINALIGN	(128)
-
 #ifdef CONFIG_KASAN_SW_TAGS
 #define ARCH_SLAB_MINALIGN	(1ULL << KASAN_SHADOW_SCALE_SHIFT)
 #elif defined(CONFIG_KASAN_HW_TAGS)
@@ -59,6 +50,9 @@
 
 #include <linux/bitops.h>
 
+extern int kmalloc_dyn_min_size;
+#define __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+
 #define ICACHEF_ALIASING	0
 #define ICACHEF_VPIPT		1
 extern unsigned long __icache_flags;
@@ -88,7 +82,7 @@ static inline int cache_line_size_of_cpu(void)
 {
 	u32 cwg = cache_type_cwg();
 
-	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
+	return cwg ? 4 << cwg : __alignof__(unsigned long long);
 }
 
 int cache_line_size(void);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index efed2830d141..a25813377187 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2808,8 +2808,8 @@ void __init setup_cpu_features(void)
 	 */
 	cwg = cache_type_cwg();
 	if (!cwg)
-		pr_warn("No Cache Writeback Granule information, assuming %d\n",
-			ARCH_DMA_MINALIGN);
+		pr_warn("No Cache Writeback Granule information, assuming %ld\n",
+			__alignof__(unsigned long long));
 }
 
 static void __maybe_unused cpu_enable_cnp(struct arm64_cpu_capabilities const *cap)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bf1dd3eb041..9a30d1beb3ea 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -13,6 +13,18 @@
 
 #include <asm/cacheflush.h>
 
+/*
+ * Memory returned by kmalloc() may be used for DMA, so we must make
+ * sure that all such allocations are cache aligned. Otherwise,
+ * unrelated code may cause parts of the buffer to be read into the
+ * cache before the transfer is done, causing old data to be seen by
+ * the CPU.
+ */
+int kmalloc_dyn_min_size = 128;
+EXPORT_SYMBOL(kmalloc_dyn_min_size);
+
+static bool non_coherent_devices;
+
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		enum dma_data_direction dir)
 {
@@ -42,11 +54,14 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 {
 	int cls = cache_line_size_of_cpu();
 
-	WARN_TAINT(!coherent && cls > ARCH_DMA_MINALIGN,
+	WARN_TAINT(!coherent && cls > kmalloc_dyn_min_size,
 		   TAINT_CPU_OUT_OF_SPEC,
-		   "%s %s: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
+		   "%s %s: kmalloc() minimum size smaller than CTR_EL0.CWG (%d < %d)",
 		   dev_driver_string(dev), dev_name(dev),
-		   ARCH_DMA_MINALIGN, cls);
+		   kmalloc_dyn_min_size, cls);
+
+	if (!coherent)
+		non_coherent_devices = true;
 
 	dev->dma_coherent = coherent;
 	if (iommu)
@@ -57,3 +72,12 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 }
+
+static int __init adjust_kmalloc_dyn_min_size(void)
+{
+	if (!non_coherent_devices)
+		kmalloc_dyn_min_size = KMALLOC_MIN_SIZE;
+
+	return 0;
+}
+device_initcall_sync(adjust_kmalloc_dyn_min_size);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c97d788762c..e40c7899cb07 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -349,15 +349,21 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
  */
 static __always_inline unsigned int kmalloc_index(size_t size)
 {
+	int min_size = KMALLOC_MIN_SIZE;
+
 	if (!size)
 		return 0;
 
-	if (size <= KMALLOC_MIN_SIZE)
-		return KMALLOC_SHIFT_LOW;
+#ifdef __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+	min_size = kmalloc_dyn_min_size;
+#endif
+
+	if (size <= min_size)
+		return ilog2(min_size);
 
-	if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
+	if (min_size <= 32 && size > 64 && size <= 96)
 		return 1;
-	if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
+	if (min_size <= 64 && size > 128 && size <= 192)
 		return 2;
 	if (size <=          8) return 3;
 	if (size <=         16) return 4;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7cab77655f11..2666237c84c4 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -725,6 +725,10 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 		if (!size)
 			return ZERO_SIZE_PTR;
 
+#ifdef __HAVE_ARCH_KMALLOC_DYN_MIN_SIZE
+		if (size < kmalloc_dyn_min_size)
+			size = kmalloc_dyn_min_size;
+#endif
 		index = size_index[size_index_elem(size)];
 	} else {
 		if (WARN_ON_ONCE(size > KMALLOC_MAX_CACHE_SIZE))

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2021-07-09 17:12 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-27 12:43 [PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES) Will Deacon
2021-05-27 13:11 ` Catalin Marinas
2021-05-27 13:19 ` Mark Rutland
2021-05-28  9:35   ` Arnd Bergmann
2021-06-01 10:14     ` Catalin Marinas
2021-05-31  5:38 ` Ard Biesheuvel
2021-06-01 18:21 ` Will Deacon
     [not found] ` <CGME20210602132541eucas1p17127696041c26c00d1d2f50bef9cfaf0@eucas1p1.samsung.com>
2021-06-02 13:25   ` Marek Szyprowski
2021-06-02 13:51     ` Mark Rutland
2021-06-02 14:09       ` Marek Szyprowski
2021-06-02 14:14         ` Arnd Bergmann
2021-06-02 14:28           ` Marek Szyprowski
2021-06-02 14:52             ` Arnd Bergmann
2021-06-07 12:17               ` Arnd Bergmann
2021-06-04 10:01         ` Mark Rutland
2021-06-07  9:58           ` Marek Szyprowski
2021-06-07 12:01             ` Mark Rutland
2021-06-07 13:08               ` Mark Rutland
2021-06-07 13:39                 ` Will Deacon
2021-06-07 13:39                   ` Will Deacon
2021-06-07 13:56                   ` Mark Rutland
2021-06-07 13:56                     ` Mark Rutland
2021-06-07 13:57                   ` Arnd Bergmann
2021-06-07 13:57                     ` Arnd Bergmann
2021-06-07 15:17                     ` Maxime Ripard
2021-06-07 15:17                       ` Maxime Ripard
2021-06-07 15:50                       ` Arnd Bergmann
2021-06-07 15:50                         ` Arnd Bergmann
2021-06-08  8:57                         ` Mark Rutland
2021-06-08  8:57                           ` Mark Rutland
2021-06-07 15:32                     ` Mark Rutland
2021-06-07 15:32                       ` Mark Rutland
2021-06-02 14:11       ` Arnd Bergmann
2021-06-02 14:15         ` Marek Szyprowski
2021-07-06  9:26 ` Yassine Oudjana
2021-07-06 10:26   ` Catalin Marinas
2021-07-06 10:26     ` Catalin Marinas
2021-07-06 13:29     ` Robin Murphy
2021-07-06 13:29       ` Robin Murphy
2021-07-06 13:33       ` Will Deacon
2021-07-06 13:33         ` Will Deacon
2021-07-06 13:44         ` Marc Zyngier
2021-07-06 13:44           ` Marc Zyngier
2021-07-06 14:21           ` Robin Murphy
2021-07-06 14:21             ` Robin Murphy
2021-07-06 14:30           ` Arnd Bergmann
2021-07-06 14:30             ` Arnd Bergmann
2021-07-06 14:46             ` Marc Zyngier
2021-07-06 14:46               ` Marc Zyngier
2021-07-06 15:43               ` Arnd Bergmann
2021-07-06 15:43                 ` Arnd Bergmann
2021-07-06 17:15                 ` Yassine Oudjana
2021-07-06 17:15                   ` Yassine Oudjana
2021-07-06 20:33                   ` Arnd Bergmann
2021-07-06 20:33                     ` Arnd Bergmann
2021-07-06 22:27                     ` Bjorn Andersson
2021-07-06 22:27                       ` Bjorn Andersson
2021-07-07  9:27                       ` Will Deacon
2021-07-07  9:27                         ` Will Deacon
2021-07-07  8:24                     ` Yassine Oudjana
2021-07-07  8:24                       ` Yassine Oudjana
2021-07-07  9:29                       ` Arnd Bergmann
2021-07-07  9:29                         ` Arnd Bergmann
2021-07-07 14:41                         ` Jeffrey Hugo
2021-07-07 14:41                           ` Jeffrey Hugo
2021-07-08 20:59                           ` Jeffrey Hugo
2021-07-08 20:59                             ` Jeffrey Hugo
2021-07-09  8:48                             ` Will Deacon
2021-07-09  8:48                               ` Will Deacon
2021-07-09 17:10                               ` Catalin Marinas
2021-07-09 17:10                                 ` Catalin Marinas
2021-07-06 16:20             ` Will Deacon
2021-07-06 16:20               ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.