From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932333AbcDSOsf (ORCPT );
	Tue, 19 Apr 2016 10:48:35 -0400
Received: from mail-io0-f179.google.com ([209.85.223.179]:33259 "EHLO
	mail-io0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932174AbcDSOse (ORCPT );
	Tue, 19 Apr 2016 10:48:34 -0400
MIME-Version: 1.0
In-Reply-To: <20160419141321.GD8482@e104818-lin.cambridge.arm.com>
References: <1461061773-19571-1-git-send-email-ard.biesheuvel@linaro.org>
	<20160419125615.GA20991@leverpostej>
	<20160419141321.GD8482@e104818-lin.cambridge.arm.com>
Date: Tue, 19 Apr 2016 16:48:32 +0200
Message-ID: 
Subject: Re: [PATCH] arm64: mm: take CWG into account in __inval_cache_range()
From: Ard Biesheuvel 
To: Catalin Marinas 
Cc: Mark Rutland , Will Deacon ,
	"linux-kernel@vger.kernel.org" ,
	James Morse , robin.murphy@arm.com,
	"linux-arm-kernel@lists.infradead.org" 
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 19 April 2016 at 16:13, Catalin Marinas wrote:
> On Tue, Apr 19, 2016 at 03:08:27PM +0200, Ard Biesheuvel wrote:
>> On 19 April 2016 at 14:56, Mark Rutland wrote:
>> > Is sharing a CWG broken by design, or is there some caveat I'm missing
>> > that prevents/prohibits unrelated data from being dirtied?
>>
>> I think sharing a CWG window is broken by design, now that I think of
>> it. The invalidate is part of the dma_unmap() code path, which means
>> the cleaning we do on the edges of the buffer may clobber data in
>> memory written by the device, and not cleaning isn't an option either.
>
> Indeed. It only works if the cache line is always clean (e.g. read or
> speculatively loaded) but you can't guarantee what's accessing it. I
> think we shouldn't even bother with the edges at all in __dma_inv_range.
>

Agreed.

> The best we could do is to warn if ARCH_DMA_MINALIGN is smaller than CWG
> (as Robin suggested, we could do this only if we have non-coherent DMA
> masters via arch_setup_dma_ops()). Quick hack below:
>
> -------------------------------8<-----------------------------------
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 5082b30bc2c0..5967fcbb617a 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -28,7 +28,7 @@
>   * cache before the transfer is done, causing old data to be seen by
>   * the CPU.
>   */
> -#define ARCH_DMA_MINALIGN	L1_CACHE_BYTES
> +#define ARCH_DMA_MINALIGN	128
>
>  #ifndef __ASSEMBLY__
>
> @@ -37,7 +37,7 @@
>  static inline int cache_line_size(void)
>  {
>  	u32 cwg = cache_type_cwg();
> -	return cwg ? 4 << cwg : L1_CACHE_BYTES;
> +	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;

Unrelated, but this does not look right: if the CWG field is zero, we
should either assume 2 KB, or iterate over all the CCSIDR values and
take the maximum linesize.
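For the latter, roughly something like the below could work, I suppose
(an untested sketch only, not part of the patch above; the helper name
is invented, and the CLIDR_EL1/CSSELR_EL1/CCSIDR_EL1 field layouts are
taken from the ARMv8 ARM):

static int max_ccsidr_line_size(void)
{
	u64 clidr, ccsidr, ctype;
	int level, line, max_line = 16;	/* 16 bytes is the architectural minimum */

	asm volatile("mrs %0, clidr_el1" : "=r" (clidr));

	for (level = 0; level < 7; level++) {
		ctype = (clidr >> (3 * level)) & 0x7;
		if (ctype == 0)		/* no cache at this level or beyond */
			break;
		if (ctype == 1)		/* instruction cache only, no D-side line size */
			continue;

		/* select the data/unified cache at this level, then read its geometry */
		asm volatile("msr csselr_el1, %0\n\tisb" : : "r" ((u64)level << 1));
		asm volatile("mrs %0, ccsidr_el1" : "=r" (ccsidr));

		/* CCSIDR_EL1.LineSize == log2(bytes per line) - 4 */
		line = 16 << (ccsidr & 0x7);
		if (line > max_line)
			max_line = line;
	}

	return max_line;
}

Caveat: the CCSIDRs only describe the caches visible to this core, so
this still would not necessarily cover a larger write-back granule
introduced by a system cache.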
>  }
>
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 943f5140e0f3..b1d4d1f5b0dd 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -970,7 +970,6 @@ static void __init setup_feature_capabilities(void)
>  void __init setup_cpu_features(void)
>  {
>  	u32 cwg;
> -	int cls;
>
>  	/* Set the CPU feature capabilies */
>  	setup_feature_capabilities();
> @@ -983,13 +982,9 @@ void __init setup_cpu_features(void)
>  	 * Check for sane CTR_EL0.CWG value.
>  	 */
>  	cwg = cache_type_cwg();
> -	cls = cache_line_size();
>  	if (!cwg)
> -		pr_warn("No Cache Writeback Granule information, assuming cache line size %d\n",
> -			cls);
> -	if (L1_CACHE_BYTES < cls)
> -		pr_warn("L1_CACHE_BYTES smaller than the Cache Writeback Granule (%d < %d)\n",
> -			L1_CACHE_BYTES, cls);
> +		pr_warn("No Cache Writeback Granule information, assuming %d\n",
> +			ARCH_DMA_MINALIGN);
>  }
>
>  static bool __maybe_unused
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index a6e757cbab77..08232faf38ad 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -987,6 +987,11 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			struct iommu_ops *iommu, bool coherent)
>  {
> +	WARN_TAINT_ONCE(!coherent && ARCH_DMA_MINALIGN < cache_line_size(),
> +			TAINT_CPU_OUT_OF_SPEC,
> +			"ARCH_DMA_MINALIGN smaller than the Cache Writeback Granule (%d < %d)",
> +			ARCH_DMA_MINALIGN, cache_line_size());
> +
>  	if (!dev->archdata.dma_ops)
>  		dev->archdata.dma_ops = &swiotlb_dma_ops;
>
> -------------------------------8<-----------------------------------
>
> This way we could also reduce L1_CACHE_SHIFT back to 6 as it shouldn't
> cause any correctness issues (potentially performance but so far it
> seems that increasing it made things worse on some SoCs).
>
> The reason for cache_line_size() using ARCH_DMA_MINALIGN instead of
> L1_CACHE_BYTES is that I can see it used in the sl*b code when
> SLAB_HWCACHE_ALIGN is passed and on few occasions for DMA buffers.
>
> --
> Catalin
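(For reference on that last point: SLAB_HWCACHE_ALIGN is the path where
cache_line_size() feeds object alignment, since the slab code rounds the
alignment of such caches up to cache_line_size(), reduced only for very
small objects. A minimal, made-up driver example, names invented and
illustrative only:)

#include <linux/slab.h>

struct foo_desc {			/* descriptor the device DMAs into */
	u32	status;
	u32	len;
	u64	buf_addr;
};

static struct kmem_cache *foo_desc_cache;

static int foo_init(void)
{
	/*
	 * With SLAB_HWCACHE_ALIGN, objects are aligned to cache_line_size(),
	 * so on a non-coherent system this only yields DMA-safe objects if
	 * that helper (or ARCH_DMA_MINALIGN) actually covers the CWG.
	 */
	foo_desc_cache = kmem_cache_create("foo_desc", sizeof(struct foo_desc),
					   0, SLAB_HWCACHE_ALIGN, NULL);
	return foo_desc_cache ? 0 : -ENOMEM;
}

If cache_line_size() under-reports the CWG here, such objects can share a
writeback granule with unrelated data, which is exactly what the
WARN_TAINT_ONCE above is meant to flag.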