All of lore.kernel.org
 help / color / mirror / Atom feed
From: l.stach@pengutronix.de (Lucas Stach)
To: linux-arm-kernel@lists.infradead.org
Subject: Clarifying dma_wmb behavior in presence of non-coherent masters and outer caches
Date: Mon, 09 Jul 2018 11:45:46 +0200	[thread overview]
Message-ID: <1531129546.3163.17.camel@pengutronix.de> (raw)
In-Reply-To: <20180706122610.GA6993@arm.com>

Hi Will,

Am Freitag, den 06.07.2018, 13:26 +0100 schrieb Will Deacon:
> Hi Lucas,
> 
> On Mon, Jul 02, 2018 at 06:45:52PM +0100, Will Deacon wrote:
> > On Mon, Jul 02, 2018 at 03:49:26PM +0200, Lucas Stach wrote:
> > > Also the section "External caches", "Before AMBA4 ACE" seems to suggest
> > > that the barriers aren't fully propagated to the PL310 write-caches and
> > > master ports. I'm unsure if this is just an artifact of the mentioned
> > > MMIO access, so handling of normal vs. device memory transactions in
> > > the PL310.
> > 
> > I'll check with the hardware folks about the PL310, but I don't see how
> > you could propagate barriers over AXI3 anyway because I don't think this
> > stuff existed back then.
> 
> Damn. As I feared, you can't propagate the barrier to the PL310, so the
> A9 is unable to enforce ordering beyond the responses it gets back. The
> PL310 will then re-order normal memory accesses (including non-cacheable)
> in its store buffer, so we do actually need a bloody CACHE SYNC here.
> 
> Patch below. Does it help?

It seems the issues we see on the SocFPGA aren't caused by insufficient
barriers, but we are still in the process of tracking down the root
cause.

Aside from this, the patch below seem to match my understanding of the
outer cache issue. I'll give this a spin on i.MX6, as this would
potentially allow us to switch to the dma_* barrier variants in a
number drivers, reducing overhead on systems without an outer cache.

Thanks,
Lucas

> --->8
> 
> From 5a2769b45e94b212d21bc44f5940ed5f3bde5cc2 Mon Sep 17 00:00:00 2001
> > From: Will Deacon <will.deacon@arm.com>
> Date: Wed, 4 Jul 2018 10:56:22 +0100
> Subject: [PATCH] ARM: barrier: Kick outer-cache and SoC fabric in dma_wmb()
> 
> Prior to AMBA4-ACE, the AXI bus protocol had no support for barrier
> transactions or shareability domains and was therefore unable to enforce
> ordering between memory accesses either side of a barrier instruction,
> requiring the CPU to handle all of the ordering itself.
> 
> Unfortunately, IP such as the venerable PL310 will re-order stores to
> Normal memory inside its store-buffer, meaning that accesses to a coherent
> DMA buffer (i.e. one allocated by dma_alloc_coherent()) can be observed
> out-of-order by a non-coherent DMA engine unless the PL310 is kicked with
> a CACHE_SYNC command between buffering the accesses. Whilst this is already
> the case for mb() and wmb(), dma_wmb() just expands to a DMB OSHST, leading
> to sequences such as:
> 
> > 	STR	Rdata, [Rbuf, #DATA_OFFSET]
> > 	DMB	OSHST
> > 	STR	Rvalid, [Rbuf, #VALID_OFFSET]
> 
> being reordered such that the DMA device sees the write of valid but not
> the write of data.
> 
> This patch ensures that the SoC fabric and the outer-cache are kicked
> by dma_wmb() if they are able to buffer transactions outside of the CPU
> but before the Point of Coherency.
> 
> > Reported-by: Lucas Stach <l.stach@pengutronix.de>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> ?arch/arm/include/asm/barrier.h????| 14 ++++++++++----
> ?arch/arm/include/asm/outercache.h |??6 ++++++
> ?arch/arm/mm/flush.c???????????????| 24 +++++++++++++++++++++++-
> ?3 files changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
> index 69772e742a0a..e31833f72b69 100644
> --- a/arch/arm/include/asm/barrier.h
> +++ b/arch/arm/include/asm/barrier.h
> @@ -53,17 +53,23 @@
> ?#ifdef CONFIG_ARM_HEAVY_MB
> ?extern void (*soc_mb)(void);
> ?extern void arm_heavy_mb(void);
> -#define __arm_heavy_mb(x...) do { dsb(x); arm_heavy_mb(); } while (0)
> +extern void arm_heavy_wmb(void);
> +extern void arm_heavy_dma_wmb(void);
> > +#define __arm_heavy_mb()	arm_heavy_mb()
> > +#define __arm_heavy_wmb()	arm_heavy_wmb()
> > +#define __arm_heavy_dma_wmb()	arm_heavy_dma_wmb()
> ?#else
> -#define __arm_heavy_mb(x...) dsb(x)
> > +#define __arm_heavy_mb()	dsb()
> > +#define __arm_heavy_wmb()	dsb(st)
> > +#define __arm_heavy_dma_wmb()	dmb(oshst)
> ?#endif
> ?
> ?#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
> > ?#define mb()		__arm_heavy_mb()
> > ?#define rmb()		dsb()
> > -#define wmb()		__arm_heavy_mb(st)
> > +#define wmb()		__arm_heavy_wmb()
> > ?#define dma_rmb()	dmb(osh)
> > -#define dma_wmb()	dmb(oshst)
> > +#define dma_wmb()	__arm_heavy_dma_wmb()
> ?#else
> > ?#define mb()		barrier()
> > ?#define rmb()		barrier()
> diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
> index c2bf24f40177..e744efe6e2a7 100644
> --- a/arch/arm/include/asm/outercache.h
> +++ b/arch/arm/include/asm/outercache.h
> @@ -43,6 +43,12 @@ struct outer_cache_fns {
> ?
> ?extern struct outer_cache_fns outer_cache;
> ?
> +#ifdef CONFIG_OUTER_CACHE_SYNC
> +static inline bool outer_cache_has_sync(void) { return outer_cache.sync; }
> +#else
> +static inline bool outer_cache_has_sync(void) { return 0; }
> +#endif
> +
> ?#ifdef CONFIG_OUTER_CACHE
> ?/**
> ? * outer_inv_range - invalidate range of outer cache lines
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 58469623b015..1252689362d8 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -24,7 +24,7 @@
> ?#ifdef CONFIG_ARM_HEAVY_MB
> ?void (*soc_mb)(void);
> ?
> -void arm_heavy_mb(void)
> +static void __arm_heavy_sync(void)
> ?{
> ?#ifdef CONFIG_OUTER_CACHE_SYNC
> > ?	if (outer_cache.sync)
> @@ -33,7 +33,29 @@ void arm_heavy_mb(void)
> > ?	if (soc_mb)
> > ?		soc_mb();
> ?}
> +
> +void arm_heavy_mb(void)
> +{
> > +	dsb();
> > +	__arm_heavy_sync();
> +}
> ?EXPORT_SYMBOL(arm_heavy_mb);
> +
> +void arm_heavy_wmb(void)
> +{
> > +	dsb(st);
> > +	__arm_heavy_sync();
> +}
> +EXPORT_SYMBOL(arm_heavy_wmb);
> +
> +void arm_heavy_dma_wmb(void)
> +{
> > +	if (outer_cache_has_sync() || soc_mb)
> > +		arm_heavy_wmb();
> > +	else
> > +		dmb(oshst);
> +}
> +EXPORT_SYMBOL(arm_heavy_dma_wmb);
> ?#endif
> ?
> ?#ifdef CONFIG_CPU_CACHE_VIPT

  parent reply	other threads:[~2018-07-09  9:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-29 12:28 Clarifying dma_wmb behavior in presence of non-coherent masters and outer caches Lucas Stach
2018-06-29 14:25 ` Russell King - ARM Linux
2018-06-29 16:19   ` Lucas Stach
2018-06-29 16:22   ` Will Deacon
2018-06-29 16:48     ` Russell King - ARM Linux
2018-06-29 17:43       ` Will Deacon
2018-06-29 18:01         ` Russell King - ARM Linux
2018-07-02 13:49         ` Lucas Stach
2018-07-02 17:45           ` Will Deacon
2018-07-06 12:26             ` Will Deacon
2018-07-09  6:20               ` Oleksij Rempel
2018-09-13 13:17                 ` Will Deacon
2018-09-13 14:09                   ` Oleksij Rempel
2018-07-09  9:45               ` Lucas Stach [this message]
2018-06-29 17:14     ` Lucas Stach
2018-06-29 17:46       ` Will Deacon
2018-07-02  9:58         ` Lucas Stach

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1531129546.3163.17.camel@pengutronix.de \
    --to=l.stach@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.