* Data Synchronization Barrier (DSB)
@ 2014-12-03 16:20 Mason
  2014-12-03 20:23 ` Arnd Bergmann
From: Mason @ 2014-12-03 16:20 UTC
  To: linux-arm-kernel

Hello everyone,

I have several naive questions about memory barriers.
I've glanced at memory-barriers.txt
https://www.kernel.org/doc/Documentation/memory-barriers.txt

QUESTION 1
Are memory ordering issues and caching orthogonal?
In other words, are memory barriers needed even when accessing
non-cached memory (actual memory or device registers)?

QUESTION 1.1
These days, CPUs feature multiple cores, which often share
the last level of cache. (Implied assumption: the cores are more
tightly coupled than in yesterday's SMP systems.) Do multi-core
systems change the need for memory barriers, compared to a
unicore system?


QUESTION 2

On my platform, the MMIO primitives are aliased to __raw_readl
and __raw_writel.

#define IO_ADDRESS(x)  (0xf0000000 +(x))
#define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
#define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))

Arnd Bergmann has already pointed out:
"don't use __raw_readl in driver code, use readl or readl_relaxed"

In fact, on ARM platforms, __raw_readl does not insert any memory
barrier (or compiler barrier, for that matter; the only constraints
are those imposed by the "volatile" keyword):

static inline u32 __raw_readl(const volatile void __iomem *addr)
{
	u32 val;
	asm volatile("ldr %1, %0"
		     : "+Qo" (*(volatile u32 __force *)addr),
		       "=r" (val));
	return val;
}

If I understand correctly, accessing memory-mapped registers without
memory barriers can lead to subtle bugs caused by memory reordering?
(This part is really unclear to me.)

Should I alias my primitives to ioread32 and iowrite32?

NOTE: iowrite32 calls outer_sync(), which seems to have a fairly high
overhead. If I'm writing to 4 consecutive memory-mapped registers, do I
need to sync after each write?

Regards.



Notes for my own reference...

Comment from barrier.h

/*
  * Force strict CPU ordering. And yes, this is required on UP too when we're
  * talking to devices.
  *
  * Fall back to compiler barriers if nothing better is provided.
  */

/* IO barriers */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE /* y */
#include <asm/barrier.h>
#define __iormb()		rmb()
#define __iowmb()		wmb()

#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb()		do { dsb(); outer_sync(); } while (0)
#define rmb()		dsb()
#define wmb()		do { dsb(st); outer_sync(); } while (0)

#if __LINUX_ARM_ARCH__ >= 7
#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")

#ifdef CONFIG_OUTER_CACHE_SYNC
static inline void outer_sync(void)
{
	if (outer_cache.sync)
		outer_cache.sync();
}

static void l2x0_cache_sync(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&l2x0_lock, flags);
	cache_sync();
	raw_spin_unlock_irqrestore(&l2x0_lock, flags);
}

static inline void cache_sync(void)
{
	void __iomem *base = l2x0_base;

	writel_relaxed(0, base + sync_reg_offset);
	cache_wait(base + L2X0_CACHE_SYNC, 1);
}
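
The dsb(option) macro in the notes above relies on preprocessor
stringification to paste the option into the instruction text. The
sketch below only builds the string, so it can be compiled anywhere;
MOCK_DSB_TEXT and the helper functions are hypothetical names for this
illustration, not kernel identifiers.

```c
#include <assert.h>
#include <string.h>

/* Same stringification trick as the kernel's dsb(option) macro, but
 * producing only the assembler text instead of an asm statement. */
#define MOCK_DSB_TEXT(option) "dsb " #option

/* Text for a store-only barrier, as used by wmb() in the notes above. */
static const char *mock_dsb_store_text(void)
{
	return MOCK_DSB_TEXT(st);
}

/* An empty argument stringifies to "", leaving a plain "dsb ". */
static const char *mock_dsb_full_text(void)
{
	return MOCK_DSB_TEXT();
}

/* Small self-check of both expansions. */
static void mock_dsb_selfcheck(void)
{
	assert(strcmp(mock_dsb_store_text(), "dsb st") == 0);
	assert(strcmp(mock_dsb_full_text(), "dsb ") == 0);
}
```

So dsb(st) yields a store-only barrier, while dsb() with an empty
argument yields the full barrier.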


* Data Synchronization Barrier (DSB)
  2014-12-03 16:20 Data Synchronization Barrier (DSB) Mason
@ 2014-12-03 20:23 ` Arnd Bergmann
From: Arnd Bergmann @ 2014-12-03 20:23 UTC
  To: linux-arm-kernel

On Wednesday 03 December 2014 17:20:26 Mason wrote:
> 
> In fact, on ARM platforms, __raw_readl does not insert any memory
> barrier (or compiler barrier, for that matter; the only constraints
> are those imposed by the "volatile" keyword):
> 
> static inline u32 __raw_readl(const volatile void __iomem *addr)
> {
>         u32 val;
>         asm volatile("ldr %1, %0"
>                      : "+Qo" (*(volatile u32 __force *)addr),
>                        "=r" (val));
>         return val;
> }
> 
> If I understand correctly, accessing memory-mapped registers without
> memory barriers can lead to subtle bugs caused by memory reordering?
> (This part is really unclear to me.)

The "asm volatile" makes the compiler emit the accesses in the order
that is given in source code, and we rely on the CPU to send them
to the bus in the same order, which on ARM is enforced through the
page table attributes that ioremap sets.

The barriers are needed only to ensure ordering between MMIO accesses
and memory accesses, in particular memory that is seen by a DMA
bus master device that is controlled using this MMIO.

The classic example for this is writing to a DMA buffer from the
CPU and then telling a device using writel to fetch the data.
Without the barrier, that data may still be in a CPU buffer
by the time that a device reads it.
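
As a rough illustration of that pattern, here is a userspace mock-up,
not kernel code. Everything prefixed with mock_ is a hypothetical
stand-in: the array plays the DMA buffer, the volatile variable plays
the device's "go" register, and a C11 fence stands in for the barrier
that writel() issues before the store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a DMA buffer and a device register. */
static uint8_t mock_dma_buf[64];
static volatile uint32_t mock_doorbell_reg;

/* Mock relaxed MMIO write: a volatile store, with no ordering
 * against accesses to normal memory. */
static inline void mock_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Mock writel(): barrier first, then the store, so earlier writes
 * to the buffer are complete before the device is told to fetch. */
static inline void mock_writel(uint32_t val, volatile uint32_t *reg)
{
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for wmb() */
	mock_writel_relaxed(val, reg);
}

/* Fill the buffer, then ring the doorbell with the length. */
static uint32_t mock_start_dma(const void *data, size_t len)
{
	assert(len <= sizeof(mock_dma_buf));
	memcpy(mock_dma_buf, data, len);          /* normal memory writes */
	mock_writel((uint32_t)len, &mock_doorbell_reg); /* ordered MMIO write */
	return mock_doorbell_reg;
}
```

If the doorbell were written with the relaxed variant instead, nothing
would order the buffer contents ahead of the register write.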

> Should I alias my primitives to ioread32 and iowrite32?
> 
> NOTE: iowrite32 calls outer_sync(), which seems to have a fairly high
> overhead. If I'm writing to 4 consecutive memory-mapped registers, do I
> need to sync after each write?

I think readl_relaxed() is enough for you in this case, as long as
there are no DMAs.
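
To make the cost difference concrete, here is a userspace mock-up (not
kernel code) that counts how many barriers each style would issue for
four consecutive register writes. The cnt_ accessors, the register
array and the counter are all hypothetical names for this sketch; the
real accessors are writel() and writel_relaxed().

```c
#include <stdint.h>

/* Hypothetical stand-ins for four consecutive device registers. */
static volatile uint32_t mock_regs[4];

/* Counts how many (mock) barriers have been issued so far. */
static unsigned int mock_barrier_count;

/* Mock relaxed write: just the volatile store. */
static inline void cnt_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Mock ordered write: one barrier (counted here), then the store. */
static inline void cnt_writel(uint32_t val, volatile uint32_t *reg)
{
	mock_barrier_count++;
	cnt_writel_relaxed(val, reg);
}

/* Program all four registers with the ordered accessor.
 * Returns the number of barriers this took. */
static unsigned int program_with_writel(const uint32_t vals[4])
{
	unsigned int before = mock_barrier_count;

	for (int i = 0; i < 4; i++)
		cnt_writel(vals[i], &mock_regs[i]);
	return mock_barrier_count - before;
}

/* Same writes with the relaxed accessor: no barriers at all. */
static unsigned int program_relaxed(const uint32_t vals[4])
{
	unsigned int before = mock_barrier_count;

	for (int i = 0; i < 4; i++)
		cnt_writel_relaxed(vals[i], &mock_regs[i]);
	return mock_barrier_count - before;
}
```

The relaxed variant is only appropriate under the condition stated
above, that no DMA depends on these writes.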

	Arnd

