* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
       [not found]                         ` <alpine.LFD.2.21.1808221609000.26277@eddie.linux-mips.org>
@ 2018-08-22 15:50                           ` Mikulas Patocka
  2018-08-22 16:06                             ` Arnd Bergmann
  0 siblings, 1 reply; 9+ messages in thread
From: Mikulas Patocka @ 2018-08-22 15:50 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Sinan Kaya, Arnd Bergmann, Matt Turner, linux-alpha, okaya,
	Will Deacon, linux-arch, Peter Zijlstra, Thomas Gleixner



On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:

> On Wed, 22 Aug 2018, Sinan Kaya wrote:
> 
> > > It's hard to tell. The Alpha manual says that only overlapping accesses
> > > are ordered.
> > > 
> > > I did some tests on framebuffer and found out that "read+read+write+write"
> > > is faster than "read+write+read+write" - that may suggest that the reads
> > > flush the write queue.
> > 
> > Do you know if the framebuffer BAR you are using is non-prefetchable? (you
> > can find out from lspci)
> > 
> > The ordering rule applies only to non-prefetchable BARs. Architectures are
> > allowed to do whatever they want for prefetchable BARs.
> 
>  Well, data accesses have to reach the relevant PCI host bridge first 
> (i.e. leave the CPU and pass through any intermediate bus bridges between 
> the CPU and the PCI bus tree accessed) for any PCI data ordering rules to 
> apply.  Depending on the system architecture this may or may not require 
> OS software intervention.  NB this is a general observation, not specific 
> to Alpha.
> 
>  "Alpha Architecture Handbook" has an extensive discussion on data 
> ordering, concerning both memory and MMIO (termed "memory-like region" and 
> "non-memory-like region" respectively in the said document), and I'll try 
> to get through all of it in the coming days to see if I can get to a 
> conclusion which will let us avoid excessive synchronisation.
> 
>  Meanwhile I'll be happy of course to accept any input backed with 
> suitable references.
> 
>   Maciej

According to the Alpha handbook, non-overlapping accesses may be 
reordered.

So if someone does 
writel(REG1);
readl(REG2);

readl may (according to the spec) reach the device before writel, although 
actual experiments suggest that the read flushes the queued writes.

I would be quite interested to know why Linux developers decided that readl 
should be implemented as "read+barrier" and writel should be implemented 
as "barrier+write". Why is there this asymmetry in the barriers?

Does ARM have some hardware magic that prevents reordering the write and 
the read in this case?

Will Deacon made the change to "memory-barriers.txt" to specify this 
requirement - could you please describe why you specified it this way?

Mikulas


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 15:50                           ` Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130 Mikulas Patocka
@ 2018-08-22 16:06                             ` Arnd Bergmann
  2018-08-22 17:20                               ` Maciej W. Rozycki
  2018-08-22 17:47                               ` Mikulas Patocka
  0 siblings, 2 replies; 9+ messages in thread
From: Arnd Bergmann @ 2018-08-22 16:06 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Maciej W. Rozycki, Sinan Kaya, Matt Turner, linux-alpha, okaya,
	Will Deacon, linux-arch, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 22, 2018 at 5:50 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:
> > On Wed, 22 Aug 2018, Sinan Kaya wrote:
>
> According to the Alpha handbook, non-overlapping accesses may be
> reordered.
>
> So if someone does
> writel(REG1);
> readl(REG2);
>
> readl may (according to the spec) reach the device before writel, although
> actual experiments suggest that the read flushes the queued writes.
>
> I would be quite interested to know why Linux developers decided that readl
> should be implemented as "read+barrier" and writel should be implemented
> as "barrier+write". Why is there this asymmetry in the barriers?

I can explain this part: those two barriers are used specifically to order
an MMIO access against a DMA access. A writel() may be used to start
a DMA operation copying data from RAM to the device, so we must
have a barrier between the store to that data and the store to the register
to ensure the data is visible to the device.
Similarly, a readl() may check the status of a register that tells us when
a DMA from device to RAM has completed. We must have a read
barrier between that MMIO load and the load from RAM to prevent
the data from being prefetched while the MMIO is still in progress.
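
A rough userspace sketch of the two DMA cases described above. This is
not the kernel implementation: the *_model names, fake_doorbell and
dma_buf are made up for illustration, and C11 fences merely stand in for
the architecture's real write and read barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for a device register and a DMA buffer in RAM. */
static volatile uint32_t fake_doorbell;  /* models an MMIO register */
static uint32_t dma_buf[4];              /* models DMA-able memory */

/* Unordered accessors: a plain store or load, no barrier. */
static inline void raw_writel_model(uint32_t v, volatile uint32_t *reg)
{
    *reg = v;
}

static inline uint32_t raw_readl_model(const volatile uint32_t *reg)
{
    return *reg;
}

/* "barrier + write": earlier RAM stores are ordered before the register store. */
static inline void writel_model(uint32_t v, volatile uint32_t *reg)
{
    atomic_thread_fence(memory_order_release);  /* assumed stand-in for a write barrier */
    raw_writel_model(v, reg);
}

/* "read + barrier": later RAM loads are ordered after the register load. */
static inline uint32_t readl_model(const volatile uint32_t *reg)
{
    uint32_t v = raw_readl_model(reg);
    atomic_thread_fence(memory_order_acquire);  /* assumed stand-in for a read barrier */
    return v;
}

/* Fill the buffer, then write the register that would start a RAM-to-device DMA. */
static void start_dma_model(void)
{
    for (int i = 0; i < 4; i++)
        dma_buf[i] = 0x100u + (uint32_t)i;
    writel_model(1, &fake_doorbell);
}

/* Poll the status register, then read the data a device would have written to RAM. */
static uint32_t finish_dma_model(void)
{
    while (readl_model(&fake_doorbell) == 0)
        ;
    return dma_buf[0];
}
```

The barrier sits on the RAM side of the register access in both helpers,
which is where the asymmetry comes from: before the store in the write
case, after the load in the read case.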

> Does ARM have some hardware magic that prevents reordering the write and
> the read in this case?

Most architectures have this AFAICT; ARM and x86 definitely do, and
PCI requires this to be true on the bus:

All MMIO accesses from a given CPU to a given device (according
to an architecture-specific definition of "device") are ordered with respect
to one another.

If the hardware does not guarantee that, for simple load/store operations
on uncached device memory, then we need a full barrier after each store
in addition to the write barrier needed for the DMA synchronization.
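
As a hedged illustration of that last case, here is a userspace model of
a writel() for hardware that does not order MMIO accesses by itself. The
function names, reg1/reg2 and the barrier counter are hypothetical, and
the C11 fences only approximate what wmb()/mb() would do on a real
machine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static int full_barrier_count;  /* instrumentation for this sketch only */

static inline void full_barrier_model(void)
{
    atomic_thread_fence(memory_order_seq_cst);  /* assumed stand-in for a full barrier */
    full_barrier_count++;
}

/* writel() model for hardware that may reorder MMIO accesses to one device. */
static inline void writel_weak_hw_model(uint32_t v, volatile uint32_t *reg)
{
    /* Write barrier for the DMA case: RAM stores before the register store. */
    atomic_thread_fence(memory_order_release);
    *reg = v;
    /* Extra full barrier so that a later register read cannot pass this store. */
    full_barrier_model();
}

/* readl() model: register load followed by a read barrier. */
static inline uint32_t readl_weak_hw_model(const volatile uint32_t *reg)
{
    uint32_t v = *reg;
    atomic_thread_fence(memory_order_acquire);
    return v;
}

/* Two made-up registers of the same device. */
static volatile uint32_t reg1, reg2 = 0xabcd;

/* The sequence from the question: write REG1, then read REG2. */
static uint32_t write_then_read(void)
{
    writel_weak_hw_model(5, &reg1);
    return readl_weak_hw_model(&reg2);
}
```

Every store pays for one full barrier in this model, which is the cost
being weighed for architectures that lack the hardware guarantee.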

      Arnd


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 16:06                             ` Arnd Bergmann
@ 2018-08-22 17:20                               ` Maciej W. Rozycki
  2018-08-22 17:47                               ` Mikulas Patocka
  1 sibling, 0 replies; 9+ messages in thread
From: Maciej W. Rozycki @ 2018-08-22 17:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mikulas Patocka, Sinan Kaya, Matt Turner, linux-alpha, okaya,
	Will Deacon, linux-arch, Peter Zijlstra, Thomas Gleixner

On Wed, 22 Aug 2018, Arnd Bergmann wrote:

> > According to the Alpha handbook, non-overlapping accesses may be
> > reordered.

 I have had a notion of this since forever, however I have had trouble 
tracking down the exact reference in the architecture specification.

> > So if someone does
> > writel(REG1);
> > readl(REG2);
> >
> > readl may (according to the spec) reach the device before writel, although
> > actual experiments suggest that the read flushes the queued writes.

 Individual implementations can surely be more strongly ordered than the 
architecture specification requires.

> > Does ARM have some hardware magic that prevents reordering the write and
> > the read in this case?
> 
> Most architectures have this AFAICT; ARM and x86 definitely do, and
> PCI requires this to be true on the bus:
> 
> All MMIO accesses from a given CPU to a given device (according
> to an architecture-specific definition of "device") are ordered with respect
> to one another.
> 
> If the hardware does not guarantee that, for simple load/store operations
> on uncached device memory, then we need a full barrier after each store
> in addition to the write barrier needed for the DMA synchronization.

 MIPS is architecturally even more weakly ordered and a set of barrier 
instructions has been defined for synchronisation: SYNC for a completion 
barrier, and SYNC_ACQUIRE, SYNC_RELEASE, SYNC_RMB, SYNC_WMB and SYNC_MB 
for various ordering barriers.  Older architecture revisions had this less 
standardised.  Many if not most implementations are more strongly ordered 
though, in which case the relevant SYNC instructions are effectively NOPs.

 I'd expect some other architectures to be similarly weakly ordered.

 FWIW,

  Maciej


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 16:06                             ` Arnd Bergmann
  2018-08-22 17:20                               ` Maciej W. Rozycki
@ 2018-08-22 17:47                               ` Mikulas Patocka
  2018-08-22 19:38                                 ` Sinan Kaya
  1 sibling, 1 reply; 9+ messages in thread
From: Mikulas Patocka @ 2018-08-22 17:47 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Maciej W. Rozycki, Sinan Kaya, Matt Turner, linux-alpha, okaya,
	Will Deacon, linux-arch, Peter Zijlstra, Thomas Gleixner



On Wed, 22 Aug 2018, Arnd Bergmann wrote:

> On Wed, Aug 22, 2018 at 5:50 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:
> > > On Wed, 22 Aug 2018, Sinan Kaya wrote:
> >
> > According to the Alpha handbook, non-overlapping accesses may be
> > reordered.
> >
> > So if someone does
> > writel(REG1);
> > readl(REG2);
> >
> > readl may (according to the spec) reach the device before writel, although
> > actual experiments suggest that the read flushes the queued writes.
> >
> > I would be quite interested to know why Linux developers decided that readl
> > should be implemented as "read+barrier" and writel should be implemented
> > as "barrier+write". Why is there this asymmetry in the barriers?
> 
> I can explain this part: those two barriers are used specifically to order
> an MMIO access against a DMA access: a writel() may be used to start
> a DMA operation copying data from RAM to the device, so we must
> have a barrier between the store to that data and the store to the register
> to ensure the data is visible to the device.
> Similarly, a readl() may check the status of a register that tells us when
> a DMA from device to RAM has completed. We must have a read
> barrier between that MMIO load and the load from RAM to prevent
> the data from being prefetched while the MMIO is still in progress.

Then - the question is - why not just use barriers before and after 
accesses to DMA'd memory? For DMA into non-coherent memory, the barrier 
could be injected into dma_map_* and dma_unmap_* functions (with no change 
in drivers) - and for DMA into coherent memory you could have something 
like dma_coherent_barrier().

Why does Linux add the barriers between every read and write to memory 
mapped registers?

> > Does ARM have some hardware magic that prevents reordering the write and
> > the read in this case?
> 
> Most architectures have this AFAICT; ARM and x86 definitely do, and
> PCI requires this to be true on the bus:
> 
> All MMIO accesses from a given CPU to a given device (according
> to an architecture-specific definition of "device") are ordered with respect
> to one another.

If ARM guarantees that the accesses to a given device are not reordered - 
then the barriers in readl and writel are superfluous.

> If the hardware does not guarantee that, for simple load/store operations
> on uncached device memory, then we need a full barrier after each store
> in addition to the write barrier needed for the DMA synchronization.
> 
>       Arnd

Mikulas


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 17:47                               ` Mikulas Patocka
@ 2018-08-22 19:38                                 ` Sinan Kaya
  2018-08-22 19:56                                   ` Mikulas Patocka
  0 siblings, 1 reply; 9+ messages in thread
From: Sinan Kaya @ 2018-08-22 19:38 UTC (permalink / raw)
  To: Mikulas Patocka, Arnd Bergmann
  Cc: Maciej W. Rozycki, Matt Turner, linux-alpha, okaya, Will Deacon,
	linux-arch, Peter Zijlstra, Thomas Gleixner

On 8/22/2018 1:47 PM, Mikulas Patocka wrote:
> If ARM guarantees that the accesses to a given device are not reordered -
> then the barriers in readl and writel are superfluous.

They are not. ARM only guarantees ordering among read/write transactions
targeting a device, not ordering of those transactions against memory accesses.

example:

write memory
raw write to device

or

raw read from device
read memory

these can bypass each other on ARM unless a barrier is placed in the right
place either via readl()/writel() or explicitly.


raw write to device
raw write to device

or

raw write to device
raw read from device

or

raw read from device
raw read from device

are guaranteed to be ordered on ARM without needing any explicit barrier.
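
The cases above can be condensed into a small lookup, shown here as a
sketch. The enum and function names are invented for illustration, both
device accesses are assumed to target the same device, and pairs of two
plain memory accesses are outside what this message covers, so the
function conservatively reports them as not ordered.

```c
#include <assert.h>
#include <stdbool.h>

/* Kinds of access used in the examples above (names are hypothetical). */
enum access_kind { MEM_READ, MEM_WRITE, DEV_READ, DEV_WRITE };

static bool is_device(enum access_kind k)
{
    return k == DEV_READ || k == DEV_WRITE;
}

/*
 * Returns true when "first" followed by "second" stays ordered on ARM
 * without an explicit barrier, per the rule stated in this message:
 * only raw device accesses are ordered against each other.
 */
static bool ordered_without_barrier(enum access_kind first,
                                    enum access_kind second)
{
    return is_device(first) && is_device(second);
}
```

For the two problem cases, "write memory, raw write to device" and "raw
read from device, read memory", the function returns false, which is
exactly where readl()/writel() or an explicit barrier comes in.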


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 19:38                                 ` Sinan Kaya
@ 2018-08-22 19:56                                   ` Mikulas Patocka
  2018-08-22 20:03                                     ` Will Deacon
  2018-08-22 20:06                                     ` Sinan Kaya
  0 siblings, 2 replies; 9+ messages in thread
From: Mikulas Patocka @ 2018-08-22 19:56 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Arnd Bergmann, Maciej W. Rozycki, Matt Turner, linux-alpha,
	okaya, Will Deacon, linux-arch, Peter Zijlstra, Thomas Gleixner



On Wed, 22 Aug 2018, Sinan Kaya wrote:

> On 8/22/2018 1:47 PM, Mikulas Patocka wrote:
> > If ARM guarantees that the accesses to a given device are not reordered -
> > then the barriers in readl and writel are superfluous.
> 
> They are not. ARM only guarantees ordering among read/write transactions
> targeting a device, not ordering of those transactions against memory accesses.
> 
> example:
> 
> write memory
> raw write to device
> 
> or
> 
> raw read from device
> read memory
> 
> these can bypass each other on ARM unless a barrier is placed in the right
> place either via readl()/writel() or explicitly.

Yes - but - why does Linux insert the barriers into readl() and writel() 
instead of inserting them between accesses to registers and memory?

A lot of drivers have long sequences of accesses to memory-mapped 
registers with no interleaving accesses to coherent memory and these 
implicit barriers slow them down with no gain at all.

Mikulas


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 19:56                                   ` Mikulas Patocka
@ 2018-08-22 20:03                                     ` Will Deacon
  2018-08-22 20:06                                     ` Sinan Kaya
  1 sibling, 0 replies; 9+ messages in thread
From: Will Deacon @ 2018-08-22 20:03 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Sinan Kaya, Arnd Bergmann, Maciej W. Rozycki, Matt Turner,
	linux-alpha, okaya, linux-arch, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 22, 2018 at 03:56:28PM -0400, Mikulas Patocka wrote:
> 
> 
> On Wed, 22 Aug 2018, Sinan Kaya wrote:
> 
> > On 8/22/2018 1:47 PM, Mikulas Patocka wrote:
> > > If ARM guarantees that the accesses to a given device are not reordered -
> > > then the barriers in readl and writel are superfluous.
> > 
> > They are not. ARM only guarantees ordering among read/write transactions
> > targeting a device, not ordering of those transactions against memory accesses.
> > 
> > example:
> > 
> > write memory
> > raw write to device
> > 
> > or
> > 
> > raw read from device
> > read memory
> > 
> > these can bypass each other on ARM unless a barrier is placed in the right
> > place either via readl()/writel() or explicitly.
> 
> Yes - but - why does Linux insert the barriers into readl() and writel() 
> instead of inserting them between accesses to registers and memory?
> 
> A lot of drivers have long sequences of accesses to memory-mapped 
> registers with no interleaving accesses to coherent memory and these 
> implicit barriers slow them down with no gain at all.

That's what readX_relaxed and writeX_relaxed are for. There was some
discussion on the lists a while back, and Torvalds was very clear that readX
and writeX should follow the x86 semantics, which have these ordering
guarantees against accesses to memory.

Will


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 19:56                                   ` Mikulas Patocka
  2018-08-22 20:03                                     ` Will Deacon
@ 2018-08-22 20:06                                     ` Sinan Kaya
  2018-08-22 20:12                                       ` Will Deacon
  1 sibling, 1 reply; 9+ messages in thread
From: Sinan Kaya @ 2018-08-22 20:06 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arnd Bergmann, Maciej W. Rozycki, Matt Turner, linux-alpha,
	okaya, Will Deacon, linux-arch, Peter Zijlstra, Thomas Gleixner

On 8/22/2018 3:56 PM, Mikulas Patocka wrote:
> 
> 
> On Wed, 22 Aug 2018, Sinan Kaya wrote:
> 
>> On 8/22/2018 1:47 PM, Mikulas Patocka wrote:
>>> If ARM guarantees that the accesses to a given device are not reordered -
>>> then the barriers in readl and writel are superfluous.
>>
>> They are not. ARM only guarantees ordering among read/write transactions
>> targeting a device, not ordering of those transactions against memory accesses.
>>
>> example:
>>
>> write memory
>> raw write to device
>>
>> or
>>
>> raw read from device
>> read memory
>>
>> these can bypass each other on ARM unless a barrier is placed in the right
>> place either via readl()/writel() or explicitly.
> 
> Yes - but - why does Linux insert the barriers into readl() and writel()
> instead of inserting them between accesses to registers and memory?
> 
> A lot of drivers have long sequences of accesses to memory-mapped
> registers with no interleaving accesses to coherent memory and these
> implicit barriers slow them down with no gain at all.

It is an abstraction issue. The majority of drivers are developed against x86
and their developers are often unaware of the implications of weakly ordered
architectures.

Now, Will Deacon added new primitives to address your concern. There are
new APIs, readl_relaxed() and writel_relaxed(), as opposed to readl()
and writel().

The relaxed versions still guarantee ordering of register accesses with
respect to each other, but give no guarantee with respect to memory. The
relaxed versions could be used in performance-critical paths.
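
A minimal userspace sketch of that usage pattern: a run of register
writes done through a relaxed accessor, followed by one fully ordered
write. The *_model helpers, the regs[] array and the barrier counter are
hypothetical, and the C11 fence only stands in for the barrier a real
writel() would carry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static int barriers_issued;  /* instrumentation for this sketch only */

/* Relaxed write model: ordered against other register accesses only. */
static inline void writel_relaxed_model(uint32_t v, volatile uint32_t *reg)
{
    *reg = v;
}

/* Ordered write model: also ordered against earlier stores to memory. */
static inline void writel_model(uint32_t v, volatile uint32_t *reg)
{
    atomic_thread_fence(memory_order_release);  /* assumed stand-in for a write barrier */
    barriers_issued++;
    *reg = v;
}

#define NREGS 8
static volatile uint32_t regs[NREGS];  /* models a block of device registers */

/* Program the setup registers without barriers, then issue one ordered "go" write. */
static void program_device(void)
{
    for (uint32_t i = 0; i < NREGS - 1; i++)
        writel_relaxed_model(i, &regs[i]);
    writel_model(1, &regs[NREGS - 1]);
}
```

In this model the whole sequence costs one barrier instead of eight,
which is the saving asked about for long runs of register accesses.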

> 
> Mikulas
> 


* Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130
  2018-08-22 20:06                                     ` Sinan Kaya
@ 2018-08-22 20:12                                       ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2018-08-22 20:12 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Mikulas Patocka, Arnd Bergmann, Maciej W. Rozycki, Matt Turner,
	linux-alpha, okaya, linux-arch, Peter Zijlstra, Thomas Gleixner

[sorry, thought I replied on this thread already but my wifi is flakey]

On Wed, Aug 22, 2018 at 04:06:09PM -0400, Sinan Kaya wrote:
> On 8/22/2018 3:56 PM, Mikulas Patocka wrote:
> >
> >
> >On Wed, 22 Aug 2018, Sinan Kaya wrote:
> >
> >>On 8/22/2018 1:47 PM, Mikulas Patocka wrote:
> >>>If ARM guarantees that the accesses to a given device are not reordered -
> >>>then the barriers in readl and writel are superfluous.
> >>
> >>They are not. ARM only guarantees ordering among read/write transactions
> >>targeting a device, not ordering of those transactions against memory accesses.
> >>
> >>example:
> >>
> >>write memory
> >>raw write to device
> >>
> >>or
> >>
> >>raw read from device
> >>read memory
> >>
> >>these can bypass each other on ARM unless a barrier is placed in the right
> >>place either via readl()/writel() or explicitly.
> >
> >Yes - but - why does Linux insert the barriers into readl() and writel()
> >instead of inserting them between accesses to registers and memory?
> >
> >A lot of drivers have long sequences of accesses to memory-mapped
> >registers with no interleaving accesses to coherent memory and these
> >implicit barriers slow them down with no gain at all.
> 
> It is an abstraction issue. The majority of drivers are developed against x86
> and their developers are often unaware of the implications of weakly ordered
> architectures.

Right, and Torvalds was very clear that readX/writeX must follow the x86
semantics here.

> Now, Will Deacon added new primitives to address your concern. There are
> new APIs, readl_relaxed() and writel_relaxed(), as opposed to readl()
> and writel().
> 
> The relaxed versions still guarantee ordering of register accesses with
> respect to each other, but give no guarantee with respect to memory. The
> relaxed versions could be used in performance-critical paths.

Yes, and the heavy ordering requirements of plain readX/writeX were exactly
what motivated the addition of the _relaxed forms.

Will

