linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] alpha: add a delay between RTC port write and read
@ 2020-05-06 11:21 Mikulas Patocka
  2020-05-06 14:20 ` Arnd Bergmann
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-06 11:21 UTC (permalink / raw)
  To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman
  Cc: linux-alpha, Sinan Kaya, Arnd Bergmann, linux-serial, linux-rtc

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay between
outb_p and inb_p.

Note that we don't have to add the delay to CMOS_WRITE, because it
consists of two write accesses and they already have mb() between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org	# v4.17+

---
 arch/alpha/include/asm/mc146818rtc.h |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-stable/arch/alpha/include/asm/mc146818rtc.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/mc146818rtc.h	2020-05-05 20:48:30.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/mc146818rtc.h	2020-05-05 21:05:53.000000000 +0200
@@ -15,9 +15,13 @@
 /*
  * The yet supported machines all access the RTC index register via
  * an ISA port access but the way to access the date register differs ...
+ *
+ * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
+ * we need to add a small delay.
  */
 #define CMOS_READ(addr) ({ \
 outb_p((addr),RTC_PORT(0)); \
+udelay(2); \
 inb_p(RTC_PORT(1)); \
 })
 #define CMOS_WRITE(val, addr) ({ \


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] alpha: add a delay between RTC port write and read
  2020-05-06 11:21 [PATCH 1/2] alpha: add a delay between RTC port write and read Mikulas Patocka
@ 2020-05-06 14:20 ` Arnd Bergmann
  2020-05-06 17:12   ` [PATCH 1/2 v2] alpha: add a delay to inb_p, inb_w and inb_l Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2020-05-06 14:20 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, Sinan Kaya, linux-serial, linux-rtc

On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:

>  /*
>   * The yet supported machines all access the RTC index register via
>   * an ISA port access but the way to access the date register differs ...
> + *
> + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> + * we need to add a small delay.
>   */
>  #define CMOS_READ(addr) ({ \
>  outb_p((addr),RTC_PORT(0)); \
> +udelay(2); \
>  inb_p(RTC_PORT(1)); \


The inb_p() / outb_p() functions are meant to already have a delay in them,
maybe we should just add it there for alpha?

     Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/2 v2] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-06 14:20 ` Arnd Bergmann
@ 2020-05-06 17:12   ` Mikulas Patocka
  2020-05-07  8:06     ` [PATCH 1/2 v3] " Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-06 17:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, Sinan Kaya, linux-serial, linux-rtc



On Wed, 6 May 2020, Arnd Bergmann wrote:

> On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> >  /*
> >   * The yet supported machines all access the RTC index register via
> >   * an ISA port access but the way to access the date register differs ...
> > + *
> > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > + * we need to add a small delay.
> >   */
> >  #define CMOS_READ(addr) ({ \
> >  outb_p((addr),RTC_PORT(0)); \
> > +udelay(2); \
> >  inb_p(RTC_PORT(1)); \
> 
> 
> The inb_p() / outb_p() functions are meant to already have a delay in them,
> maybe we should just add it there for alpha?
> 
>      Arnd

Yes, that is possible too - it fixes the real time clock hang for me.



From: Mikulas Patocka <mpatocka@redhat.com>

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay to the inb_p,
inw_p and inl_p macros.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org	# v4.17+

---
 arch/alpha/include/asm/io.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-06 08:23:47.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-06 18:33:47.000000000 +0200
@@ -6,6 +6,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 #include <asm/compiler.h>
 #include <asm/pgtable.h>
 #include <asm/machvec.h>
@@ -481,9 +482,9 @@ extern inline void writeq(u64 b, volatil
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
 
-#define inb_p		inb
-#define inw_p		inw
-#define inl_p		inl
+#define inb_p(x)	(ndelay(300), inb(x))
+#define inw_p(x)	(ndelay(300), inw(x))
+#define inl_p(x)	(ndelay(300), inl(x))
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-06 17:12   ` [PATCH 1/2 v2] alpha: add a delay to inb_p, inb_w and inb_l Mikulas Patocka
@ 2020-05-07  8:06     ` Mikulas Patocka
  2020-05-07  8:20       ` Greg Kroah-Hartman
  2020-05-07 13:30       ` Arnd Bergmann
  0 siblings, 2 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-07  8:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, Sinan Kaya, linux-serial, linux-rtc



On Wed, 6 May 2020, Mikulas Patocka wrote:

> 
> 
> On Wed, 6 May 2020, Arnd Bergmann wrote:
> 
> > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > >  /*
> > >   * The yet supported machines all access the RTC index register via
> > >   * an ISA port access but the way to access the date register differs ...
> > > + *
> > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > + * we need to add a small delay.
> > >   */
> > >  #define CMOS_READ(addr) ({ \
> > >  outb_p((addr),RTC_PORT(0)); \
> > > +udelay(2); \
> > >  inb_p(RTC_PORT(1)); \
> > 
> > 
> > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > maybe we should just add it there for alpha?
> > 
> >      Arnd
> 
> Yes, that is possible too - it fixes the real time clock hang for me.
> 
> 
> -#define inb_p		inb
> -#define inw_p		inw
> -#define inl_p		inl
> +#define inb_p(x)	(ndelay(300), inb(x))
> +#define inw_p(x)	(ndelay(300), inw(x))
> +#define inl_p(x)	(ndelay(300), inl(x))
>  #define outb_p		outb
>  #define outw_p		outw
>  #define outl_p		outl

300ns was too low. We need at least 1400ns to fix the hang reliably.

Mikulas


From: Mikulas Patocka <mpatocka@redhat.com>

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay to the inb_p,
inw_p and inl_p macros.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org	# v4.17+

---
 arch/alpha/include/asm/io.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
@@ -6,6 +6,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 #include <asm/compiler.h>
 #include <asm/pgtable.h>
 #include <asm/machvec.h>
@@ -481,9 +482,9 @@ extern inline void writeq(u64 b, volatil
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
 
-#define inb_p		inb
-#define inw_p		inw
-#define inl_p		inl
+#define inb_p(x)	(ndelay(1400), inb(x))
+#define inw_p(x)	(ndelay(1400), inw(x))
+#define inl_p(x)	(ndelay(1400), inl(x))
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07  8:06     ` [PATCH 1/2 v3] " Mikulas Patocka
@ 2020-05-07  8:20       ` Greg Kroah-Hartman
  2020-05-07 10:53         ` Mikulas Patocka
  2020-05-07 13:30       ` Arnd Bergmann
  1 sibling, 1 reply; 45+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-07  8:20 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	alpha, Sinan Kaya, linux-serial, linux-rtc

On Thu, May 07, 2020 at 04:06:31AM -0400, Mikulas Patocka wrote:
> 
> 
> On Wed, 6 May 2020, Mikulas Patocka wrote:
> 
> > 
> > 
> > On Wed, 6 May 2020, Arnd Bergmann wrote:
> > 
> > > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > 
> > > >  /*
> > > >   * The yet supported machines all access the RTC index register via
> > > >   * an ISA port access but the way to access the date register differs ...
> > > > + *
> > > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > > + * we need to add a small delay.
> > > >   */
> > > >  #define CMOS_READ(addr) ({ \
> > > >  outb_p((addr),RTC_PORT(0)); \
> > > > +udelay(2); \
> > > >  inb_p(RTC_PORT(1)); \
> > > 
> > > 
> > > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > > maybe we should just add it there for alpha?
> > > 
> > >      Arnd
> > 
> > Yes, that is possible too - it fixes the real time clock hang for me.
> > 
> > 
> > -#define inb_p		inb
> > -#define inw_p		inw
> > -#define inl_p		inl
> > +#define inb_p(x)	(ndelay(300), inb(x))
> > +#define inw_p(x)	(ndelay(300), inw(x))
> > +#define inl_p(x)	(ndelay(300), inl(x))
> >  #define outb_p		outb
> >  #define outw_p		outw
> >  #define outl_p		outl
> 
> 300ns was too low. We need at least 1400ns to fix the hang reliably.
> 
> Mikulas
> 
> 
> From: Mikulas Patocka <mpatocka@redhat.com>
> 
> The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
> barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
> the Alpha Avanti platform.
> 
> The patch changes timing between accesses to the ISA bus, in particular,
> it reduces the time between "write" access and a subsequent "read" access.
> 
> This causes lock-up when accessing the real time clock and serial ports.
> 
> This patch fixes the real time clock by adding a small delay to the inb_p,
> inw_p and inl_p macros.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org	# v4.17+
> 
> ---
>  arch/alpha/include/asm/io.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

<snip>

This is not in a format that anyone can apply it in, please resend in a
proper way if you wish for it to be accepted.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07  8:20       ` Greg Kroah-Hartman
@ 2020-05-07 10:53         ` Mikulas Patocka
  0 siblings, 0 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-07 10:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	alpha, linux-serial, linux-rtc

alpha: add a delay to inb_p, inb_w and inb_l

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay to the inb_p,
inw_p and inl_p macros. Note that we don't have to add a delay before
outb_p, outw_p and outl_p, because there is already a "mb()" instruction
before them and this instruction slows down access sufficiently.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org	# v4.17+

---
 arch/alpha/include/asm/io.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
@@ -6,6 +6,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 #include <asm/compiler.h>
 #include <asm/pgtable.h>
 #include <asm/machvec.h>
@@ -481,9 +482,9 @@ extern inline void writeq(u64 b, volatil
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
 
-#define inb_p		inb
-#define inw_p		inw
-#define inl_p		inl
+#define inb_p(x)	(ndelay(1400), inb(x))
+#define inw_p(x)	(ndelay(1400), inw(x))
+#define inl_p(x)	(ndelay(1400), inl(x))
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07  8:06     ` [PATCH 1/2 v3] " Mikulas Patocka
  2020-05-07  8:20       ` Greg Kroah-Hartman
@ 2020-05-07 13:30       ` Arnd Bergmann
  2020-05-07 14:09         ` Mikulas Patocka
  1 sibling, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2020-05-07 13:30 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, Sinan Kaya, linux-serial, linux-rtc

On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Wed, 6 May 2020, Mikulas Patocka wrote:
> > On Wed, 6 May 2020, Arnd Bergmann wrote:
> > > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > >  /*
> > > >   * The yet supported machines all access the RTC index register via
> > > >   * an ISA port access but the way to access the date register differs ...
> > > > + *
> > > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > > + * we need to add a small delay.
> > > >   */
> > > >  #define CMOS_READ(addr) ({ \
> > > >  outb_p((addr),RTC_PORT(0)); \
> > > > +udelay(2); \
> > > >  inb_p(RTC_PORT(1)); \
> > >
> > >
> > > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > > maybe we should just add it there for alpha?
> > >
> > >      Arnd
> >
> > Yes, that is possible too - it fixes the real time clock hang for me.
> >
> >
> > -#define inb_p                inb
> > -#define inw_p                inw
> > -#define inl_p                inl
> > +#define inb_p(x)     (ndelay(300), inb(x))
> > +#define inw_p(x)     (ndelay(300), inw(x))
> > +#define inl_p(x)     (ndelay(300), inl(x))
> >  #define outb_p               outb
> >  #define outw_p               outw
> >  #define outl_p               outl
>
> 300ns was too low. We need at least 1400ns to fix the hang reliably.

Are you sure that it is in fact the timing that is important here and not
a barrier? I see that inb() is written in terms of readb(), but the
barrier requirements for I/O space are a bit different from those
on PCI memory space.

In the example you gave first, there is a an outb_p() followed by inb_p().
These are normally serialized by the bus, but I/O space also has the
requirement that an outb() completes before we get to the next
instruction (non-posted write), while writeb() is generally posted and
only needs a barrier before the write rather than both before and after
like outb.

    Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07 13:30       ` Arnd Bergmann
@ 2020-05-07 14:09         ` Mikulas Patocka
  2020-05-07 15:08           ` Arnd Bergmann
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-07 14:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Thu, 7 May 2020, Arnd Bergmann wrote:

> On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > On Wed, 6 May 2020, Mikulas Patocka wrote:
> > > On Wed, 6 May 2020, Arnd Bergmann wrote:
> > > > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >
> > > > >  /*
> > > > >   * The yet supported machines all access the RTC index register via
> > > > >   * an ISA port access but the way to access the date register differs ...
> > > > > + *
> > > > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > > > + * we need to add a small delay.
> > > > >   */
> > > > >  #define CMOS_READ(addr) ({ \
> > > > >  outb_p((addr),RTC_PORT(0)); \
> > > > > +udelay(2); \
> > > > >  inb_p(RTC_PORT(1)); \
> > > >
> > > >
> > > > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > > > maybe we should just add it there for alpha?
> > > >
> > > >      Arnd
> > >
> > > Yes, that is possible too - it fixes the real time clock hang for me.
> > >
> > >
> > > -#define inb_p                inb
> > > -#define inw_p                inw
> > > -#define inl_p                inl
> > > +#define inb_p(x)     (ndelay(300), inb(x))
> > > +#define inw_p(x)     (ndelay(300), inw(x))
> > > +#define inl_p(x)     (ndelay(300), inl(x))
> > >  #define outb_p               outb
> > >  #define outw_p               outw
> > >  #define outl_p               outl
> >
> > 300ns was too low. We need at least 1400ns to fix the hang reliably.
> 
> Are you sure that it is in fact the timing that is important here and not
> a barrier? I see that inb() is written in terms of readb(), but the
> barrier requirements for I/O space are a bit different from those
> on PCI memory space.

The "in" and "out" instructions are serializing on x86. But alpha doesn't 
have dedicated instructions for accessing ports.

Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should 
be protected by two memory barriers, to emulate the x86 behavior?

> In the example you gave first, there is a an outb_p() followed by inb_p().
> These are normally serialized by the bus, but I/O space also has the
> requirement that an outb() completes before we get to the next
> instruction (non-posted write), while writeb() is generally posted and
> only needs a barrier before the write rather than both before and after
> like outb.
> 
>     Arnd

I think that the fact that "writeb" is posted is exactly the problem - it 
gets posted, the processor goes on, sends "readb" and they arrive 
back-to-back to the ISA bus. The ISA bus device doesn't like back-to-back 
accesses and locks up.

Anyway - you can change the "ndelay()" function in this patch to "mb()" - 
"mb()" will provide long enough delay that it fixes this bug.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07 14:09         ` Mikulas Patocka
@ 2020-05-07 15:08           ` Arnd Bergmann
  2020-05-07 15:45             ` Mikulas Patocka
                               ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Arnd Bergmann @ 2020-05-07 15:08 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Thu, May 7, 2020 at 4:09 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Thu, 7 May 2020, Arnd Bergmann wrote:
> > On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > Are you sure that it is in fact the timing that is important here and not
> > a barrier? I see that inb() is written in terms of readb(), but the
> > barrier requirements for I/O space are a bit different from those
> > on PCI memory space.
>
> The "in" and "out" instructions are serializing on x86. But alpha doesn't
> have dedicated instructions for accessing ports.
>
> Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
> be protected by two memory barriers, to emulate the x86 behavior?

That's what we do on some other architectures to emulate the non-posted
behavior of out[bwl], as required by PCI. I can't think of any reasons to
have a barrier before in[bwl], or after write[bwl], but we generally want
one after out[bwl]

> > In the example you gave first, there is a an outb_p() followed by inb_p().
> > These are normally serialized by the bus, but I/O space also has the
> > requirement that an outb() completes before we get to the next
> > instruction (non-posted write), while writeb() is generally posted and
> > only needs a barrier before the write rather than both before and after
> > like outb.
>
> I think that the fact that "writeb" is posted is exactly the problem - it
> gets posted, the processor goes on, sends "readb" and they arrive
> back-to-back to the ISA bus. The ISA bus device doesn't like back-to-back
> accesses and locks up.
>
> Anyway - you can change the "ndelay()" function in this patch to "mb()" -
> "mb()" will provide long enough delay that it fixes this bug.

My preference would be to have whatever makes most sense in theory
and also fixes the problem. If there is some documentation that
says you need a certain amount of time between accesses regardless
of the barriers, then that is fine. I do wonder if there is anything
enforcing the "rpcc" in _delay() to come after the store if there is no
barrier between the two, otherwise the delay method still seems
unreliable.

The barrier after the store at least makes sense to me based on
the theory, both with and without a delay in outb_p().

      Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07 15:08           ` Arnd Bergmann
@ 2020-05-07 15:45             ` Mikulas Patocka
  2020-05-07 15:46             ` [PATCH v4] alpha: add a barrier after outb, outw and outl Mikulas Patocka
  2020-05-10  1:25             ` [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l Maciej W. Rozycki
  2 siblings, 0 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-07 15:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Thu, 7 May 2020, Arnd Bergmann wrote:

> On Thu, May 7, 2020 at 4:09 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > On Thu, 7 May 2020, Arnd Bergmann wrote:
> > > On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > Are you sure that it is in fact the timing that is important here and not
> > > a barrier? I see that inb() is written in terms of readb(), but the
> > > barrier requirements for I/O space are a bit different from those
> > > on PCI memory space.
> >
> > The "in" and "out" instructions are serializing on x86. But alpha doesn't
> > have dedicated instructions for accessing ports.
> >
> > Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
> > be protected by two memory barriers, to emulate the x86 behavior?
> 
> That's what we do on some other architectures to emulate the non-posted
> behavior of out[bwl], as required by PCI. I can't think of any reasons to
> have a barrier before in[bwl], or after write[bwl], but we generally want
> one after out[bwl]

Yes - so we can add a barrier after out[bwl]. It also fixes the serial 
port issue, so we no longer need the serial driver patch for Greg.

> > > In the example you gave first, there is a an outb_p() followed by inb_p().
> > > These are normally serialized by the bus, but I/O space also has the
> > > requirement that an outb() completes before we get to the next
> > > instruction (non-posted write), while writeb() is generally posted and
> > > only needs a barrier before the write rather than both before and after
> > > like outb.
> >
> > I think that the fact that "writeb" is posted is exactly the problem - it
> > gets posted, the processor goes on, sends "readb" and they arrive
> > back-to-back to the ISA bus. The ISA bus device doesn't like back-to-back
> > accesses and locks up.
> >
> > Anyway - you can change the "ndelay()" function in this patch to "mb()" -
> > "mb()" will provide long enough delay that it fixes this bug.
> 
> My preference would be to have whatever makes most sense in theory
> and also fixes the problem. If there is some documentation that
> says you need a certain amount of time between accesses regardless
> of the barriers, then that is fine. I do wonder if there is anything
> enforcing the "rpcc" in _delay() to come after the store if there is no
> barrier between the two, otherwise the delay method still seems
> unreliable.

I measured ndelay - and the overhead of the instruction rpcc is already 
very high. ndelay(1) takes 300ns.

> The barrier after the store at least makes sense to me based on
> the theory, both with and without a delay in outb_p().
> 
>       Arnd

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4] alpha: add a barrier after outb, outw and outl
  2020-05-07 15:08           ` Arnd Bergmann
  2020-05-07 15:45             ` Mikulas Patocka
@ 2020-05-07 15:46             ` Mikulas Patocka
  2020-05-07 19:12               ` Arnd Bergmann
  2020-05-10  1:25             ` [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l Maciej W. Rozycki
  2 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-07 15:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the bug by adding a memory barrier after the functions
that access the ISA ports - outb, outw, outl. The barrier causes that
there is some delay between the write to an IO port and a subsequent read.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org	# v4.17+

---
 arch/alpha/include/asm/io.h |    3 +++
 arch/alpha/kernel/io.c      |    3 +++
 2 files changed, 6 insertions(+)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-07 17:36:58.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-07 17:36:58.000000000 +0200
@@ -347,11 +347,13 @@ extern inline u16 inw(unsigned long port
 extern inline void outb(u8 b, unsigned long port)
 {
 	iowrite8(b, ioport_map(port, 1));
+	mb();
 }
 
 extern inline void outw(u16 b, unsigned long port)
 {
 	iowrite16(b, ioport_map(port, 2));
+	mb();
 }
 #endif
 
@@ -377,6 +379,7 @@ extern inline u32 inl(unsigned long port
 extern inline void outl(u32 b, unsigned long port)
 {
 	iowrite32(b, ioport_map(port, 4));
+	mb();
 }
 #endif
 
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-07 17:36:58.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-07 17:36:58.000000000 +0200
@@ -78,16 +78,19 @@ u32 inl(unsigned long port)
 void outb(u8 b, unsigned long port)
 {
 	iowrite8(b, ioport_map(port, 1));
+	mb();
 }
 
 void outw(u16 b, unsigned long port)
 {
 	iowrite16(b, ioport_map(port, 2));
+	mb();
 }
 
 void outl(u32 b, unsigned long port)
 {
 	iowrite32(b, ioport_map(port, 4));
+	mb();
 }
 
 EXPORT_SYMBOL(inb);


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: add a barrier after outb, outw and outl
  2020-05-07 15:46             ` [PATCH v4] alpha: add a barrier after outb, outw and outl Mikulas Patocka
@ 2020-05-07 19:12               ` Arnd Bergmann
  2020-05-10  1:27                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 45+ messages in thread
From: Arnd Bergmann @ 2020-05-07 19:12 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Thu, May 7, 2020 at 5:46 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
> barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
> the Alpha Avanti platform.
>
> The patch changes timing between accesses to the ISA bus, in particular,
> it reduces the time between "write" access and a subsequent "read" access.
>
> This causes lock-up when accessing the real time clock and serial ports.
>
> This patch fixes the bug by adding a memory barrier after the functions
> that access the ISA ports - outb, outw, outl. The barrier causes that
> there is some delay between the write to an IO port and a subsequent read.

Based on your earlier explanations, I would mention here that the barrier
avoids the back-to-back I/O instructions on the bus that seem to be causing
the problem. As I understand it (having very little alpha specific knowledge),
they should prevent them by design. However if you are sure it's just the
added delay rather than any actual barrier effect, that would also be worth
pointing out.

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org      # v4.17+

With or without any further clarification

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-07 15:08           ` Arnd Bergmann
  2020-05-07 15:45             ` Mikulas Patocka
  2020-05-07 15:46             ` [PATCH v4] alpha: add a barrier after outb, outw and outl Mikulas Patocka
@ 2020-05-10  1:25             ` Maciej W. Rozycki
  2020-05-10 18:50               ` Mikulas Patocka
  2 siblings, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-10  1:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mikulas Patocka, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Thu, 7 May 2020, Arnd Bergmann wrote:

> > Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
> > be protected by two memory barriers, to emulate the x86 behavior?
> 
> That's what we do on some other architectures to emulate the non-posted
> behavior of out[bwl], as required by PCI. I can't think of any reasons to
> have a barrier before in[bwl], or after write[bwl], but we generally want
> one after out[bwl]

 Alpha is weakly ordered, also WRT MMIO.  The details are a bit obscure 
(and were discussed before in a previous iteration of these patches), but 
my understanding is multiple writes can be merged and writes can be 
reordered WRT reads, even on UP.  It's generally better for performance to 
have ordering barriers before MMIO operations rather than afterwards, 
unless a completion barrier is also required (e.g. for level-triggered 
interrupt acknowledgement).

 Currently we don't fully guarantee that `outX' won't be posted (from 
memory-barriers.txt):

" (*) inX(), outX():
[...]
        Device drivers may expect outX() to emit a non-posted write transaction
        that waits for a completion response from the I/O peripheral before
        returning. This is not guaranteed by all architectures and is therefore
        not part of the portable ordering semantics."

  Maciej



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: add a barrier after outb, outw and outl
  2020-05-07 19:12               ` Arnd Bergmann
@ 2020-05-10  1:27                 ` Maciej W. Rozycki
  0 siblings, 0 replies; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-10  1:27 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mikulas Patocka, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Thu, 7 May 2020, Arnd Bergmann wrote:

> > The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
> > barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
> > the Alpha Avanti platform.
> >
> > The patch changes timing between accesses to the ISA bus, in particular,
> > it reduces the time between "write" access and a subsequent "read" access.
> >
> > This causes lock-up when accessing the real time clock and serial ports.
> >
> > This patch fixes the bug by adding a memory barrier after the functions
> > that access the ISA ports - outb, outw, outl. The barrier causes that
> > there is some delay between the write to an IO port and a subsequent read.

 There is no delay guarantee, just an in-order completion guarantee:

#define mb()	__asm__ __volatile__("mb": : :"memory")

MB is a hardware memory barrier instruction.

> Based on your earlier explanations, I would mention here that the barrier
> avoids the back-to-back I/O instructions on the bus that seem to be causing
> the problem. As I understand it (having very little alpha specific knowledge),
> they should prevent them by design. However if you are sure it's just the
> added delay rather than any actual barrier effect, that would also be worth
> pointing out.

 Alpha is weakly ordered, also WRT MMIO.  Writing a simple test program to 
poke directly (e.g. using `ioremap' and then inline asm on the location 
obtained) at RTC and UART registers would be a good way to determine what 
is really going on here.

  Maciej



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-10  1:25             ` [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l Maciej W. Rozycki
@ 2020-05-10 18:50               ` Mikulas Patocka
  2020-05-11 14:58                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-10 18:50 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Sun, 10 May 2020, Maciej W. Rozycki wrote:

> On Thu, 7 May 2020, Arnd Bergmann wrote:
> 
> > > Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
> > > be protected by two memory barriers, to emulate the x86 behavior?
> > 
> > That's what we do on some other architectures to emulate the non-posted
> > behavior of out[bwl], as required by PCI. I can't think of any reasons to
> > have a barrier before in[bwl], or after write[bwl], but we generally want
> > one after out[bwl]
> 
>  Alpha is weakly ordered, also WRT MMIO.  The details are a bit obscure 
> (and were discussed before in a previous iteration of these patches), but 
> my understanding is multiple writes can be merged and writes can be 
> reordered WRT reads, even on UP.  It's generally better for performance to 

We discussed it some times ago, and the conclusion was that reads and 
writes to the same device are not reordered on Alpha. Reads and writes to 
different devices or to memory may be reordered.

In these problematic cases, we only access serial port or real time clock 
using a few ports (and these devices don't have DMA, so there's not any 
interaction with memory) - so I conclude that it is timing problem and not 
I/O reordering problem.

> have ordering barriers before MMIO operations rather than afterwards, 
> unless a completion barrier is also required (e.g. for level-triggered 
> interrupt acknowledgement).
> 
>  Currently we don't fully guarantee that `outX' won't be posted (from 
> memory-barriers.txt):
> 
> " (*) inX(), outX():
> [...]
>         Device drivers may expect outX() to emit a non-posted write transaction
>         that waits for a completion response from the I/O peripheral before
>         returning. This is not guaranteed by all architectures and is therefore
>         not part of the portable ordering semantics."
> 
>   Maciej

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-10 18:50               ` Mikulas Patocka
@ 2020-05-11 14:58                 ` Maciej W. Rozycki
  2020-05-12 19:35                   ` Mikulas Patocka
  2020-05-13 14:41                   ` Ivan Kokshaysky
  0 siblings, 2 replies; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-11 14:58 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Sun, 10 May 2020, Mikulas Patocka wrote:

> > > That's what we do on some other architectures to emulate the non-posted
> > > behavior of out[bwl], as required by PCI. I can't think of any reasons to
> > > have a barrier before in[bwl], or after write[bwl], but we generally want
> > > one after out[bwl]
> > 
> >  Alpha is weakly ordered, also WRT MMIO.  The details are a bit obscure 
> > (and were discussed before in a previous iteration of these patches), but 
> > my understanding is multiple writes can be merged and writes can be 
> > reordered WRT reads, even on UP.  It's generally better for performance to 
> 
> We discussed it some times ago, and the conclusion was that reads and 
> writes to the same device are not reordered on Alpha. Reads and writes to 
> different devices or to memory may be reordered.

 Except that "device" in this context is a particular MMIO location; most 
peripherals span multiple locations, including the RTC and the UART in the 
Avanti line in particular.

> In these problematic cases, we only access serial port or real time clock 
> using a few ports (and these devices don't have DMA, so there's not any 
> interaction with memory) - so I conclude that it is timing problem and not 
> I/O reordering problem.

 Individual PCI port locations correspond to different MMIO locations, so 
yes, accesses to these can be reordered (merging won't happen due to the 
use of the sparse address space).

 As I noted using a small program to verify actual behaviour ought to 
reveal what the problem really is.  And /dev/mem can be mmapped for PCI 
port I/O access on Alpha (some X-servers do this), so it can be done even 
in the userland with a running system.

 And if timing is indeed the culprit, then I think it will be best fixed 
in the 82378IB southbridge, i.e.[1]:

"The I/O recovery mechanism in the SIO is used to add additional recovery 
delay between PCI originated 8-bit and 16-bit I/O cycles to the ISA Bus.  
The SIO automatically forces a minimum delay of four SYSCLKs between 
back-to-back 8 and 16 bit I/O cycles to the ISA Bus.  The delay is 
measured from the rising edge of the I/O command (IOR# or IOW#) to the 
falling edge of the next BALE.  If a delay of greater than four SYSCLKs is 
required, the ISA I/O Recovery Time Register can be programmed to increase 
the delay in increments of SYSCLKs.  Note that no additional delay is 
inserted for back-to-back I/O "sub cycles" generated as a result of byte 
assembly or disassembly.  This register defaults to 8 and 16-bit recovery 
enabled with two clocks added to the standard I/O recovery."

where it won't be causing unnecessary overhead for native PCI devices or 
indeed excessive one for ISA devices.  It might be interesting to note 
that later SIO versions like the 82378ZB increased the minimum to five 
SYSCLKs, so maybe a missing SYSCLK (that can still be inserted by suitably
programming the ICRT) is the source of the problem?

References:

[1] "82378IB System I/O (SIO)", April 1993, Intel Corporation, Order 
    Number: 290473-002, Section 4.1.17 "ICRT -- ISA Controller Recovery 
    Timer Register"

  Maciej



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-11 14:58                 ` Maciej W. Rozycki
@ 2020-05-12 19:35                   ` Mikulas Patocka
  2020-05-13 14:41                   ` Ivan Kokshaysky
  1 sibling, 0 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-12 19:35 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Mon, 11 May 2020, Maciej W. Rozycki wrote:

>  And if timing is indeed the culprit, then I think it will be best fixed 
> in the 82378IB southbridge, i.e.[1]:
> 
> "The I/O recovery mechanism in the SIO is used to add additional recovery 
> delay between PCI originated 8-bit and 16-bit I/O cycles to the ISA Bus.  
> The SIO automatically forces a minimum delay of four SYSCLKs between 
> back-to-back 8 and 16 bit I/O cycles to the ISA Bus.  The delay is 
> measured from the rising edge of the I/O command (IOR# or IOW#) to the 
> falling edge of the next BALE.  If a delay of greater than four SYSCLKs is 
> required, the ISA I/O Recovery Time Register can be programmed to increase 
> the delay in increments of SYSCLKs.  Note that no additional delay is 
> inserted for back-to-back I/O "sub cycles" generated as a result of byte 
> assembly or disassembly.  This register defaults to 8 and 16-bit recovery 
> enabled with two clocks added to the standard I/O recovery."
> 
> where it won't be causing unnecessary overhead for native PCI devices or 
> indeed excessive one for ISA devices.  It might be interesting to note 
> that later SIO versions like the 82378ZB increased the minimum to five 
> SYSCLKs, so maybe a missing SYSCLK (that can still be inserted by suitably
> programming the ICRT) is the source of the problem?
> 
> References:
> 
> [1] "82378IB System I/O (SIO)", April 1993, Intel Corporation, Order 
>     Number: 290473-002, Section 4.1.17 "ICRT -- ISA Controller Recovery 
>     Timer Register"
> 
>   Maciej

I tried to modify this register (I wrote 0x44 to it - it should correspond 
to the maximum delay) and it had no effect on the serial port and rtc 
lock-ups.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-11 14:58                 ` Maciej W. Rozycki
  2020-05-12 19:35                   ` Mikulas Patocka
@ 2020-05-13 14:41                   ` Ivan Kokshaysky
  2020-05-13 16:13                     ` Greg Kroah-Hartman
                                       ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Ivan Kokshaysky @ 2020-05-13 14:41 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Mikulas Patocka, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote:
>  Individual PCI port locations correspond to different MMIO locations, so 
> yes, accesses to these can be reordered (merging won't happen due to the 
> use of the sparse address space).

Correct, it's how Alpha write buffers work. According to 21064 hardware
reference manual, these buffers are flushed when one of the following
conditions is met:

1) The write buffer contains at least two valid entries.
2) The write buffer contains one valid entry and at least 256 CPU cycles
   have elapsed since the execution of the last write buffer-directed
   instruction.
3) The write buffer contains an MB, STQ_C or STL_C instruction.
4) A load miss is pending to an address currently valid in the write
   buffer that requires the write buffer to be flushed.

I'm certain that in these rtc/serial cases we've got readX arriving
to device *before* preceeding writeX because of 2). That's why small
delay (300-1400 ns, apparently depends on CPU frequency) seemingly
"fixes" the problem. The 4) is not met because loads and stores are
to different ports, and 3) has been broken by commit 92d7223a74.

So I believe that correct fix would be to revert 92d7223a74 and
add wmb() before [io]writeX macros to meet memory-barriers.txt
requirement. The "wmb" instruction is cheap enough and won't hurt
IO performance too much.

Ivan.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-13 14:41                   ` Ivan Kokshaysky
@ 2020-05-13 16:13                     ` Greg Kroah-Hartman
  2020-05-13 17:17                     ` Maciej W. Rozycki
  2020-05-22 13:26                     ` Mikulas Patocka
  2 siblings, 0 replies; 45+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-13 16:13 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Mikulas Patocka, Arnd Bergmann,
	Richard Henderson, Matt Turner, alpha, linux-serial, linux-rtc

On Wed, May 13, 2020 at 03:41:28PM +0100, Ivan Kokshaysky wrote:
> On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote:
> >  Individual PCI port locations correspond to different MMIO locations, so 
> > yes, accesses to these can be reordered (merging won't happen due to the 
> > use of the sparse address space).
> 
> Correct, it's how Alpha write buffers work. According to 21064 hardware
> reference manual, these buffers are flushed when one of the following
> conditions is met:
> 
> 1) The write buffer contains at least two valid entries.
> 2) The write buffer contains one valid entry and at least 256 CPU cycles
>    have elapsed since the execution of the last write buffer-directed
>    instruction.
> 3) The write buffer contains an MB, STQ_C or STL_C instruction.
> 4) A load miss is pending to an address currently valid in the write
>    buffer that requires the write buffer to be flushed.
> 
> I'm certain that in these rtc/serial cases we've got readX arriving
> to device *before* preceeding writeX because of 2). That's why small
> delay (300-1400 ns, apparently depends on CPU frequency) seemingly
> "fixes" the problem. The 4) is not met because loads and stores are
> to different ports, and 3) has been broken by commit 92d7223a74.
> 
> So I believe that correct fix would be to revert 92d7223a74 and
> add wmb() before [io]writeX macros to meet memory-barriers.txt
> requirement. The "wmb" instruction is cheap enough and won't hurt
> IO performance too much.

I agree, that sounds easier, and work with the authors of
memory-barriers.txt in order to straighten things out.

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-13 14:41                   ` Ivan Kokshaysky
  2020-05-13 16:13                     ` Greg Kroah-Hartman
@ 2020-05-13 17:17                     ` Maciej W. Rozycki
  2020-05-22 13:03                       ` Mikulas Patocka
  2020-05-22 13:26                     ` Mikulas Patocka
  2 siblings, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-13 17:17 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Mikulas Patocka, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Wed, 13 May 2020, Ivan Kokshaysky wrote:

> >  Individual PCI port locations correspond to different MMIO locations, so 
> > yes, accesses to these can be reordered (merging won't happen due to the 
> > use of the sparse address space).
> 
> Correct, it's how Alpha write buffers work. According to 21064 hardware
> reference manual, these buffers are flushed when one of the following
> conditions is met:
> 
> 1) The write buffer contains at least two valid entries.
> 2) The write buffer contains one valid entry and at least 256 CPU cycles
>    have elapsed since the execution of the last write buffer-directed
>    instruction.
> 3) The write buffer contains an MB, STQ_C or STL_C instruction.
> 4) A load miss is pending to an address currently valid in the write
>    buffer that requires the write buffer to be flushed.
> 
> I'm certain that in these rtc/serial cases we've got readX arriving
> to device *before* preceeding writeX because of 2). That's why small
> delay (300-1400 ns, apparently depends on CPU frequency) seemingly
> "fixes" the problem. The 4) is not met because loads and stores are
> to different ports, and 3) has been broken by commit 92d7223a74.
> 
> So I believe that correct fix would be to revert 92d7223a74 and
> add wmb() before [io]writeX macros to meet memory-barriers.txt
> requirement. The "wmb" instruction is cheap enough and won't hurt
> IO performance too much.

 Hmm, having barriers *afterwards* across all the MMIO accessors, even 
ones that do not have such a requirement according to memory-barriers.txt, 
does hurt performance unnecessarily however.  What I think has to be done 
is adding barriers beforehand, and then only add barriers afterwards where 
necessary.  Commit 92d7223a74 did a part of that, but did not consistently 
update all the remaining accessors.

 So I don't think reverting 92d7223a74 permanently is the right way to go, 
however it certainly makes sense to do that temporarily to get rid of the 
fatal regression, sort all the corner cases and then apply the resulting 
replacement fix.  I think ultimately we do want the barriers beforehand, 
just like the MIPS port has (and survives) in arch/mips/include/asm/io.h.  
Observe that unlike the Alpha ISA the MIPS ISA does have nontrivial `rmb' 
aka the SYNC_RMB hardware instruction.

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-13 17:17                     ` Maciej W. Rozycki
@ 2020-05-22 13:03                       ` Mikulas Patocka
  2020-05-22 13:37                         ` Maciej W. Rozycki
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-22 13:03 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Wed, 13 May 2020, Maciej W. Rozycki wrote:

> On Wed, 13 May 2020, Ivan Kokshaysky wrote:
> 
> > >  Individual PCI port locations correspond to different MMIO locations, so 
> > > yes, accesses to these can be reordered (merging won't happen due to the 
> > > use of the sparse address space).
> > 
> > Correct, it's how Alpha write buffers work. According to 21064 hardware
> > reference manual, these buffers are flushed when one of the following
> > conditions is met:
> > 
> > 1) The write buffer contains at least two valid entries.
> > 2) The write buffer contains one valid entry and at least 256 CPU cycles
> >    have elapsed since the execution of the last write buffer-directed
> >    instruction.
> > 3) The write buffer contains an MB, STQ_C or STL_C instruction.
> > 4) A load miss is pending to an address currently valid in the write
> >    buffer that requires the write buffer to be flushed.
> > 
> > I'm certain that in these rtc/serial cases we've got readX arriving
> > to device *before* preceeding writeX because of 2). That's why small
> > delay (300-1400 ns, apparently depends on CPU frequency) seemingly
> > "fixes" the problem. The 4) is not met because loads and stores are
> > to different ports, and 3) has been broken by commit 92d7223a74.
> > 
> > So I believe that correct fix would be to revert 92d7223a74 and
> > add wmb() before [io]writeX macros to meet memory-barriers.txt
> > requirement. The "wmb" instruction is cheap enough and won't hurt
> > IO performance too much.
> 
>  Hmm, having barriers *afterwards* across all the MMIO accessors, even 
> ones that do not have such a requirement according to memory-barriers.txt, 
> does hurt performance unnecessarily however.  What I think has to be done 
> is adding barriers beforehand, and then only add barriers afterwards where 
> necessary.  Commit 92d7223a74 did a part of that, but did not consistently 
> update all the remaining accessors.
> 
>  So I don't think reverting 92d7223a74 permanently is the right way to go, 
> however it certainly makes sense to do that temporarily to get rid of the 
> fatal regression, sort all the corner cases and then apply the resulting 
> replacement fix.

See Documentation/memory-barriers.txt, the section "KERNEL I/O BARRIER 
EFFECTS"

According to the specification, there must be a barrier before a write to 
the MMIO space (paragraph 3) and after a read from MMIO space (parahraph 
4) - if this causes unnecessary slowdown, the driver should use 
readX_relaxed or writeX_relaxed functions - the *_relaxed functions are 
ordered with each other (see the paragraph "(*) readX_relaxed(), 
writeX_relaxed()"), but they are not ordered with respect to memory 
access.

The commit 92d7223a74 fixes that requirement (although there is no real 
driver that was fixed by this), so I don't think it should be reverted. 
The proper fix should be to add delays to the serial port and readltime 
clock (or perhaps to all IO-port accesses).

> I think ultimately we do want the barriers beforehand, just like the 
> MIPS port has (and survives) in arch/mips/include/asm/io.h.  Observe 

If the MIPS port doesn't have MMIO barrier after read[bwl], then it is 
violating the specification. Perhaps there is no existing driver that is 
hurt by this violation, so this violation survived.

> that unlike the Alpha ISA the MIPS ISA does have nontrivial `rmb' aka 
> the SYNC_RMB hardware instruction.
> 
>   Maciej

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-13 14:41                   ` Ivan Kokshaysky
  2020-05-13 16:13                     ` Greg Kroah-Hartman
  2020-05-13 17:17                     ` Maciej W. Rozycki
@ 2020-05-22 13:26                     ` Mikulas Patocka
  2020-05-22 20:00                       ` Mikulas Patocka
  2 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-22 13:26 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Wed, 13 May 2020, Ivan Kokshaysky wrote:

> On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote:
> >  Individual PCI port locations correspond to different MMIO locations, so 
> > yes, accesses to these can be reordered (merging won't happen due to the 
> > use of the sparse address space).
> 
> Correct, it's how Alpha write buffers work. According to 21064 hardware
> reference manual, these buffers are flushed when one of the following
> conditions is met:
> 
> 1) The write buffer contains at least two valid entries.
> 2) The write buffer contains one valid entry and at least 256 CPU cycles
>    have elapsed since the execution of the last write buffer-directed
>    instruction.
> 3) The write buffer contains an MB, STQ_C or STL_C instruction.
> 4) A load miss is pending to an address currently valid in the write
>    buffer that requires the write buffer to be flushed.
> 
> I'm certain that in these rtc/serial cases we've got readX arriving
> to device *before* preceeding writeX because of 2). That's why small
> delay (300-1400 ns, apparently depends on CPU frequency) seemingly
> "fixes" the problem. The 4) is not met because loads and stores are
> to different ports, and 3) has been broken by commit 92d7223a74.
> 
> So I believe that correct fix would be to revert 92d7223a74 and
> add wmb() before [io]writeX macros to meet memory-barriers.txt
> requirement. The "wmb" instruction is cheap enough and won't hurt
> IO performance too much.
> 
> Ivan.

I agree ... and what about readX_relaxed and writeX_relaxed? According to 
the memory-barriers specification, the _relaxed functions must be ordered 
w.r.t. each other. If Alpha can't keep them ordered, they should have 
barriers between them too.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-22 13:03                       ` Mikulas Patocka
@ 2020-05-22 13:37                         ` Maciej W. Rozycki
  0 siblings, 0 replies; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-22 13:37 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Fri, 22 May 2020, Mikulas Patocka wrote:

> >  Hmm, having barriers *afterwards* across all the MMIO accessors, even 
> > ones that do not have such a requirement according to memory-barriers.txt, 
> > does hurt performance unnecessarily however.  What I think has to be done 
> > is adding barriers beforehand, and then only add barriers afterwards where 
> > necessary.  Commit 92d7223a74 did a part of that, but did not consistently 
> > update all the remaining accessors.
> > 
> >  So I don't think reverting 92d7223a74 permanently is the right way to go, 
> > however it certainly makes sense to do that temporarily to get rid of the 
> > fatal regression, sort all the corner cases and then apply the resulting 
> > replacement fix.
> 
> See Documentation/memory-barriers.txt, the section "KERNEL I/O BARRIER 
> EFFECTS"
> 
> According to the specification, there must be a barrier before a write to 
> the MMIO space (paragraph 3) and after a read from MMIO space (parahraph 
> 4) - if this causes unnecessary slowdown, the driver should use 
> readX_relaxed or writeX_relaxed functions - the *_relaxed functions are 
> ordered with each other (see the paragraph "(*) readX_relaxed(), 
> writeX_relaxed()"), but they are not ordered with respect to memory 
> access.

 The specification doesn't require a barrier *after* a write however, 
which is what I have been concerned about as it may cost hundreds of 
cycles wasted.  I'm not concerned about a barrier after a read (and one 
beforehand is of course also required).

> The commit 92d7223a74 fixes that requirement (although there is no real 
> driver that was fixed by this), so I don't think it should be reverted. 
> The proper fix should be to add delays to the serial port and readltime 
> clock (or perhaps to all IO-port accesses).

 Adding artificial delays will only paper over the real problem I'm 
afraid.

> > I think ultimately we do want the barriers beforehand, just like the 
> > MIPS port has (and survives) in arch/mips/include/asm/io.h.  Observe 
> 
> If the MIPS port doesn't have MMIO barrier after read[bwl], then it is 
> violating the specification. Perhaps there is no existing driver that is 
> hurt by this violation, so this violation survived.

 It does have a barrier, see:

	/* prevent prefetching of coherent DMA data prematurely */	\
	if (!relax)							\
		rmb();							\

In the light of #5 however:

        5. A readX() by a CPU thread from the peripheral will complete before
           any subsequent delay() loop can begin execution on the same thread.

I think it may have to be replaced with a completion barrier however, and 
that will be tricky because you cannot just issue a second read to the 
resource accessed after the `rmb' to make the first read complete, as a 
MMIO read may have side effects (e.g. clearing an interrupt request).  So 
the read would have to happen to a different location.

 Some architectures have a hardware completion barrier instruction, such 
as the modern MIPS ISA, which makes life sweet and easy (as much as life 
can be sweet and easy with a weakly ordered bus model) as no dummy read is 
then required, but surely not all do (including older MIPS ISA revisions).

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
  2020-05-22 13:26                     ` Mikulas Patocka
@ 2020-05-22 20:00                       ` Mikulas Patocka
  2020-05-23 10:26                         ` [PATCH v4] alpha: fix memory barriers so that they conform to the specification Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-22 20:00 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Fri, 22 May 2020, Mikulas Patocka wrote:

> On Wed, 13 May 2020, Ivan Kokshaysky wrote:
> 
> > On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote:
> > >  Individual PCI port locations correspond to different MMIO locations, so 
> > > yes, accesses to these can be reordered (merging won't happen due to the 
> > > use of the sparse address space).
> > 
> > Correct, it's how Alpha write buffers work. According to 21064 hardware
> > reference manual, these buffers are flushed when one of the following
> > conditions is met:
> > 
> > 1) The write buffer contains at least two valid entries.
> > 2) The write buffer contains one valid entry and at least 256 CPU cycles
> >    have elapsed since the execution of the last write buffer-directed
> >    instruction.
> > 3) The write buffer contains an MB, STQ_C or STL_C instruction.
> > 4) A load miss is pending to an address currently valid in the write
> >    buffer that requires the write buffer to be flushed.
> > 
> > I'm certain that in these rtc/serial cases we've got readX arriving
> > to device *before* preceeding writeX because of 2). That's why small
> > delay (300-1400 ns, apparently depends on CPU frequency) seemingly
> > "fixes" the problem. The 4) is not met because loads and stores are
> > to different ports, and 3) has been broken by commit 92d7223a74.
> > 
> > So I believe that correct fix would be to revert 92d7223a74 and
> > add wmb() before [io]writeX macros to meet memory-barriers.txt
> > requirement. The "wmb" instruction is cheap enough and won't hurt
> > IO performance too much.
> > 
> > Ivan.
> 
> I agree ... and what about readX_relaxed and writeX_relaxed? According to 
> the memory-barriers specification, the _relaxed functions must be ordered 
> w.r.t. each other. If Alpha can't keep them ordered, they should have 
> barriers between them too.
> 
> Mikulas

I have found the chapter about I/O in the Alpha reference manual, in the 
section "5.6.4.7 Implications for Memory Mapped I/O", it says:

To reliably communicate shared data from a processor to an I/O device:
1. Write the shared data to a memory-like physical memory region on the 
processor.
2. Execute an MB instruction.
3. Write a flag (equivalently, send an interrupt or write a register 
location implemented in the I/O device).

The receiving I/O device must:
1. Read the flag (equivalently, detect the interrupt or detect the write 
to the register location implemented in the I/O device).
2. Execute the equivalent of an MB.
3. Read the shared data.

So, we must use MB, not WMB when ordering I/O accesses.

The WMB instruction specification says that it orders writes to 
memory-like locations with other writes to memory-like locations. And 
writes to non-memory-like locations with other writes to non-memory-like 
locations. But it doesn't order memory-like writes with I/O-like writes.

The section 5.6.3 claims that there are no implied barriers.



So, if we want to support the specifications:

read*_relaxed and write*_relaxed need at least one barrier before or after 
(memory-barriers.txt says that they must be ordered with each other, the 
alpha specification doesn't specify ordering).

read and write - read must have a barrier after it, write before it. There 
must be one more barrier before a read or after a write, to make sure that 
there is barrier between write+read.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-22 20:00                       ` Mikulas Patocka
@ 2020-05-23 10:26                         ` Mikulas Patocka
  2020-05-23 15:10                           ` Ivan Kokshaysky
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-23 10:26 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
platform. The patches move memory barriers after a write before the write.
The result is that if there's iowrite followed by ioread, there is no
barrier between them.

The Alpha architecture allows reordering of the accesses to the I/O space,
and the missing barrier between write and read causes hang with serial
port and real time clock.

This patch makes barriers confiorm to the specification.

1. We add mb() before readX_relaxed and writeX_relaxed -
   memory-barriers.txt claims that these functions must be ordered w.r.t.
   each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
   write followed by a read, there should be a barrier between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org      # v4.17+

---
 arch/alpha/include/asm/io.h |   43 ++++++++++++++++++++++++++++---------------
 arch/alpha/kernel/io.c      |   28 +++++++++++++++++++++-------
 2 files changed, 49 insertions(+), 22 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
@@ -310,14 +310,18 @@ static inline int __is_mmio(const volati
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
 extern inline unsigned int ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 extern inline unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
@@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
 extern inline unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -403,15 +409,18 @@ extern inline void __raw_writew(u16 b, v
 
 extern inline u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 extern inline u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
 	mb();
+	ret = __raw_readw(addr);
 	return ret;
 }
 
@@ -451,14 +460,18 @@ extern inline void __raw_writeq(u64 b, v
 
 extern inline u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 extern inline u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -487,14 +500,14 @@ extern inline void writeq(u64 b, volatil
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
-#define readb_relaxed(addr)	__raw_readb(addr)
-#define readw_relaxed(addr)	__raw_readw(addr)
-#define readl_relaxed(addr)	__raw_readl(addr)
-#define readq_relaxed(addr)	__raw_readq(addr)
-#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
-#define writew_relaxed(b, addr)	__raw_writew(b, addr)
-#define writel_relaxed(b, addr)	__raw_writel(b, addr)
-#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
+#define readb_relaxed(addr)	(mb(), __raw_readb(addr))
+#define readw_relaxed(addr)	(mb(), __raw_readw(addr))
+#define readl_relaxed(addr)	(mb(), __raw_readl(addr))
+#define readq_relaxed(addr)	(mb(), __raw_readq(addr))
+#define writeb_relaxed(b, addr)	(mb(), __raw_writeb(b, addr))
+#define writew_relaxed(b, addr)	(mb(), __raw_writew(b, addr))
+#define writel_relaxed(b, addr)	(mb(), __raw_writel(b, addr))
+#define writeq_relaxed(b, addr)	(mb(), __raw_writeq(b, addr))
 
 /*
  * String version of IO memory access ops:
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
@@ -16,21 +16,27 @@
 unsigned int
 ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq);
 
 u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
 
 u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 10:26                         ` [PATCH v4] alpha: fix memory barriers so that they conform to the specification Mikulas Patocka
@ 2020-05-23 15:10                           ` Ivan Kokshaysky
  2020-05-23 15:34                             ` Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Ivan Kokshaysky @ 2020-05-23 15:10 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Sat, May 23, 2020 at 06:26:54AM -0400, Mikulas Patocka wrote:
> The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
> platform. The patches move memory barriers after a write before the write.
> The result is that if there's iowrite followed by ioread, there is no
> barrier between them.
> 
> The Alpha architecture allows reordering of the accesses to the I/O space,
> and the missing barrier between write and read causes hang with serial
> port and real time clock.
> 
> This patch makes barriers confiorm to the specification.
> 
> 1. We add mb() before readX_relaxed and writeX_relaxed -
>    memory-barriers.txt claims that these functions must be ordered w.r.t.
>    each other. Alpha doesn't order them, so we need an explicit barrier.
> 2. We add mb() before reads from the I/O space - so that if there's a
>    write followed by a read, there should be a barrier between them.

You missed the second mb() in extern inline u16 readw(). Otherwise,

Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org      # v4.17+
> 
> ---
>  arch/alpha/include/asm/io.h |   43 ++++++++++++++++++++++++++++---------------
>  arch/alpha/kernel/io.c      |   28 +++++++++++++++++++++-------
>  2 files changed, 49 insertions(+), 22 deletions(-)
> 
> Index: linux-stable/arch/alpha/include/asm/io.h
> ===================================================================
> --- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
> +++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
> @@ -310,14 +310,18 @@ static inline int __is_mmio(const volati
>  #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
>  extern inline unsigned int ioread8(void __iomem *addr)
>  {
> -	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
> +	unsigned int ret;
> +	mb();
> +	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
>  	mb();
>  	return ret;
>  }
>  
>  extern inline unsigned int ioread16(void __iomem *addr)
>  {
> -	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
> +	unsigned int ret;
> +	mb();
> +	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
>  	mb();
>  	return ret;
>  }
> @@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned
>  #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
>  extern inline unsigned int ioread32(void __iomem *addr)
>  {
> -	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
> +	unsigned int ret;
> +	mb();
> +	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
>  	mb();
>  	return ret;
>  }
> @@ -403,15 +409,18 @@ extern inline void __raw_writew(u16 b, v
>  
>  extern inline u8 readb(const volatile void __iomem *addr)
>  {
> -	u8 ret = __raw_readb(addr);
> +	u8 ret;
> +	mb();
> +	ret = __raw_readb(addr);
>  	mb();
>  	return ret;
>  }
>  
>  extern inline u16 readw(const volatile void __iomem *addr)
>  {
> -	u16 ret = __raw_readw(addr);
> +	u16 ret;
>  	mb();
> +	ret = __raw_readw(addr);
>  	return ret;
>  }
>  
> @@ -451,14 +460,18 @@ extern inline void __raw_writeq(u64 b, v
>  
>  extern inline u32 readl(const volatile void __iomem *addr)
>  {
> -	u32 ret = __raw_readl(addr);
> +	u32 ret;
> +	mb();
> +	ret = __raw_readl(addr);
>  	mb();
>  	return ret;
>  }
>  
>  extern inline u64 readq(const volatile void __iomem *addr)
>  {
> -	u64 ret = __raw_readq(addr);
> +	u64 ret;
> +	mb();
> +	ret = __raw_readq(addr);
>  	mb();
>  	return ret;
>  }
> @@ -487,14 +500,14 @@ extern inline void writeq(u64 b, volatil
>  #define outb_p		outb
>  #define outw_p		outw
>  #define outl_p		outl
> -#define readb_relaxed(addr)	__raw_readb(addr)
> -#define readw_relaxed(addr)	__raw_readw(addr)
> -#define readl_relaxed(addr)	__raw_readl(addr)
> -#define readq_relaxed(addr)	__raw_readq(addr)
> -#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
> -#define writew_relaxed(b, addr)	__raw_writew(b, addr)
> -#define writel_relaxed(b, addr)	__raw_writel(b, addr)
> -#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
> +#define readb_relaxed(addr)	(mb(), __raw_readb(addr))
> +#define readw_relaxed(addr)	(mb(), __raw_readw(addr))
> +#define readl_relaxed(addr)	(mb(), __raw_readl(addr))
> +#define readq_relaxed(addr)	(mb(), __raw_readq(addr))
> +#define writeb_relaxed(b, addr)	(mb(), __raw_writeb(b, addr))
> +#define writew_relaxed(b, addr)	(mb(), __raw_writew(b, addr))
> +#define writel_relaxed(b, addr)	(mb(), __raw_writel(b, addr))
> +#define writeq_relaxed(b, addr)	(mb(), __raw_writeq(b, addr))
>  
>  /*
>   * String version of IO memory access ops:
> Index: linux-stable/arch/alpha/kernel/io.c
> ===================================================================
> --- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
> +++ linux-stable/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
> @@ -16,21 +16,27 @@
>  unsigned int
>  ioread8(void __iomem *addr)
>  {
> -	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
> +	unsigned int ret;
> +	mb();
> +	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
>  	mb();
>  	return ret;
>  }
>  
>  unsigned int ioread16(void __iomem *addr)
>  {
> -	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
> +	unsigned int ret;
> +	mb();
> +	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
>  	mb();
>  	return ret;
>  }
>  
>  unsigned int ioread32(void __iomem *addr)
>  {
> -	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
> +	unsigned int ret;
> +	mb();
> +	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
>  	mb();
>  	return ret;
>  }
> @@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq);
>  
>  u8 readb(const volatile void __iomem *addr)
>  {
> -	u8 ret = __raw_readb(addr);
> +	u8 ret;
> +	mb();
> +	ret = __raw_readb(addr);
>  	mb();
>  	return ret;
>  }
>  
>  u16 readw(const volatile void __iomem *addr)
>  {
> -	u16 ret = __raw_readw(addr);
> +	u16 ret;
> +	mb();
> +	ret = __raw_readw(addr);
>  	mb();
>  	return ret;
>  }
>  
>  u32 readl(const volatile void __iomem *addr)
>  {
> -	u32 ret = __raw_readl(addr);
> +	u32 ret;
> +	mb();
> +	ret = __raw_readl(addr);
>  	mb();
>  	return ret;
>  }
>  
>  u64 readq(const volatile void __iomem *addr)
>  {
> -	u64 ret = __raw_readq(addr);
> +	u64 ret;
> +	mb();
> +	ret = __raw_readq(addr);
>  	mb();
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 15:10                           ` Ivan Kokshaysky
@ 2020-05-23 15:34                             ` Mikulas Patocka
  2020-05-23 15:37                               ` [PATCH v5] " Mikulas Patocka
  2020-05-23 16:44                               ` [PATCH v4] " Maciej W. Rozycki
  0 siblings, 2 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-23 15:34 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc



On Sat, 23 May 2020, Ivan Kokshaysky wrote:

> On Sat, May 23, 2020 at 06:26:54AM -0400, Mikulas Patocka wrote:
> > The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
> > platform. The patches move memory barriers after a write before the write.
> > The result is that if there's iowrite followed by ioread, there is no
> > barrier between them.
> > 
> > The Alpha architecture allows reordering of the accesses to the I/O space,
> > and the missing barrier between write and read causes hang with serial
> > port and real time clock.
> > 
> > This patch makes barriers confiorm to the specification.
> > 
> > 1. We add mb() before readX_relaxed and writeX_relaxed -
> >    memory-barriers.txt claims that these functions must be ordered w.r.t.
> >    each other. Alpha doesn't order them, so we need an explicit barrier.
> > 2. We add mb() before reads from the I/O space - so that if there's a
> >    write followed by a read, there should be a barrier between them.
> 
> You missed the second mb() in extern inline u16 readw(). Otherwise,
> 
> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

... and I also broke the *_relaxed macros and didn't notice it, because 
they are unused in my config. This won't compile, because mb() is a 
statement, not a function.

> > +#define readb_relaxed(addr)        (mb(), __raw_readb(addr))

I'll send a new version of the patch.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 15:34                             ` Mikulas Patocka
@ 2020-05-23 15:37                               ` Mikulas Patocka
  2020-05-24 14:54                                 ` Maciej W. Rozycki
  2020-05-23 16:44                               ` [PATCH v4] " Maciej W. Rozycki
  1 sibling, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-23 15:37 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner,
	Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
platform. The patches move memory barriers after a write before the write.
The result is that if there's iowrite followed by ioread, there is no
barrier between them.

The Alpha architecture allows reordering of the accesses to the I/O space,
and the missing barrier between write and read causes hang with serial
port and real time clock.

This patch makes barriers confiorm to the specification.

1. We add mb() before readX_relaxed and writeX_relaxed -
   memory-barriers.txt claims that these functions must be ordered w.r.t.
   each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
   write followed by a read, there should be a barrier between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org      # v4.17+
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

---
 arch/alpha/include/asm/io.h |   87 ++++++++++++++++++++++++++++++++++++--------
 arch/alpha/kernel/io.c      |   28 ++++++++++----
 2 files changed, 93 insertions(+), 22 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 17:29:22.000000000 +0200
@@ -310,14 +310,18 @@ static inline int __is_mmio(const volati
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
 extern inline unsigned int ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 extern inline unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
@@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
 extern inline unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -403,14 +409,18 @@ extern inline void __raw_writew(u16 b, v
 
 extern inline u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 extern inline u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
@@ -451,14 +461,18 @@ extern inline void __raw_writeq(u64 b, v
 
 extern inline u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 extern inline u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
-#define readb_relaxed(addr)	__raw_readb(addr)
-#define readw_relaxed(addr)	__raw_readw(addr)
-#define readl_relaxed(addr)	__raw_readl(addr)
-#define readq_relaxed(addr)	__raw_readq(addr)
-#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
-#define writew_relaxed(b, addr)	__raw_writew(b, addr)
-#define writel_relaxed(b, addr)	__raw_writel(b, addr)
-#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
 
 /*
+ * The _relaxed functions must be ordered w.r.t. each other, but they don't
+ * have to be ordered w.r.t. other memory accesses.
+ */
+static inline u8 readb_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readb(addr);
+}
+
+static inline u16 readw_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readw(addr);
+}
+
+static inline u32 readl_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readl(addr);
+}
+
+static inline u64 readq_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readq(addr);
+}
+
+static inline void writeb_relaxed(u8 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writeb(b, addr);
+}
+
+static inline void writew_relaxed(u16 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writew(b, addr);
+}
+
+static inline void writel_relaxed(u32 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writel(b, addr);
+}
+
+static inline void writeq_relaxed(u64 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writeq(b, addr);
+}
+/*
  * String version of IO memory access ops:
  */
 extern void memcpy_fromio(void *, const volatile void __iomem *, long);
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
@@ -16,21 +16,27 @@
 unsigned int
 ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq);
 
 u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
 
 u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 15:34                             ` Mikulas Patocka
  2020-05-23 15:37                               ` [PATCH v5] " Mikulas Patocka
@ 2020-05-23 16:44                               ` Maciej W. Rozycki
  2020-05-23 17:09                                 ` Mikulas Patocka
  1 sibling, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-23 16:44 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Sat, 23 May 2020, Mikulas Patocka wrote:

> ... and I also broke the *_relaxed macros and didn't notice it, because 
> they are unused in my config. This won't compile, because mb() is a 
> statement, not a function.
> 
> > > +#define readb_relaxed(addr)        (mb(), __raw_readb(addr))

 A statement expression would do though, e.g.:

#define readb_relaxed(addr)	({ mb(); __raw_readb(addr); })

and might be preferable for code brevity to adding a zillion of inline 
functions.

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 16:44                               ` [PATCH v4] " Maciej W. Rozycki
@ 2020-05-23 17:09                                 ` Mikulas Patocka
  2020-05-23 19:27                                   ` Maciej W. Rozycki
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-23 17:09 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Sat, 23 May 2020, Maciej W. Rozycki wrote:

> On Sat, 23 May 2020, Mikulas Patocka wrote:
> 
> > ... and I also broke the *_relaxed macros and didn't notice it, because 
> > they are unused in my config. This won't compile, because mb() is a 
> > statement, not a function.
> > 
> > > > +#define readb_relaxed(addr)        (mb(), __raw_readb(addr))
> 
>  A statement expression would do though, e.g.:
> 
> #define readb_relaxed(addr)	({ mb(); __raw_readb(addr); })
> 
> and might be preferable for code brevity to adding a zillion of inline 
> functions.
> 
>   Maciej

I know, but that file uses inline functions everywhere else, so I wanted 
to make it consistent.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 17:09                                 ` Mikulas Patocka
@ 2020-05-23 19:27                                   ` Maciej W. Rozycki
  2020-05-23 20:11                                     ` Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-23 19:27 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Sat, 23 May 2020, Mikulas Patocka wrote:

> >  A statement expression would do though, e.g.:
> > 
> > #define readb_relaxed(addr)	({ mb(); __raw_readb(addr); })
> > 
> > and might be preferable for code brevity to adding a zillion of inline 
> > functions.
> > 
> >   Maciej
> 
> I know, but that file uses inline functions everywhere else, so I wanted 
> to make it consistent.

 Fair enough, fine with me.  I still can't access my Alpha system, have 
you verified your latest version at run time?

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 19:27                                   ` Maciej W. Rozycki
@ 2020-05-23 20:11                                     ` Mikulas Patocka
  0 siblings, 0 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-23 20:11 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Sat, 23 May 2020, Maciej W. Rozycki wrote:

> On Sat, 23 May 2020, Mikulas Patocka wrote:
> 
> > >  A statement expression would do though, e.g.:
> > > 
> > > #define readb_relaxed(addr)	({ mb(); __raw_readb(addr); })
> > > 
> > > and might be preferable for code brevity to adding a zillion of inline 
> > > functions.
> > > 
> > >   Maciej
> > 
> > I know, but that file uses inline functions everywhere else, so I wanted 
> > to make it consistent.
> 
>  Fair enough, fine with me.  I still can't access my Alpha system, have 
> you verified your latest version at run time?
> 
>   Maciej

Yes - it runs without any hang.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-23 15:37                               ` [PATCH v5] " Mikulas Patocka
@ 2020-05-24 14:54                                 ` Maciej W. Rozycki
  2020-05-25 13:56                                   ` Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-24 14:54 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

Hi Mikulas,

> This patch makes barriers confiorm to the specification.
> 
> 1. We add mb() before readX_relaxed and writeX_relaxed -
>    memory-barriers.txt claims that these functions must be ordered w.r.t.
>    each other. Alpha doesn't order them, so we need an explicit barrier.
> 2. We add mb() before reads from the I/O space - so that if there's a
>    write followed by a read, there should be a barrier between them.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org      # v4.17+
> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

 Thank you for your effort to address this regression.  I have looked 
through your code and the context it is to be applied to.  Overall it 
looks good to me, however I still have one concern as detailed below 
(please accept my apologies if you find it tedious to address all the 
points raised in the course of this review).

> Index: linux-stable/arch/alpha/include/asm/io.h
> ===================================================================
> --- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
> +++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 17:29:22.000000000 +0200
[...]
> @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil
>  #define outb_p		outb
>  #define outw_p		outw
>  #define outl_p		outl
> -#define readb_relaxed(addr)	__raw_readb(addr)
> -#define readw_relaxed(addr)	__raw_readw(addr)
> -#define readl_relaxed(addr)	__raw_readl(addr)
> -#define readq_relaxed(addr)	__raw_readq(addr)
> -#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
> -#define writew_relaxed(b, addr)	__raw_writew(b, addr)
> -#define writel_relaxed(b, addr)	__raw_writel(b, addr)
> -#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
>  
>  /*
> + * The _relaxed functions must be ordered w.r.t. each other, but they don't
> + * have to be ordered w.r.t. other memory accesses.
> + */
> +static inline u8 readb_relaxed(const volatile void __iomem *addr)
> +{
> +	mb();
> +	return __raw_readb(addr);
> +}
[etc.]

 Please observe that changing the `*_relaxed' entry points from merely 
aliasing the corresponding `__raw_*' handlers to more elaborate code 
sequences (though indeed slightly only, but still) makes the situation 
analogous to one we have with the ordinary MMIO accessor entry points.  
Those regular entry points have been made `extern inline' and wrapped 
into:

#if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1

and:

#if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1

respectively, with corresponding out-of-line entry points available, so 
that there is no extra inline code produced where the call to the relevant 
MMIO accessor is going to end up with an actual function call, as this 
would not help performance in any way and would expand code unnecessarily 
at all call sites.

 Therefore I suggest that your new `static inline' functions follow the 
pattern, perhaps by grouping them with the corresponding ordinary accessor 
functions in arch/alpha/include/asm/io.h within the relevant existing 
#ifdef, and then by making them `extern inline' and providing out-of-line 
implementations in arch/alpha/kernel/io.c, with the individual symbols 
exported.  Within arch/alpha/kernel/io.c the compiler will still inline 
code as it sees fit as it already does, e.g. `__raw_readq' might get 
inlined in `readq' if it turns out cheaper than arranging for an actual 
call, including all the stack frame preparation for `ra' preservation; 
it's less likely with say `writeq' which probably always ends with a tail 
call to `__raw_writeq' as no stack frame is required in that case.

 That for the read accessors.

> +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr)
> +{
> +	mb();
> +	__raw_writeb(b, addr);
> +}
[etc.]

 Conversely for the write accessors, keeping in mind what I have noted 
above, I suggest that you just redirect the existing aliases to the 
ordinary accessors, as there will be no difference anymore between the 
respective ordinary and relaxed accessors.  That is:

#define writeb_relaxed(b, addr)	writeb(b, addr)

etc.

 Let me know if you have any further questions or comments.

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-24 14:54                                 ` Maciej W. Rozycki
@ 2020-05-25 13:56                                   ` Mikulas Patocka
  2020-05-25 14:07                                     ` Arnd Bergmann
  2020-05-25 14:45                                     ` Maciej W. Rozycki
  0 siblings, 2 replies; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-25 13:56 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Sun, 24 May 2020, Maciej W. Rozycki wrote:

> Hi Mikulas,
> 
> > This patch makes barriers confiorm to the specification.
> > 
> > 1. We add mb() before readX_relaxed and writeX_relaxed -
> >    memory-barriers.txt claims that these functions must be ordered w.r.t.
> >    each other. Alpha doesn't order them, so we need an explicit barrier.
> > 2. We add mb() before reads from the I/O space - so that if there's a
> >    write followed by a read, there should be a barrier between them.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> > Cc: stable@vger.kernel.org      # v4.17+
> > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> 
>  Thank you for your effort to address this regression.  I have looked 
> through your code and the context it is to be applied to.  Overall it 
> looks good to me, however I still have one concern as detailed below 
> (please accept my apologies if you find it tedious to address all the 
> points raised in the course of this review).
> 
> > Index: linux-stable/arch/alpha/include/asm/io.h
> > ===================================================================
> > --- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
> > +++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 17:29:22.000000000 +0200
> [...]
> > @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil
> >  #define outb_p		outb
> >  #define outw_p		outw
> >  #define outl_p		outl
> > -#define readb_relaxed(addr)	__raw_readb(addr)
> > -#define readw_relaxed(addr)	__raw_readw(addr)
> > -#define readl_relaxed(addr)	__raw_readl(addr)
> > -#define readq_relaxed(addr)	__raw_readq(addr)
> > -#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
> > -#define writew_relaxed(b, addr)	__raw_writew(b, addr)
> > -#define writel_relaxed(b, addr)	__raw_writel(b, addr)
> > -#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
> >  
> >  /*
> > + * The _relaxed functions must be ordered w.r.t. each other, but they don't
> > + * have to be ordered w.r.t. other memory accesses.
> > + */
> > +static inline u8 readb_relaxed(const volatile void __iomem *addr)
> > +{
> > +	mb();
> > +	return __raw_readb(addr);
> > +}
> [etc.]
> 
>  Please observe that changing the `*_relaxed' entry points from merely 
> aliasing the corresponding `__raw_*' handlers to more elaborate code 
> sequences (though indeed slightly only, but still) makes the situation 
> analogous to one we have with the ordinary MMIO accessor entry points.  
> Those regular entry points have been made `extern inline' and wrapped 
> into:
> 
> #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1
> 
> and:
> 
> #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1
> 
> respectively, with corresponding out-of-line entry points available, so 
> that there is no extra inline code produced where the call to the relevant 
> MMIO accessor is going to end up with an actual function call, as this 
> would not help performance in any way and would expand code unnecessarily 
> at all call sites.
> 
>  Therefore I suggest that your new `static inline' functions follow the 
> pattern, perhaps by grouping them with the corresponding ordinary accessor 
> functions in arch/alpha/include/asm/io.h within the relevant existing 
> #ifdef, and then by making them `extern inline' and providing out-of-line 
> implementations in arch/alpha/kernel/io.c, with the individual symbols 
> exported.  Within arch/alpha/kernel/io.c the compiler will still inline 
> code as it sees fit as it already does, e.g. `__raw_readq' might get 
> inlined in `readq' if it turns out cheaper than arranging for an actual 
> call, including all the stack frame preparation for `ra' preservation; 
> it's less likely with say `writeq' which probably always ends with a tail 
> call to `__raw_writeq' as no stack frame is required in that case.
> 
>  That for the read accessors.

I think that making the read*_relaxed functions extern inline just causes 
source code bloat with no practical gain - if we make them extern inline, 
we would need two implementations (one in the include file, the other in 
the C file) - and it is not good practice to duplicate code.

The functions __raw_read* are already extern inline, so the compiler will 
inline/noinline them depending on the macros trivial_io_bw and 
trivial_io_lq - so we can just call them from read*_relaxed without 
repeating the extern inline pattern.

> > +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr)
> > +{
> > +	mb();
> > +	__raw_writeb(b, addr);
> > +}
> [etc.]
> 
>  Conversely for the write accessors, keeping in mind what I have noted 
> above, I suggest that you just redirect the existing aliases to the 
> ordinary accessors, as there will be no difference anymore between the 
> respective ordinary and relaxed accessors.  That is:
> 
> #define writeb_relaxed(b, addr)	writeb(b, addr)

Yes - that's a good point.

> etc.
> 
>  Let me know if you have any further questions or comments.
> 
>   Maciej

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 13:56                                   ` Mikulas Patocka
@ 2020-05-25 14:07                                     ` Arnd Bergmann
  2020-05-25 14:45                                     ` Maciej W. Rozycki
  1 sibling, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2020-05-25 14:07 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Maciej W. Rozycki, Ivan Kokshaysky, Maciej W. Rozycki,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Mon, May 25, 2020 at 3:56 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Sun, 24 May 2020, Maciej W. Rozycki wrote:

> >
> > respectively, with corresponding out-of-line entry points available, so
> > that there is no extra inline code produced where the call to the relevant
> > MMIO accessor is going to end up with an actual function call, as this
> > would not help performance in any way and would expand code unnecessarily
> > at all call sites.
> >
> >  Therefore I suggest that your new `static inline' functions follow the
> > pattern, perhaps by grouping them with the corresponding ordinary accessor
> > functions in arch/alpha/include/asm/io.h within the relevant existing
> > #ifdef, and then by making them `extern inline' and providing out-of-line
> > implementations in arch/alpha/kernel/io.c, with the individual symbols
> > exported.  Within arch/alpha/kernel/io.c the compiler will still inline
> > code as it sees fit as it already does, e.g. `__raw_readq' might get
> > inlined in `readq' if it turns out cheaper than arranging for an actual
> > call, including all the stack frame preparation for `ra' preservation;
> > it's less likely with say `writeq' which probably always ends with a tail
> > call to `__raw_writeq' as no stack frame is required in that case.
> >
> >  That for the read accessors.
>
> I think that making the read*_relaxed functions extern inline just causes
> source code bloat with no practical gain - if we make them extern inline,
> we would need two implementations (one in the include file, the other in
> the C file) - and it is not good practice to duplicate code.
>
> The functions __raw_read* are already extern inline, so the compiler will
> inline/noinline them depending on the macros trivial_io_bw and
> trivial_io_lq - so we can just call them from read*_relaxed without
> repeating the extern inline pattern.

You could consider using the helpers in include/asm-generic/io.h
to provide some of the wrappers and only provide the ones that
don't fit in that scheme already.

       Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 13:56                                   ` Mikulas Patocka
  2020-05-25 14:07                                     ` Arnd Bergmann
@ 2020-05-25 14:45                                     ` Maciej W. Rozycki
  2020-05-25 15:53                                       ` [PATCH v6] " Mikulas Patocka
  2020-05-25 15:54                                       ` [PATCH v5] " Mikulas Patocka
  1 sibling, 2 replies; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-25 14:45 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Mon, 25 May 2020, Mikulas Patocka wrote:

> >  Please observe that changing the `*_relaxed' entry points from merely 
> > aliasing the corresponding `__raw_*' handlers to more elaborate code 
> > sequences (though indeed slightly only, but still) makes the situation 
> > analogous to one we have with the ordinary MMIO accessor entry points.  
> > Those regular entry points have been made `extern inline' and wrapped 
> > into:
> > 
> > #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1
> > 
> > and:
> > 
> > #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1
> > 
> > respectively, with corresponding out-of-line entry points available, so 
> > that there is no extra inline code produced where the call to the relevant 
> > MMIO accessor is going to end up with an actual function call, as this 
> > would not help performance in any way and would expand code unnecessarily 
> > at all call sites.
> > 
> >  Therefore I suggest that your new `static inline' functions follow the 
> > pattern, perhaps by grouping them with the corresponding ordinary accessor 
> > functions in arch/alpha/include/asm/io.h within the relevant existing 
> > #ifdef, and then by making them `extern inline' and providing out-of-line 
> > implementations in arch/alpha/kernel/io.c, with the individual symbols 
> > exported.  Within arch/alpha/kernel/io.c the compiler will still inline 
> > code as it sees fit as it already does, e.g. `__raw_readq' might get 
> > inlined in `readq' if it turns out cheaper than arranging for an actual 
> > call, including all the stack frame preparation for `ra' preservation; 
> > it's less likely with say `writeq' which probably always ends with a tail 
> > call to `__raw_writeq' as no stack frame is required in that case.
> > 
> >  That for the read accessors.
> 
> I think that making the read*_relaxed functions extern inline just causes 
> source code bloat with no practical gain - if we make them extern inline, 
> we would need two implementations (one in the include file, the other in 
> the C file) - and it is not good practice to duplicate code.

 We do that already with the existing accessors, and while I agree the 
duplication is a bit unfortunate and could be solved with some macro 
hackery so that there is a single version in arch/alpha/include/asm/io.h 
that gets pasted to produce out-of-line copies in arch/alpha/kernel/io.c.  
That would be good having across all the accessors, but that is not 
relevant to the regression discussed here and therefore I do not request 
that you make such a clean-up as a part of this series.

> The functions __raw_read* are already extern inline, so the compiler will 
> inline/noinline them depending on the macros trivial_io_bw and 
> trivial_io_lq - so we can just call them from read*_relaxed without 
> repeating the extern inline pattern.

 The whole point of this peculiar arrangement for cooked accessors is to 
avoid having barriers inserted inline around out-of-line calls to raw 
accessors, and therefore I maintain that we need to have the arrangement 
applied consistently across all the cooked accessors.  Since you're 
creating new distinct cooked accessors (by making their names no longer 
alias to existing cooked accessors), they need to follow the pattern.

 NB this arrangement was added back in 2.6.9-rc3, with:

ChangeSet@1.1939.5.8, 2004-09-22 22:40:06-07:00, rth@kanga.twiddle.home
  [ALPHA] Implement new ioread interface.

I believe.  I don't know if any further details or discussion around that 
can be chased, but Richard might be able to chime in.

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v6] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 14:45                                     ` Maciej W. Rozycki
@ 2020-05-25 15:53                                       ` Mikulas Patocka
  2020-05-26 14:47                                         ` [PATCH v7] " Mikulas Patocka
  2020-05-25 15:54                                       ` [PATCH v5] " Mikulas Patocka
  1 sibling, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-25 15:53 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
platform. The patches move memory barriers after a write before the write.
The result is that if there's iowrite followed by ioread, there is no
barrier between them.

The Alpha architecture allows reordering of the accesses to the I/O space,
and the missing barrier between write and read causes hang with serial
port and real time clock.

This patch makes barriers confiorm to the specification.

1. We add mb() before readX_relaxed and writeX_relaxed -
   memory-barriers.txt claims that these functions must be ordered w.r.t.
   each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
   write followed by a read, there should be a barrier between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org      # v4.17+
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

---
 arch/alpha/include/asm/io.h |   74 +++++++++++++++++++++++++++++++++++---------
 arch/alpha/kernel/io.c      |   60 +++++++++++++++++++++++++++++++----
 2 files changed, 112 insertions(+), 22 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-25 15:36:16.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-25 17:45:52.000000000 +0200
@@ -310,14 +310,18 @@ static inline int __is_mmio(const volati
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
 extern inline unsigned int ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 extern inline unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
@@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
 extern inline unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -403,14 +409,18 @@ extern inline void __raw_writew(u16 b, v
 
 extern inline u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 extern inline u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
@@ -451,14 +461,18 @@ extern inline void __raw_writeq(u64 b, v
 
 extern inline u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 extern inline u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -487,14 +501,44 @@ extern inline void writeq(u64 b, volatil
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
-#define readb_relaxed(addr)	__raw_readb(addr)
-#define readw_relaxed(addr)	__raw_readw(addr)
-#define readl_relaxed(addr)	__raw_readl(addr)
-#define readq_relaxed(addr)	__raw_readq(addr)
-#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
-#define writew_relaxed(b, addr)	__raw_writew(b, addr)
-#define writel_relaxed(b, addr)	__raw_writel(b, addr)
-#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
+
+extern u8 readb_relaxed(const volatile void __iomem *addr);
+extern u16 readw_relaxed(const volatile void __iomem *addr);
+extern u32 readl_relaxed(const volatile void __iomem *addr);
+extern u64 readq_relaxed(const volatile void __iomem *addr);
+
+#if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
+extern inline u8 readb_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readb(addr);
+}
+
+extern inline u16 readw_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readw(addr);
+}
+#endif
+
+#if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
+extern inline u32 readl_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readl(addr);
+}
+
+extern inline u64 readq_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readq(addr);
+}
+#endif
+
+#define writeb_relaxed	writeb
+#define writeb_relaxew	writew
+#define writeb_relaxel	writel
+#define writeb_relaxeq	writeq
 
 /*
  * String version of IO memory access ops:
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-25 15:36:16.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-25 17:47:02.000000000 +0200
@@ -16,21 +16,27 @@
 unsigned int
 ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq);
 
 u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
 
 u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -207,6 +221,38 @@ EXPORT_SYMBOL(writew);
 EXPORT_SYMBOL(writel);
 EXPORT_SYMBOL(writeq);
 
+/*
+ * The _relaxed functions must be ordered w.r.t. each other, but they don't
+ * have to be ordered w.r.t. other memory accesses.
+ */
+u8 readb_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readb(addr);
+}
+
+u16 readw_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readw(addr);
+}
+
+u32 readl_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readl(addr);
+}
+
+u64 readq_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readq(addr);
+}
+
+EXPORT_SYMBOL(readb_relaxed);
+EXPORT_SYMBOL(readw_relaxed);
+EXPORT_SYMBOL(readl_relaxed);
+EXPORT_SYMBOL(readq_relaxed);
 
 /*
  * Read COUNT 8-bit bytes from port PORT into memory starting at SRC.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 14:45                                     ` Maciej W. Rozycki
  2020-05-25 15:53                                       ` [PATCH v6] " Mikulas Patocka
@ 2020-05-25 15:54                                       ` Mikulas Patocka
  2020-05-25 16:39                                         ` Maciej W. Rozycki
  1 sibling, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-25 15:54 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Mon, 25 May 2020, Maciej W. Rozycki wrote:

> On Mon, 25 May 2020, Mikulas Patocka wrote:
> 
> > The functions __raw_read* are already extern inline, so the compiler will 
> > inline/noinline them depending on the macros trivial_io_bw and 
> > trivial_io_lq - so we can just call them from read*_relaxed without 
> > repeating the extern inline pattern.
> 
>  The whole point of this peculiar arrangement for cooked accessors is to 
> avoid having barriers inserted inline around out-of-line calls to raw 
> accessors,

I see, but why do we want to avoid that? Linux kernel has no binary 
compatibility, so it doesn't matter if the barriers are inlined in the 
drivers or not.

Anyway, I've sent a next version of the patch that makes read*_relaxed 
extern inline.

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 15:54                                       ` [PATCH v5] " Mikulas Patocka
@ 2020-05-25 16:39                                         ` Maciej W. Rozycki
  2020-05-26 14:48                                           ` Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-25 16:39 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Mon, 25 May 2020, Mikulas Patocka wrote:

> > > The functions __raw_read* are already extern inline, so the compiler will 
> > > inline/noinline them depending on the macros trivial_io_bw and 
> > > trivial_io_lq - so we can just call them from read*_relaxed without 
> > > repeating the extern inline pattern.
> > 
> >  The whole point of this peculiar arrangement for cooked accessors is to 
> > avoid having barriers inserted inline around out-of-line calls to raw 
> > accessors,
> 
> I see, but why do we want to avoid that? Linux kernel has no binary 
> compatibility, so it doesn't matter if the barriers are inlined in the 
> drivers or not.

 It does matter as it expands code unnecessarily (at all call sites), as I 
noted in the original review.  This has nothing to do with compatibility.

> Anyway, I've sent a next version of the patch that makes read*_relaxed 
> extern inline.

 Thanks, I'll have a look.  And now that you have this update, please run 
`size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the 
difference is between `read*_relaxed' handlers `static inline' and keyed 
with `*trivial_rw_*'.

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v7] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 15:53                                       ` [PATCH v6] " Mikulas Patocka
@ 2020-05-26 14:47                                         ` Mikulas Patocka
  2020-05-27  0:18                                           ` Maciej W. Rozycki
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-26 14:47 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
platform. The patches move memory barriers after a write before the write.
The result is that if there's iowrite followed by ioread, there is no
barrier between them.

The Alpha architecture allows reordering of the accesses to the I/O space,
and the missing barrier between write and read causes hang with serial
port and real time clock.

This patch makes barriers confiorm to the specification.

1. We add mb() before readX_relaxed and writeX_relaxed -
   memory-barriers.txt claims that these functions must be ordered w.r.t.
   each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
   write followed by a read, there should be a barrier between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org      # v4.17+
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

---
 arch/alpha/include/asm/io.h |   74 +++++++++++++++++++++++++++++++++++---------
 arch/alpha/kernel/io.c      |   60 +++++++++++++++++++++++++++++++----
 2 files changed, 112 insertions(+), 22 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-25 15:36:16.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-26 16:32:42.000000000 +0200
@@ -310,14 +310,18 @@ static inline int __is_mmio(const volati
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
 extern inline unsigned int ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 extern inline unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
@@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
 extern inline unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -403,14 +409,18 @@ extern inline void __raw_writew(u16 b, v
 
 extern inline u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 extern inline u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
@@ -451,14 +461,18 @@ extern inline void __raw_writeq(u64 b, v
 
 extern inline u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 extern inline u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -487,14 +501,44 @@ extern inline void writeq(u64 b, volatil
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
-#define readb_relaxed(addr)	__raw_readb(addr)
-#define readw_relaxed(addr)	__raw_readw(addr)
-#define readl_relaxed(addr)	__raw_readl(addr)
-#define readq_relaxed(addr)	__raw_readq(addr)
-#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
-#define writew_relaxed(b, addr)	__raw_writew(b, addr)
-#define writel_relaxed(b, addr)	__raw_writel(b, addr)
-#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
+
+extern u8 readb_relaxed(const volatile void __iomem *addr);
+extern u16 readw_relaxed(const volatile void __iomem *addr);
+extern u32 readl_relaxed(const volatile void __iomem *addr);
+extern u64 readq_relaxed(const volatile void __iomem *addr);
+
+#if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
+extern inline u8 readb_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readb(addr);
+}
+
+extern inline u16 readw_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readw(addr);
+}
+#endif
+
+#if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
+extern inline u32 readl_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readl(addr);
+}
+
+extern inline u64 readq_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readq(addr);
+}
+#endif
+
+#define writeb_relaxed	writeb
+#define writew_relaxed	writew
+#define writel_relaxed	writel
+#define writeq_relaxed	writeq
 
 /*
  * String version of IO memory access ops:
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-25 15:36:16.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-25 17:47:02.000000000 +0200
@@ -16,21 +16,27 @@
 unsigned int
 ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq);
 
 u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
 
 u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -207,6 +221,38 @@ EXPORT_SYMBOL(writew);
 EXPORT_SYMBOL(writel);
 EXPORT_SYMBOL(writeq);
 
+/*
+ * The _relaxed functions must be ordered w.r.t. each other, but they don't
+ * have to be ordered w.r.t. other memory accesses.
+ */
+u8 readb_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readb(addr);
+}
+
+u16 readw_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readw(addr);
+}
+
+u32 readl_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readl(addr);
+}
+
+u64 readq_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readq(addr);
+}
+
+EXPORT_SYMBOL(readb_relaxed);
+EXPORT_SYMBOL(readw_relaxed);
+EXPORT_SYMBOL(readl_relaxed);
+EXPORT_SYMBOL(readq_relaxed);
 
 /*
  * Read COUNT 8-bit bytes from port PORT into memory starting at SRC.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-25 16:39                                         ` Maciej W. Rozycki
@ 2020-05-26 14:48                                           ` Mikulas Patocka
  2020-05-27  0:23                                             ` Maciej W. Rozycki
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-05-26 14:48 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Mon, 25 May 2020, Maciej W. Rozycki wrote:

> On Mon, 25 May 2020, Mikulas Patocka wrote:
> 
> > > > The functions __raw_read* are already extern inline, so the compiler will 
> > > > inline/noinline them depending on the macros trivial_io_bw and 
> > > > trivial_io_lq - so we can just call them from read*_relaxed without 
> > > > repeating the extern inline pattern.
> > > 
> > >  The whole point of this peculiar arrangement for cooked accessors is to 
> > > avoid having barriers inserted inline around out-of-line calls to raw 
> > > accessors,
> > 
> > I see, but why do we want to avoid that? Linux kernel has no binary 
> > compatibility, so it doesn't matter if the barriers are inlined in the 
> > drivers or not.
> 
>  It does matter as it expands code unnecessarily (at all call sites), as I 
> noted in the original review.  This has nothing to do with compatibility.
> 
> > Anyway, I've sent a next version of the patch that makes read*_relaxed 
> > extern inline.
> 
>  Thanks, I'll have a look.  And now that you have this update, please run 
> `size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the 
> difference is between `read*_relaxed' handlers `static inline' and keyed 
> with `*trivial_rw_*'.
> 
>   Maciej

The patch with static inline:
   text    data     bss     dec     hex filename
124207762       75953010        5426432 205587204       c410304 vmlinux

The patch with extern inline:
   text    data     bss     dec     hex filename
124184422       75953474        5426432 205564328       c40a9a8 vmlinux

(I've sent version 7 of the patch because I misnamed the "write_relaxed" 
function in version 6).

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7] alpha: fix memory barriers so that they conform to the specification
  2020-05-26 14:47                                         ` [PATCH v7] " Mikulas Patocka
@ 2020-05-27  0:18                                           ` Maciej W. Rozycki
  2020-06-08  6:58                                             ` Mikulas Patocka
  0 siblings, 1 reply; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-27  0:18 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Tue, 26 May 2020, Mikulas Patocka wrote:

> This patch makes barriers confiorm to the specification.
> 
> 1. We add mb() before readX_relaxed and writeX_relaxed -
>    memory-barriers.txt claims that these functions must be ordered w.r.t.
>    each other. Alpha doesn't order them, so we need an explicit barrier.
> 2. We add mb() before reads from the I/O space - so that if there's a
>    write followed by a read, there should be a barrier between them.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org      # v4.17+
> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

 LGTM, thanks for persistence!

Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification
  2020-05-26 14:48                                           ` Mikulas Patocka
@ 2020-05-27  0:23                                             ` Maciej W. Rozycki
  0 siblings, 0 replies; 45+ messages in thread
From: Maciej W. Rozycki @ 2020-05-27  0:23 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Tue, 26 May 2020, Mikulas Patocka wrote:

> >  Thanks, I'll have a look.  And now that you have this update, please run 
> > `size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the 
> > difference is between `read*_relaxed' handlers `static inline' and keyed 
> > with `*trivial_rw_*'.
> > 
> >   Maciej
> 
> The patch with static inline:
>    text    data     bss     dec     hex filename
> 124207762       75953010        5426432 205587204       c410304 vmlinux
> 
> The patch with extern inline:
>    text    data     bss     dec     hex filename
> 124184422       75953474        5426432 205564328       c40a9a8 vmlinux

 I'm not sure why data grew, but still the gain in size is maybe small, 
yet it is there.

> (I've sent version 7 of the patch because I misnamed the "write_relaxed" 
> function in version 6).

 I have reviewed v7 now, thanks for your effort!

  Maciej

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7] alpha: fix memory barriers so that they conform to the specification
  2020-05-27  0:18                                           ` Maciej W. Rozycki
@ 2020-06-08  6:58                                             ` Mikulas Patocka
  2020-06-08 23:49                                               ` Matt Turner
  0 siblings, 1 reply; 45+ messages in thread
From: Mikulas Patocka @ 2020-06-08  6:58 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann,
	Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc



On Wed, 27 May 2020, Maciej W. Rozycki wrote:

> On Tue, 26 May 2020, Mikulas Patocka wrote:
> 
> > This patch makes barriers confiorm to the specification.
> > 
> > 1. We add mb() before readX_relaxed and writeX_relaxed -
> >    memory-barriers.txt claims that these functions must be ordered w.r.t.
> >    each other. Alpha doesn't order them, so we need an explicit barrier.
> > 2. We add mb() before reads from the I/O space - so that if there's a
> >    write followed by a read, there should be a barrier between them.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> > Cc: stable@vger.kernel.org      # v4.17+
> > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> 
>  LGTM, thanks for persistence!
> 
> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>
> 
>   Maciej

Hi

Will you submit the patch to Linus' tree in this merge window?

Mikulas


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7] alpha: fix memory barriers so that they conform to the specification
  2020-06-08  6:58                                             ` Mikulas Patocka
@ 2020-06-08 23:49                                               ` Matt Turner
  0 siblings, 0 replies; 45+ messages in thread
From: Matt Turner @ 2020-06-08 23:49 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Maciej W. Rozycki, Ivan Kokshaysky, Maciej W. Rozycki,
	Arnd Bergmann, Richard Henderson, Greg Kroah-Hartman, alpha,
	linux-serial, linux-rtc

On Sun, Jun 7, 2020 at 11:58 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> Will you submit the patch to Linus' tree in this merge window?
>
> Mikulas

I will do it. Thank you, Mikulas!

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2020-06-08 23:49 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-06 11:21 [PATCH 1/2] alpha: add a delay between RTC port write and read Mikulas Patocka
2020-05-06 14:20 ` Arnd Bergmann
2020-05-06 17:12   ` [PATCH 1/2 v2] alpha: add a delay to inb_p, inb_w and inb_l Mikulas Patocka
2020-05-07  8:06     ` [PATCH 1/2 v3] " Mikulas Patocka
2020-05-07  8:20       ` Greg Kroah-Hartman
2020-05-07 10:53         ` Mikulas Patocka
2020-05-07 13:30       ` Arnd Bergmann
2020-05-07 14:09         ` Mikulas Patocka
2020-05-07 15:08           ` Arnd Bergmann
2020-05-07 15:45             ` Mikulas Patocka
2020-05-07 15:46             ` [PATCH v4] alpha: add a barrier after outb, outw and outl Mikulas Patocka
2020-05-07 19:12               ` Arnd Bergmann
2020-05-10  1:27                 ` Maciej W. Rozycki
2020-05-10  1:25             ` [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l Maciej W. Rozycki
2020-05-10 18:50               ` Mikulas Patocka
2020-05-11 14:58                 ` Maciej W. Rozycki
2020-05-12 19:35                   ` Mikulas Patocka
2020-05-13 14:41                   ` Ivan Kokshaysky
2020-05-13 16:13                     ` Greg Kroah-Hartman
2020-05-13 17:17                     ` Maciej W. Rozycki
2020-05-22 13:03                       ` Mikulas Patocka
2020-05-22 13:37                         ` Maciej W. Rozycki
2020-05-22 13:26                     ` Mikulas Patocka
2020-05-22 20:00                       ` Mikulas Patocka
2020-05-23 10:26                         ` [PATCH v4] alpha: fix memory barriers so that they conform to the specification Mikulas Patocka
2020-05-23 15:10                           ` Ivan Kokshaysky
2020-05-23 15:34                             ` Mikulas Patocka
2020-05-23 15:37                               ` [PATCH v5] " Mikulas Patocka
2020-05-24 14:54                                 ` Maciej W. Rozycki
2020-05-25 13:56                                   ` Mikulas Patocka
2020-05-25 14:07                                     ` Arnd Bergmann
2020-05-25 14:45                                     ` Maciej W. Rozycki
2020-05-25 15:53                                       ` [PATCH v6] " Mikulas Patocka
2020-05-26 14:47                                         ` [PATCH v7] " Mikulas Patocka
2020-05-27  0:18                                           ` Maciej W. Rozycki
2020-06-08  6:58                                             ` Mikulas Patocka
2020-06-08 23:49                                               ` Matt Turner
2020-05-25 15:54                                       ` [PATCH v5] " Mikulas Patocka
2020-05-25 16:39                                         ` Maciej W. Rozycki
2020-05-26 14:48                                           ` Mikulas Patocka
2020-05-27  0:23                                             ` Maciej W. Rozycki
2020-05-23 16:44                               ` [PATCH v4] " Maciej W. Rozycki
2020-05-23 17:09                                 ` Mikulas Patocka
2020-05-23 19:27                                   ` Maciej W. Rozycki
2020-05-23 20:11                                     ` Mikulas Patocka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).