* [PATCH 1/2] alpha: add a delay between RTC port write and read
From: Mikulas Patocka @ 2020-05-06 11:21 UTC (permalink / raw)
To: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman
Cc: linux-alpha, Sinan Kaya, Arnd Bergmann, linux-serial, linux-rtc

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay between
outb_p and inb_p. Note that we don't have to add the delay to CMOS_WRITE,
because it consists of two write accesses and they already have mb()
between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+

---
 arch/alpha/include/asm/mc146818rtc.h | 4 ++++
 1 file changed, 4 insertions(+)

Index: linux-stable/arch/alpha/include/asm/mc146818rtc.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/mc146818rtc.h	2020-05-05 20:48:30.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/mc146818rtc.h	2020-05-05 21:05:53.000000000 +0200
@@ -15,9 +15,13 @@
 /*
  * The yet supported machines all access the RTC index register via
  * an ISA port access but the way to access the date register differs ...
+ *
+ * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
+ * we need to add a small delay.
  */
 #define CMOS_READ(addr) ({ \
 	outb_p((addr),RTC_PORT(0)); \
+	udelay(2); \
 	inb_p(RTC_PORT(1)); \
 })
 #define CMOS_WRITE(val, addr) ({ \

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] alpha: add a delay between RTC port write and read
From: Arnd Bergmann @ 2020-05-06 14:20 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, Sinan Kaya, linux-serial, linux-rtc

On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:

> /*
>  * The yet supported machines all access the RTC index register via
>  * an ISA port access but the way to access the date register differs ...
> + *
> + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> + * we need to add a small delay.
>  */
> #define CMOS_READ(addr) ({ \
> 	outb_p((addr),RTC_PORT(0)); \
> +	udelay(2); \
> 	inb_p(RTC_PORT(1)); \

The inb_p() / outb_p() functions are meant to already have a delay in
them, maybe we should just add it there for alpha?

      Arnd
* [PATCH 1/2 v2] alpha: add a delay to inb_p, inb_w and inb_l
From: Mikulas Patocka @ 2020-05-06 17:12 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, Sinan Kaya, linux-serial, linux-rtc

On Wed, 6 May 2020, Arnd Bergmann wrote:

> On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> > /*
> >  * The yet supported machines all access the RTC index register via
> >  * an ISA port access but the way to access the date register differs ...
> > + *
> > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > + * we need to add a small delay.
> >  */
> > #define CMOS_READ(addr) ({ \
> > 	outb_p((addr),RTC_PORT(0)); \
> > +	udelay(2); \
> > 	inb_p(RTC_PORT(1)); \
>
> The inb_p() / outb_p() functions are meant to already have a delay in them,
> maybe we should just add it there for alpha?
>
>       Arnd

Yes, that is possible too - it fixes the real time clock hang for me.


From: Mikulas Patocka <mpatocka@redhat.com>

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay to the inb_p,
inw_p and inl_p macros.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+

---
 arch/alpha/include/asm/io.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-06 08:23:47.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-06 18:33:47.000000000 +0200
@@ -6,6 +6,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 #include <asm/compiler.h>
 #include <asm/pgtable.h>
 #include <asm/machvec.h>
@@ -481,9 +482,9 @@ extern inline void writeq(u64 b, volatil
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
 
-#define inb_p		inb
-#define inw_p		inw
-#define inl_p		inl
+#define inb_p(x)	(ndelay(300), inb(x))
+#define inw_p(x)	(ndelay(300), inw(x))
+#define inl_p(x)	(ndelay(300), inl(x))
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
* [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Mikulas Patocka @ 2020-05-07 8:06 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, Sinan Kaya, linux-serial, linux-rtc

On Wed, 6 May 2020, Mikulas Patocka wrote:

> On Wed, 6 May 2020, Arnd Bergmann wrote:
>
> > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > > /*
> > >  * The yet supported machines all access the RTC index register via
> > >  * an ISA port access but the way to access the date register differs ...
> > > + *
> > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > + * we need to add a small delay.
> > >  */
> > > #define CMOS_READ(addr) ({ \
> > > 	outb_p((addr),RTC_PORT(0)); \
> > > +	udelay(2); \
> > > 	inb_p(RTC_PORT(1)); \
> >
> > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > maybe we should just add it there for alpha?
> >
> >       Arnd
>
> Yes, that is possible too - it fixes the real time clock hang for me.
>
> -#define inb_p		inb
> -#define inw_p		inw
> -#define inl_p		inl
> +#define inb_p(x)	(ndelay(300), inb(x))
> +#define inw_p(x)	(ndelay(300), inw(x))
> +#define inl_p(x)	(ndelay(300), inl(x))
>  #define outb_p		outb
>  #define outw_p		outw
>  #define outl_p		outl

300ns was too low. We need at least 1400ns to fix the hang reliably.

Mikulas


From: Mikulas Patocka <mpatocka@redhat.com>

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay to the inb_p,
inw_p and inl_p macros.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+

---
 arch/alpha/include/asm/io.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
@@ -6,6 +6,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 #include <asm/compiler.h>
 #include <asm/pgtable.h>
 #include <asm/machvec.h>
@@ -481,9 +482,9 @@ extern inline void writeq(u64 b, volatil
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
 
-#define inb_p		inb
-#define inw_p		inw
-#define inl_p		inl
+#define inb_p(x)	(ndelay(1400), inb(x))
+#define inw_p(x)	(ndelay(1400), inw(x))
+#define inl_p(x)	(ndelay(1400), inl(x))
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Greg Kroah-Hartman @ 2020-05-07 8:20 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
    alpha, Sinan Kaya, linux-serial, linux-rtc

On Thu, May 07, 2020 at 04:06:31AM -0400, Mikulas Patocka wrote:
> On Wed, 6 May 2020, Mikulas Patocka wrote:
>
> > On Wed, 6 May 2020, Arnd Bergmann wrote:
> >
> > > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > > /*
> > > >  * The yet supported machines all access the RTC index register via
> > > >  * an ISA port access but the way to access the date register differs ...
> > > > + *
> > > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > > + * we need to add a small delay.
> > > >  */
> > > > #define CMOS_READ(addr) ({ \
> > > > 	outb_p((addr),RTC_PORT(0)); \
> > > > +	udelay(2); \
> > > > 	inb_p(RTC_PORT(1)); \
> > >
> > > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > > maybe we should just add it there for alpha?
> > >
> > >       Arnd
> >
> > Yes, that is possible too - it fixes the real time clock hang for me.
> >
> > -#define inb_p		inb
> > -#define inw_p		inw
> > -#define inl_p		inl
> > +#define inb_p(x)	(ndelay(300), inb(x))
> > +#define inw_p(x)	(ndelay(300), inw(x))
> > +#define inl_p(x)	(ndelay(300), inl(x))
> >  #define outb_p		outb
> >  #define outw_p		outw
> >  #define outl_p		outl
>
> 300ns was too low. We need at least 1400ns to fix the hang reliably.
>
> Mikulas
>
>
> From: Mikulas Patocka <mpatocka@redhat.com>
>
> The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
> barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
> the Alpha Avanti platform.
>
> The patch changes timing between accesses to the ISA bus, in particular,
> it reduces the time between "write" access and a subsequent "read" access.
>
> This causes lock-up when accessing the real time clock and serial ports.
>
> This patch fixes the real time clock by adding a small delay to the inb_p,
> inw_p and inl_p macros.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org # v4.17+
>
> ---
>  arch/alpha/include/asm/io.h | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

<snip>

This is not in a format that anyone can apply, please resend in a proper
way if you wish for it to be accepted.

thanks,

greg k-h
* [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Mikulas Patocka @ 2020-05-07 10:53 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner,
    alpha, linux-serial, linux-rtc

alpha: add a delay to inb_p, inb_w and inb_l

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the real time clock by adding a small delay to the inb_p,
inw_p and inl_p macros. Note that we don't have to add a delay before
outb_p, outw_p and outl_p, because there is already a "mb()" instruction
before them and this instruction slows down access sufficiently.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+

---
 arch/alpha/include/asm/io.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-07 09:54:52.000000000 +0200
@@ -6,6 +6,7 @@
 
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/delay.h>
 #include <asm/compiler.h>
 #include <asm/pgtable.h>
 #include <asm/machvec.h>
@@ -481,9 +482,9 @@ extern inline void writeq(u64 b, volatil
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
 
-#define inb_p		inb
-#define inw_p		inw
-#define inl_p		inl
+#define inb_p(x)	(ndelay(1400), inb(x))
+#define inw_p(x)	(ndelay(1400), inw(x))
+#define inl_p(x)	(ndelay(1400), inl(x))
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Arnd Bergmann @ 2020-05-07 13:30 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, Sinan Kaya, linux-serial, linux-rtc

On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Wed, 6 May 2020, Mikulas Patocka wrote:
> > On Wed, 6 May 2020, Arnd Bergmann wrote:
> > > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > > /*
> > > >  * The yet supported machines all access the RTC index register via
> > > >  * an ISA port access but the way to access the date register differs ...
> > > > + *
> > > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > > + * we need to add a small delay.
> > > >  */
> > > > #define CMOS_READ(addr) ({ \
> > > > 	outb_p((addr),RTC_PORT(0)); \
> > > > +	udelay(2); \
> > > > 	inb_p(RTC_PORT(1)); \
> > >
> > > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > > maybe we should just add it there for alpha?
> > >
> > >       Arnd
> >
> > Yes, that is possible too - it fixes the real time clock hang for me.
> >
> > -#define inb_p		inb
> > -#define inw_p		inw
> > -#define inl_p		inl
> > +#define inb_p(x)	(ndelay(300), inb(x))
> > +#define inw_p(x)	(ndelay(300), inw(x))
> > +#define inl_p(x)	(ndelay(300), inl(x))
> >  #define outb_p		outb
> >  #define outw_p		outw
> >  #define outl_p		outl
>
> 300ns was too low. We need at least 1400ns to fix the hang reliably.

Are you sure that it is in fact the timing that is important here and not
a barrier? I see that inb() is written in terms of readb(), but the
barrier requirements for I/O space are a bit different from those
on PCI memory space.

In the example you gave first, there is an outb_p() followed by inb_p().
These are normally serialized by the bus, but I/O space also has the
requirement that an outb() completes before we get to the next
instruction (non-posted write), while writeb() is generally posted and
only needs a barrier before the write rather than both before and after
like outb.

      Arnd
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Mikulas Patocka @ 2020-05-07 14:09 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, linux-serial, linux-rtc

On Thu, 7 May 2020, Arnd Bergmann wrote:

> On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > On Wed, 6 May 2020, Mikulas Patocka wrote:
> > > On Wed, 6 May 2020, Arnd Bergmann wrote:
> > > > On Wed, May 6, 2020 at 1:21 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >
> > > > > /*
> > > > >  * The yet supported machines all access the RTC index register via
> > > > >  * an ISA port access but the way to access the date register differs ...
> > > > > + *
> > > > > + * The ISA bus on Alpha Avanti doesn't like back-to-back accesses,
> > > > > + * we need to add a small delay.
> > > > >  */
> > > > > #define CMOS_READ(addr) ({ \
> > > > > 	outb_p((addr),RTC_PORT(0)); \
> > > > > +	udelay(2); \
> > > > > 	inb_p(RTC_PORT(1)); \
> > > >
> > > > The inb_p() / outb_p() functions are meant to already have a delay in them,
> > > > maybe we should just add it there for alpha?
> > > >
> > > >       Arnd
> > >
> > > Yes, that is possible too - it fixes the real time clock hang for me.
> > >
> > > -#define inb_p		inb
> > > -#define inw_p		inw
> > > -#define inl_p		inl
> > > +#define inb_p(x)	(ndelay(300), inb(x))
> > > +#define inw_p(x)	(ndelay(300), inw(x))
> > > +#define inl_p(x)	(ndelay(300), inl(x))
> > >  #define outb_p		outb
> > >  #define outw_p		outw
> > >  #define outl_p		outl
> >
> > 300ns was too low. We need at least 1400ns to fix the hang reliably.
>
> Are you sure that it is in fact the timing that is important here and not
> a barrier? I see that inb() is written in terms of readb(), but the
> barrier requirements for I/O space are a bit different from those
> on PCI memory space.

The "in" and "out" instructions are serializing on x86. But alpha doesn't
have dedicated instructions for accessing ports.

Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
be protected by two memory barriers, to emulate the x86 behavior?

> In the example you gave first, there is an outb_p() followed by inb_p().
> These are normally serialized by the bus, but I/O space also has the
> requirement that an outb() completes before we get to the next
> instruction (non-posted write), while writeb() is generally posted and
> only needs a barrier before the write rather than both before and after
> like outb.
>
> Arnd

I think that the fact that "writeb" is posted is exactly the problem - it
gets posted, the processor goes on, sends "readb" and they arrive
back-to-back to the ISA bus. The ISA bus device doesn't like back-to-back
accesses and locks up.

Anyway - you can change the "ndelay()" function in this patch to "mb()" -
"mb()" will provide long enough delay that it fixes this bug.

Mikulas
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Arnd Bergmann @ 2020-05-07 15:08 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, linux-serial, linux-rtc

On Thu, May 7, 2020 at 4:09 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Thu, 7 May 2020, Arnd Bergmann wrote:
> > On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > Are you sure that it is in fact the timing that is important here and not
> > a barrier? I see that inb() is written in terms of readb(), but the
> > barrier requirements for I/O space are a bit different from those
> > on PCI memory space.
>
> The "in" and "out" instructions are serializing on x86. But alpha doesn't
> have dedicated instructions for accessing ports.
>
> Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
> be protected by two memory barriers, to emulate the x86 behavior?

That's what we do on some other architectures to emulate the non-posted
behavior of out[bwl], as required by PCI. I can't think of any reasons to
have a barrier before in[bwl], or after write[bwl], but we generally want
one after out[bwl]

> > In the example you gave first, there is an outb_p() followed by inb_p().
> > These are normally serialized by the bus, but I/O space also has the
> > requirement that an outb() completes before we get to the next
> > instruction (non-posted write), while writeb() is generally posted and
> > only needs a barrier before the write rather than both before and after
> > like outb.
>
> I think that the fact that "writeb" is posted is exactly the problem - it
> gets posted, the processor goes on, sends "readb" and they arrive
> back-to-back to the ISA bus. The ISA bus device doesn't like back-to-back
> accesses and locks up.
>
> Anyway - you can change the "ndelay()" function in this patch to "mb()" -
> "mb()" will provide long enough delay that it fixes this bug.

My preference would be to have whatever makes most sense in theory
and also fixes the problem. If there is some documentation that
says you need a certain amount of time between accesses regardless
of the barriers, then that is fine. I do wonder if there is anything
enforcing the "rpcc" in _delay() to come after the store if there is no
barrier between the two, otherwise the delay method still seems
unreliable.

The barrier after the store at least makes sense to me based on
the theory, both with and without a delay in outb_p().

      Arnd
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l
From: Mikulas Patocka @ 2020-05-07 15:45 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, linux-serial, linux-rtc

On Thu, 7 May 2020, Arnd Bergmann wrote:

> On Thu, May 7, 2020 at 4:09 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > On Thu, 7 May 2020, Arnd Bergmann wrote:
> > > On Thu, May 7, 2020 at 10:06 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > Are you sure that it is in fact the timing that is important here and not
> > > a barrier? I see that inb() is written in terms of readb(), but the
> > > barrier requirements for I/O space are a bit different from those
> > > on PCI memory space.
> >
> > The "in" and "out" instructions are serializing on x86. But alpha doesn't
> > have dedicated instructions for accessing ports.
> >
> > Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should
> > be protected by two memory barriers, to emulate the x86 behavior?
>
> That's what we do on some other architectures to emulate the non-posted
> behavior of out[bwl], as required by PCI. I can't think of any reasons to
> have a barrier before in[bwl], or after write[bwl], but we generally want
> one after out[bwl]

Yes - so we can add a barrier after out[bwl]. It also fixes the serial
port issue, so we no longer need the serial driver patch for Greg.

> > > In the example you gave first, there is an outb_p() followed by inb_p().
> > > These are normally serialized by the bus, but I/O space also has the
> > > requirement that an outb() completes before we get to the next
> > > instruction (non-posted write), while writeb() is generally posted and
> > > only needs a barrier before the write rather than both before and after
> > > like outb.
> >
> > I think that the fact that "writeb" is posted is exactly the problem - it
> > gets posted, the processor goes on, sends "readb" and they arrive
> > back-to-back to the ISA bus. The ISA bus device doesn't like back-to-back
> > accesses and locks up.
> >
> > Anyway - you can change the "ndelay()" function in this patch to "mb()" -
> > "mb()" will provide long enough delay that it fixes this bug.
>
> My preference would be to have whatever makes most sense in theory
> and also fixes the problem. If there is some documentation that
> says you need a certain amount of time between accesses regardless
> of the barriers, then that is fine. I do wonder if there is anything
> enforcing the "rpcc" in _delay() to come after the store if there is no
> barrier between the two, otherwise the delay method still seems
> unreliable.

I measured ndelay - and the overhead of the instruction rpcc is already
very high. ndelay(1) takes 300ns.

> The barrier after the store at least makes sense to me based on
> the theory, both with and without a delay in outb_p().
>
> Arnd

Mikulas
* [PATCH v4] alpha: add a barrier after outb, outw and outl
From: Mikulas Patocka @ 2020-05-07 15:46 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, linux-serial, linux-rtc

The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
the Alpha Avanti platform.

The patch changes timing between accesses to the ISA bus, in particular,
it reduces the time between "write" access and a subsequent "read" access.

This causes lock-up when accessing the real time clock and serial ports.

This patch fixes the bug by adding a memory barrier after the functions
that access the ISA ports - outb, outw, outl. The barrier causes that
there is some delay between the write to an IO port and a subsequent read.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+

---
 arch/alpha/include/asm/io.h | 3 +++
 arch/alpha/kernel/io.c      | 3 +++
 2 files changed, 6 insertions(+)

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-07 17:36:58.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-07 17:36:58.000000000 +0200
@@ -347,11 +347,13 @@ extern inline u16 inw(unsigned long port
 extern inline void outb(u8 b, unsigned long port)
 {
 	iowrite8(b, ioport_map(port, 1));
+	mb();
 }
 
 extern inline void outw(u16 b, unsigned long port)
 {
 	iowrite16(b, ioport_map(port, 2));
+	mb();
 }
 #endif
 
@@ -377,6 +379,7 @@ extern inline u32 inl(unsigned long port
 extern inline void outl(u32 b, unsigned long port)
 {
 	iowrite32(b, ioport_map(port, 4));
+	mb();
 }
 #endif
 
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-07 17:36:58.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-07 17:36:58.000000000 +0200
@@ -78,16 +78,19 @@ u32 inl(unsigned long port)
 void outb(u8 b, unsigned long port)
 {
 	iowrite8(b, ioport_map(port, 1));
+	mb();
 }
 
 void outw(u16 b, unsigned long port)
 {
 	iowrite16(b, ioport_map(port, 2));
+	mb();
 }
 
 void outl(u32 b, unsigned long port)
 {
 	iowrite32(b, ioport_map(port, 4));
+	mb();
 }
 
 EXPORT_SYMBOL(inb);
* Re: [PATCH v4] alpha: add a barrier after outb, outw and outl
From: Arnd Bergmann @ 2020-05-07 19:12 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman,
    alpha, linux-serial, linux-rtc

On Thu, May 7, 2020 at 5:46 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
> barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
> the Alpha Avanti platform.
>
> The patch changes timing between accesses to the ISA bus, in particular,
> it reduces the time between "write" access and a subsequent "read" access.
>
> This causes lock-up when accessing the real time clock and serial ports.
>
> This patch fixes the bug by adding a memory barrier after the functions
> that access the ISA ports - outb, outw, outl. The barrier causes that
> there is some delay between the write to an IO port and a subsequent read.

Based on your earlier explanations, I would mention here that the barrier
avoids the back-to-back I/O instructions on the bus that seem to be causing
the problem. As I understand it (having very little alpha specific knowledge),
they should prevent them by design. However if you are sure it's just the
added delay rather than any actual barrier effect, that would also be worth
pointing out.

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org # v4.17+

With or without any further clarification

Acked-by: Arnd Bergmann <arnd@arndb.de>
* Re: [PATCH v4] alpha: add a barrier after outb, outw and outl
From: Maciej W. Rozycki @ 2020-05-10 1:27 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Mikulas Patocka, Richard Henderson, Ivan Kokshaysky, Matt Turner,
    Greg Kroah-Hartman, alpha, linux-serial, linux-rtc

On Thu, 7 May 2020, Arnd Bergmann wrote:

> > The patch 92d7223a74235054f2aa7227d207d9c57f84dca0 ("alpha: io: reorder
> > barriers to guarantee writeX() and iowriteX() ordering #2") broke boot on
> > the Alpha Avanti platform.
> >
> > The patch changes timing between accesses to the ISA bus, in particular,
> > it reduces the time between "write" access and a subsequent "read" access.
> >
> > This causes lock-up when accessing the real time clock and serial ports.
> >
> > This patch fixes the bug by adding a memory barrier after the functions
> > that access the ISA ports - outb, outw, outl. The barrier causes that
> > there is some delay between the write to an IO port and a subsequent read.

 There is no delay guarantee, just an in-order completion guarantee:

#define mb()	__asm__ __volatile__("mb": : :"memory")

MB is a hardware memory barrier instruction.

> Based on your earlier explanations, I would mention here that the barrier
> avoids the back-to-back I/O instructions on the bus that seem to be causing
> the problem. As I understand it (having very little alpha specific knowledge),
> they should prevent them by design. However if you are sure it's just the
> added delay rather than any actual barrier effect, that would also be worth
> pointing out.

 Alpha is weakly ordered, also WRT MMIO. Writing a simple test program to
poke directly (e.g. using `ioremap' and then inline asm on the location
obtained) at RTC and UART registers would be a good way to determine what
is really going on here.

  Maciej
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-07 15:08 ` Arnd Bergmann 2020-05-07 15:45 ` Mikulas Patocka 2020-05-07 15:46 ` [PATCH v4] alpha: add a barrier after outb, outw and outl Mikulas Patocka @ 2020-05-10 1:25 ` Maciej W. Rozycki 2020-05-10 18:50 ` Mikulas Patocka 2 siblings, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-10 1:25 UTC (permalink / raw) To: Arnd Bergmann Cc: Mikulas Patocka, Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Thu, 7 May 2020, Arnd Bergmann wrote: > > Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should > > be protected by two memory barriers, to emulate the x86 behavior? > > That's what we do on some other architectures to emulate the non-posted > behavior of out[bwl], as required by PCI. I can't think of any reasons to > have a barrier before in[bwl], or after write[bwl], but we generally want > one after out[bwl] Alpha is weakly ordered, also WRT MMIO. The details are a bit obscure (and were discussed before in a previous iteration of these patches), but my understanding is multiple writes can be merged and writes can be reordered WRT reads, even on UP. It's generally better for performance to have ordering barriers before MMIO operations rather than afterwards, unless a completion barrier is also required (e.g. for level-triggered interrupt acknowledgement). Currently we don't fully guarantee that `outX' won't be posted (from memory-barriers.txt): " (*) inX(), outX(): [...] Device drivers may expect outX() to emit a non-posted write transaction that waits for a completion response from the I/O peripheral before returning. This is not guaranteed by all architectures and is therefore not part of the portable ordering semantics." Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-10 1:25 ` [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l Maciej W. Rozycki @ 2020-05-10 18:50 ` Mikulas Patocka 2020-05-11 14:58 ` Maciej W. Rozycki 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-10 18:50 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sun, 10 May 2020, Maciej W. Rozycki wrote: > On Thu, 7 May 2020, Arnd Bergmann wrote: > > > > Do you think that all the "in[bwl]" and "out[bwl]" macros on alpha should > > > be protected by two memory barriers, to emulate the x86 behavior? > > > > That's what we do on some other architectures to emulate the non-posted > > behavior of out[bwl], as required by PCI. I can't think of any reasons to > > have a barrier before in[bwl], or after write[bwl], but we generally want > > one after out[bwl] > > Alpha is weakly ordered, also WRT MMIO. The details are a bit obscure > (and were discussed before in a previous iteration of these patches), but > my understanding is multiple writes can be merged and writes can be > reordered WRT reads, even on UP. It's generally better for performance to We discussed it some time ago, and the conclusion was that reads and writes to the same device are not reordered on Alpha. Reads and writes to different devices or to memory may be reordered. In these problematic cases, we only access the serial port or the real time clock using a few ports (and these devices don't have DMA, so there is no interaction with memory) - so I conclude that it is a timing problem and not an I/O reordering problem. > have ordering barriers before MMIO operations rather than afterwards, > unless a completion barrier is also required (e.g. for level-triggered > interrupt acknowledgement).
> > Currently we don't fully guarantee that `outX' won't be posted (from > memory-barriers.txt): > > " (*) inX(), outX(): > [...] > Device drivers may expect outX() to emit a non-posted write transaction > that waits for a completion response from the I/O peripheral before > returning. This is not guaranteed by all architectures and is therefore > not part of the portable ordering semantics." > > Maciej Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-10 18:50 ` Mikulas Patocka @ 2020-05-11 14:58 ` Maciej W. Rozycki 2020-05-12 19:35 ` Mikulas Patocka 2020-05-13 14:41 ` Ivan Kokshaysky 0 siblings, 2 replies; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-11 14:58 UTC (permalink / raw) To: Mikulas Patocka Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sun, 10 May 2020, Mikulas Patocka wrote: > > > That's what we do on some other architectures to emulate the non-posted > > > behavior of out[bwl], as required by PCI. I can't think of any reasons to > > > have a barrier before in[bwl], or after write[bwl], but we generally want > > > one after out[bwl] > > > > Alpha is weakly ordered, also WRT MMIO. The details are a bit obscure > > (and were discussed before in a previous iteration of these patches), but > > my understanding is multiple writes can be merged and writes can be > > reordered WRT reads, even on UP. It's generally better for performance to > > We discussed it some times ago, and the conclusion was that reads and > writes to the same device are not reordered on Alpha. Reads and writes to > different devices or to memory may be reordered. Except that "device" in this context is a particular MMIO location; most peripherals span multiple locations, including the RTC and the UART in the Avanti line in particular. > In these problematic cases, we only access serial port or real time clock > using a few ports (and these devices don't have DMA, so there's not any > interaction with memory) - so I conclude that it is timing problem and not > I/O reordering problem. Individual PCI port locations correspond to different MMIO locations, so yes, accesses to these can be reordered (merging won't happen due to the use of the sparse address space). As I noted using a small program to verify actual behaviour ought to reveal what the problem really is. 
And /dev/mem can be mmapped for PCI port I/O access on Alpha (some X-servers do this), so it can be done even in the userland with a running system. And if timing is indeed the culprit, then I think it will be best fixed in the 82378IB southbridge, i.e.[1]: "The I/O recovery mechanism in the SIO is used to add additional recovery delay between PCI originated 8-bit and 16-bit I/O cycles to the ISA Bus. The SIO automatically forces a minimum delay of four SYSCLKs between back-to-back 8 and 16 bit I/O cycles to the ISA Bus. The delay is measured from the rising edge of the I/O command (IOR# or IOW#) to the falling edge of the next BALE. If a delay of greater than four SYSCLKs is required, the ISA I/O Recovery Time Register can be programmed to increase the delay in increments of SYSCLKs. Note that no additional delay is inserted for back-to-back I/O "sub cycles" generated as a result of byte assembly or disassembly. This register defaults to 8 and 16-bit recovery enabled with two clocks added to the standard I/O recovery." where it won't be causing unnecessary overhead for native PCI devices or indeed excessive one for ISA devices. It might be interesting to note that later SIO versions like the 82378ZB increased the minimum to five SYSCLKs, so maybe a missing SYSCLK (that can still be inserted by suitably programming the ICRT) is the source of the problem? References: [1] "82378IB System I/O (SIO)", April 1993, Intel Corporation, Order Number: 290473-002, Section 4.1.17 "ICRT -- ISA Controller Recovery Timer Register" Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-11 14:58 ` Maciej W. Rozycki @ 2020-05-12 19:35 ` Mikulas Patocka 2020-05-13 14:41 ` Ivan Kokshaysky 1 sibling, 0 replies; 45+ messages in thread From: Mikulas Patocka @ 2020-05-12 19:35 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Arnd Bergmann, Richard Henderson, Ivan Kokshaysky, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, 11 May 2020, Maciej W. Rozycki wrote: > And if timing is indeed the culprit, then I think it will be best fixed > in the 82378IB southbridge, i.e.[1]: > > "The I/O recovery mechanism in the SIO is used to add additional recovery > delay between PCI originated 8-bit and 16-bit I/O cycles to the ISA Bus. > The SIO automatically forces a minimum delay of four SYSCLKs between > back-to-back 8 and 16 bit I/O cycles to the ISA Bus. The delay is > measured from the rising edge of the I/O command (IOR# or IOW#) to the > falling edge of the next BALE. If a delay of greater than four SYSCLKs is > required, the ISA I/O Recovery Time Register can be programmed to increase > the delay in increments of SYSCLKs. Note that no additional delay is > inserted for back-to-back I/O "sub cycles" generated as a result of byte > assembly or disassembly. This register defaults to 8 and 16-bit recovery > enabled with two clocks added to the standard I/O recovery." > > where it won't be causing unnecessary overhead for native PCI devices or > indeed excessive one for ISA devices. It might be interesting to note > that later SIO versions like the 82378ZB increased the minimum to five > SYSCLKs, so maybe a missing SYSCLK (that can still be inserted by suitably > programming the ICRT) is the source of the problem? 
> > References: > > [1] "82378IB System I/O (SIO)", April 1993, Intel Corporation, Order > Number: 290473-002, Section 4.1.17 "ICRT -- ISA Controller Recovery > Timer Register" > > Maciej I tried to modify this register (I wrote 0x44 to it - it should correspond to the maximum delay) and it had no effect on the serial port and rtc lock-ups. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-11 14:58 ` Maciej W. Rozycki @ 2020-05-13 14:41 ` Ivan Kokshaysky 2020-05-13 16:13 ` Greg Kroah-Hartman ` (2 more replies) 1 sibling, 3 replies; 45+ messages in thread From: Ivan Kokshaysky @ 2020-05-13 14:41 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Mikulas Patocka, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote: > Individual PCI port locations correspond to different MMIO locations, so > yes, accesses to these can be reordered (merging won't happen due to the > use of the sparse address space). Correct, it's how the Alpha write buffers work. According to the 21064 hardware reference manual, these buffers are flushed when one of the following conditions is met: 1) The write buffer contains at least two valid entries. 2) The write buffer contains one valid entry and at least 256 CPU cycles have elapsed since the execution of the last write buffer-directed instruction. 3) The write buffer contains an MB, STQ_C or STL_C instruction. 4) A load miss is pending to an address currently valid in the write buffer that requires the write buffer to be flushed. I'm certain that in these rtc/serial cases we've got readX arriving at the device *before* the preceding writeX because of 2). That's why a small delay (300-1400 ns, apparently depending on CPU frequency) seemingly "fixes" the problem. The 4) is not met because loads and stores are to different ports, and 3) has been broken by commit 92d7223a74. So I believe that the correct fix would be to revert 92d7223a74 and add wmb() before the [io]writeX macros to meet the memory-barriers.txt requirement. The "wmb" instruction is cheap enough and won't hurt IO performance too much. Ivan. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-13 14:41 ` Ivan Kokshaysky @ 2020-05-13 16:13 ` Greg Kroah-Hartman 2020-05-13 17:17 ` Maciej W. Rozycki 2020-05-22 13:26 ` Mikulas Patocka 2 siblings, 0 replies; 45+ messages in thread From: Greg Kroah-Hartman @ 2020-05-13 16:13 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Mikulas Patocka, Arnd Bergmann, Richard Henderson, Matt Turner, alpha, linux-serial, linux-rtc On Wed, May 13, 2020 at 03:41:28PM +0100, Ivan Kokshaysky wrote: > On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote: > > Individual PCI port locations correspond to different MMIO locations, so > > yes, accesses to these can be reordered (merging won't happen due to the > > use of the sparse address space). > > Correct, it's how Alpha write buffers work. According to 21064 hardware > reference manual, these buffers are flushed when one of the following > conditions is met: > > 1) The write buffer contains at least two valid entries. > 2) The write buffer contains one valid entry and at least 256 CPU cycles > have elapsed since the execution of the last write buffer-directed > instruction. > 3) The write buffer contains an MB, STQ_C or STL_C instruction. > 4) A load miss is pending to an address currently valid in the write > buffer that requires the write buffer to be flushed. > > I'm certain that in these rtc/serial cases we've got readX arriving > to device *before* preceeding writeX because of 2). That's why small > delay (300-1400 ns, apparently depends on CPU frequency) seemingly > "fixes" the problem. The 4) is not met because loads and stores are > to different ports, and 3) has been broken by commit 92d7223a74. > > So I believe that correct fix would be to revert 92d7223a74 and > add wmb() before [io]writeX macros to meet memory-barriers.txt > requirement. The "wmb" instruction is cheap enough and won't hurt > IO performance too much. 
I agree, that sounds easier, and work with the authors of memory-barriers.txt in order to straighten things out. greg k-h ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-13 14:41 ` Ivan Kokshaysky 2020-05-13 16:13 ` Greg Kroah-Hartman @ 2020-05-13 17:17 ` Maciej W. Rozycki 2020-05-22 13:03 ` Mikulas Patocka 2020-05-22 13:26 ` Mikulas Patocka 2 siblings, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-13 17:17 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Mikulas Patocka, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Wed, 13 May 2020, Ivan Kokshaysky wrote: > > Individual PCI port locations correspond to different MMIO locations, so > > yes, accesses to these can be reordered (merging won't happen due to the > > use of the sparse address space). > > Correct, it's how Alpha write buffers work. According to 21064 hardware > reference manual, these buffers are flushed when one of the following > conditions is met: > > 1) The write buffer contains at least two valid entries. > 2) The write buffer contains one valid entry and at least 256 CPU cycles > have elapsed since the execution of the last write buffer-directed > instruction. > 3) The write buffer contains an MB, STQ_C or STL_C instruction. > 4) A load miss is pending to an address currently valid in the write > buffer that requires the write buffer to be flushed. > > I'm certain that in these rtc/serial cases we've got readX arriving > to device *before* preceeding writeX because of 2). That's why small > delay (300-1400 ns, apparently depends on CPU frequency) seemingly > "fixes" the problem. The 4) is not met because loads and stores are > to different ports, and 3) has been broken by commit 92d7223a74. > > So I believe that correct fix would be to revert 92d7223a74 and > add wmb() before [io]writeX macros to meet memory-barriers.txt > requirement. The "wmb" instruction is cheap enough and won't hurt > IO performance too much. 
Hmm, having barriers *afterwards* across all the MMIO accessors, even ones that do not have such a requirement according to memory-barriers.txt, does hurt performance unnecessarily however. What I think has to be done is adding barriers beforehand, and then only add barriers afterwards where necessary. Commit 92d7223a74 did a part of that, but did not consistently update all the remaining accessors. So I don't think reverting 92d7223a74 permanently is the right way to go, however it certainly makes sense to do that temporarily to get rid of the fatal regression, sort all the corner cases and then apply the resulting replacement fix. I think ultimately we do want the barriers beforehand, just like the MIPS port has (and survives) in arch/mips/include/asm/io.h. Observe that unlike the Alpha ISA the MIPS ISA does have nontrivial `rmb' aka the SYNC_RMB hardware instruction. Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-13 17:17 ` Maciej W. Rozycki @ 2020-05-22 13:03 ` Mikulas Patocka 2020-05-22 13:37 ` Maciej W. Rozycki 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-22 13:03 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Wed, 13 May 2020, Maciej W. Rozycki wrote: > On Wed, 13 May 2020, Ivan Kokshaysky wrote: > > > > Individual PCI port locations correspond to different MMIO locations, so > > > yes, accesses to these can be reordered (merging won't happen due to the > > > use of the sparse address space). > > > > Correct, it's how Alpha write buffers work. According to 21064 hardware > > reference manual, these buffers are flushed when one of the following > > conditions is met: > > > > 1) The write buffer contains at least two valid entries. > > 2) The write buffer contains one valid entry and at least 256 CPU cycles > > have elapsed since the execution of the last write buffer-directed > > instruction. > > 3) The write buffer contains an MB, STQ_C or STL_C instruction. > > 4) A load miss is pending to an address currently valid in the write > > buffer that requires the write buffer to be flushed. > > > > I'm certain that in these rtc/serial cases we've got readX arriving > > to device *before* preceeding writeX because of 2). That's why small > > delay (300-1400 ns, apparently depends on CPU frequency) seemingly > > "fixes" the problem. The 4) is not met because loads and stores are > > to different ports, and 3) has been broken by commit 92d7223a74. > > > > So I believe that correct fix would be to revert 92d7223a74 and > > add wmb() before [io]writeX macros to meet memory-barriers.txt > > requirement. The "wmb" instruction is cheap enough and won't hurt > > IO performance too much. 
> > Hmm, having barriers *afterwards* across all the MMIO accessors, even > ones that do not have such a requirement according to memory-barriers.txt, > does hurt performance unnecessarily however. What I think has to be done > is adding barriers beforehand, and then only add barriers afterwards where > necessary. Commit 92d7223a74 did a part of that, but did not consistently > update all the remaining accessors. > > So I don't think reverting 92d7223a74 permanently is the right way to go, > however it certainly makes sense to do that temporarily to get rid of the > fatal regression, sort all the corner cases and then apply the resulting > replacement fix. See Documentation/memory-barriers.txt, the section "KERNEL I/O BARRIER EFFECTS". According to the specification, there must be a barrier before a write to the MMIO space (paragraph 3) and after a read from the MMIO space (paragraph 4) - if this causes an unnecessary slowdown, the driver should use the readX_relaxed or writeX_relaxed functions - the *_relaxed functions are ordered with each other (see the paragraph "(*) readX_relaxed(), writeX_relaxed()"), but they are not ordered with respect to memory accesses. The commit 92d7223a74 fixes that requirement (although there is no real driver that was fixed by this), so I don't think it should be reverted. The proper fix should be to add delays to the serial port and real time clock (or perhaps to all IO-port accesses). > I think ultimately we do want the barriers beforehand, just like the > MIPS port has (and survives) in arch/mips/include/asm/io.h. Observe If the MIPS port doesn't have an MMIO barrier after read[bwl], then it is violating the specification. Perhaps there is no existing driver that is hurt by this violation, so the violation survived. > that unlike the Alpha ISA the MIPS ISA does have nontrivial `rmb' aka > the SYNC_RMB hardware instruction. > > Maciej Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-22 13:03 ` Mikulas Patocka @ 2020-05-22 13:37 ` Maciej W. Rozycki 0 siblings, 0 replies; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-22 13:37 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Fri, 22 May 2020, Mikulas Patocka wrote: > > Hmm, having barriers *afterwards* across all the MMIO accessors, even > > ones that do not have such a requirement according to memory-barriers.txt, > > does hurt performance unnecessarily however. What I think has to be done > > is adding barriers beforehand, and then only add barriers afterwards where > > necessary. Commit 92d7223a74 did a part of that, but did not consistently > > update all the remaining accessors. > > > > So I don't think reverting 92d7223a74 permanently is the right way to go, > > however it certainly makes sense to do that temporarily to get rid of the > > fatal regression, sort all the corner cases and then apply the resulting > > replacement fix. > > See Documentation/memory-barriers.txt, the section "KERNEL I/O BARRIER > EFFECTS" > > According to the specification, there must be a barrier before a write to > the MMIO space (paragraph 3) and after a read from MMIO space (parahraph > 4) - if this causes unnecessary slowdown, the driver should use > readX_relaxed or writeX_relaxed functions - the *_relaxed functions are > ordered with each other (see the paragraph "(*) readX_relaxed(), > writeX_relaxed()"), but they are not ordered with respect to memory > access. The specification doesn't require a barrier *after* a write however, which is what I have been concerned about as it may cost hundreds of cycles wasted. I'm not concerned about a barrier after a read (and one beforehand is of course also required). 
> The commit 92d7223a74 fixes that requirement (although there is no real > driver that was fixed by this), so I don't think it should be reverted. > The proper fix should be to add delays to the serial port and readltime > clock (or perhaps to all IO-port accesses). Adding artificial delays will only paper over the real problem I'm afraid. > > I think ultimately we do want the barriers beforehand, just like the > > MIPS port has (and survives) in arch/mips/include/asm/io.h. Observe > > If the MIPS port doesn't have MMIO barrier after read[bwl], then it is > violating the specification. Perhaps there is no existing driver that is > hurt by this violation, so this violation survived. It does have a barrier, see: /* prevent prefetching of coherent DMA data prematurely */ \ if (!relax) \ rmb(); \ In the light of #5 however: 5. A readX() by a CPU thread from the peripheral will complete before any subsequent delay() loop can begin execution on the same thread. I think it may have to be replaced with a completion barrier however, and that will be tricky because you cannot just issue a second read to the resource accessed after the `rmb' to make the first read complete, as a MMIO read may have side effects (e.g. clearing an interrupt request). So the read would have to happen to a different location. Some architectures have a hardware completion barrier instruction, such as the modern MIPS ISA, which makes life sweet and easy (as much as life can be sweet and easy with a weakly ordered bus model) as no dummy read is then required, but surely not all do (including older MIPS ISA revisions). Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-13 14:41 ` Ivan Kokshaysky 2020-05-13 16:13 ` Greg Kroah-Hartman 2020-05-13 17:17 ` Maciej W. Rozycki @ 2020-05-22 13:26 ` Mikulas Patocka 2020-05-22 20:00 ` Mikulas Patocka 2 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-22 13:26 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Wed, 13 May 2020, Ivan Kokshaysky wrote: > On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote: > > Individual PCI port locations correspond to different MMIO locations, so > > yes, accesses to these can be reordered (merging won't happen due to the > > use of the sparse address space). > > Correct, it's how Alpha write buffers work. According to 21064 hardware > reference manual, these buffers are flushed when one of the following > conditions is met: > > 1) The write buffer contains at least two valid entries. > 2) The write buffer contains one valid entry and at least 256 CPU cycles > have elapsed since the execution of the last write buffer-directed > instruction. > 3) The write buffer contains an MB, STQ_C or STL_C instruction. > 4) A load miss is pending to an address currently valid in the write > buffer that requires the write buffer to be flushed. > > I'm certain that in these rtc/serial cases we've got readX arriving > to device *before* preceeding writeX because of 2). That's why small > delay (300-1400 ns, apparently depends on CPU frequency) seemingly > "fixes" the problem. The 4) is not met because loads and stores are > to different ports, and 3) has been broken by commit 92d7223a74. > > So I believe that correct fix would be to revert 92d7223a74 and > add wmb() before [io]writeX macros to meet memory-barriers.txt > requirement. The "wmb" instruction is cheap enough and won't hurt > IO performance too much. > > Ivan. I agree ... 
and what about readX_relaxed and writeX_relaxed? According to the memory-barriers specification, the _relaxed functions must be ordered w.r.t. each other. If Alpha can't keep them ordered, they should have barriers between them too. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l 2020-05-22 13:26 ` Mikulas Patocka @ 2020-05-22 20:00 ` Mikulas Patocka 2020-05-23 10:26 ` [PATCH v4] alpha: fix memory barriers so that they conform to the specification Mikulas Patocka 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-22 20:00 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Fri, 22 May 2020, Mikulas Patocka wrote: > On Wed, 13 May 2020, Ivan Kokshaysky wrote: > > > On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote: > > > Individual PCI port locations correspond to different MMIO locations, so > > > yes, accesses to these can be reordered (merging won't happen due to the > > > use of the sparse address space). > > > > Correct, it's how Alpha write buffers work. According to 21064 hardware > > reference manual, these buffers are flushed when one of the following > > conditions is met: > > > > 1) The write buffer contains at least two valid entries. > > 2) The write buffer contains one valid entry and at least 256 CPU cycles > > have elapsed since the execution of the last write buffer-directed > > instruction. > > 3) The write buffer contains an MB, STQ_C or STL_C instruction. > > 4) A load miss is pending to an address currently valid in the write > > buffer that requires the write buffer to be flushed. > > > > I'm certain that in these rtc/serial cases we've got readX arriving > > to device *before* preceeding writeX because of 2). That's why small > > delay (300-1400 ns, apparently depends on CPU frequency) seemingly > > "fixes" the problem. The 4) is not met because loads and stores are > > to different ports, and 3) has been broken by commit 92d7223a74. > > > > So I believe that correct fix would be to revert 92d7223a74 and > > add wmb() before [io]writeX macros to meet memory-barriers.txt > > requirement. 
> > The "wmb" instruction is cheap enough and won't hurt > > IO performance too much. > > > > Ivan. > > I agree ... and what about readX_relaxed and writeX_relaxed? According to > the memory-barriers specification, the _relaxed functions must be ordered > w.r.t. each other. If Alpha can't keep them ordered, they should have > barriers between them too. > > Mikulas I have found the chapter about I/O in the Alpha reference manual; in the section "5.6.4.7 Implications for Memory Mapped I/O", it says: To reliably communicate shared data from a processor to an I/O device: 1. Write the shared data to a memory-like physical memory region on the processor. 2. Execute an MB instruction. 3. Write a flag (equivalently, send an interrupt or write a register location implemented in the I/O device). The receiving I/O device must: 1. Read the flag (equivalently, detect the interrupt or detect the write to the register location implemented in the I/O device). 2. Execute the equivalent of an MB. 3. Read the shared data. So, we must use MB, not WMB, when ordering I/O accesses. The WMB instruction specification says that it orders writes to memory-like locations with other writes to memory-like locations, and writes to non-memory-like locations with other writes to non-memory-like locations. But it doesn't order memory-like writes with I/O-like writes. The section 5.6.3 claims that there are no implied barriers. So, if we want to conform to the specification: read*_relaxed and write*_relaxed need at least one barrier before or after (memory-barriers.txt says that they must be ordered with each other; the Alpha specification doesn't specify any ordering). read and write - a read must have a barrier after it, a write before it. There must be one more barrier before a read or after a write, to make sure that there is a barrier between a write and a subsequent read. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-22 20:00 ` Mikulas Patocka @ 2020-05-23 10:26 ` Mikulas Patocka 2020-05-23 15:10 ` Ivan Kokshaysky 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-23 10:26 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti platform. The patches moved the memory barriers that were after a write to before the write. The result is that if there's an iowrite followed by an ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between a write and a read causes a hang with the serial port and the real time clock. This patch makes the barriers conform to the specification. 1. We add mb() before readX_relaxed and writeX_relaxed - memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier. 2. We add mb() before reads from the I/O space - so that if there's a write followed by a read, there is a barrier between them.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: stable@vger.kernel.org # v4.17+ --- arch/alpha/include/asm/io.h | 43 ++++++++++++++++++++++++++++--------------- arch/alpha/kernel/io.c | 28 +++++++++++++++++++++------- 2 files changed, 49 insertions(+), 22 deletions(-) Index: linux-stable/arch/alpha/include/asm/io.h =================================================================== --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 @@ -310,14 +310,18 @@ static inline int __is_mmio(const volati #if IO_CONCAT(__IO_PREFIX,trivial_io_bw) extern inline unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } extern inline unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } @@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned #if IO_CONCAT(__IO_PREFIX,trivial_io_lq) extern inline unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -403,15 +409,18 @@ extern inline void __raw_writew(u16 b, v extern inline u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } extern inline u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; mb(); + ret = __raw_readw(addr); return ret; } @@ -451,14 +460,18 @@ extern 
inline void __raw_writeq(u64 b, v extern inline u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } extern inline u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } @@ -487,14 +500,14 @@ extern inline void writeq(u64 b, volatil #define outb_p outb #define outw_p outw #define outl_p outl -#define readb_relaxed(addr) __raw_readb(addr) -#define readw_relaxed(addr) __raw_readw(addr) -#define readl_relaxed(addr) __raw_readl(addr) -#define readq_relaxed(addr) __raw_readq(addr) -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) -#define writew_relaxed(b, addr) __raw_writew(b, addr) -#define writel_relaxed(b, addr) __raw_writel(b, addr) -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) +#define readb_relaxed(addr) (mb(), __raw_readb(addr)) +#define readw_relaxed(addr) (mb(), __raw_readw(addr)) +#define readl_relaxed(addr) (mb(), __raw_readl(addr)) +#define readq_relaxed(addr) (mb(), __raw_readq(addr)) +#define writeb_relaxed(b, addr) (mb(), __raw_writeb(b, addr)) +#define writew_relaxed(b, addr) (mb(), __raw_writew(b, addr)) +#define writel_relaxed(b, addr) (mb(), __raw_writel(b, addr)) +#define writeq_relaxed(b, addr) (mb(), __raw_writeq(b, addr)) /* * String version of IO memory access ops: Index: linux-stable/arch/alpha/kernel/io.c =================================================================== --- linux-stable.orig/arch/alpha/kernel/io.c 2020-05-23 10:01:22.000000000 +0200 +++ linux-stable/arch/alpha/kernel/io.c 2020-05-23 10:01:22.000000000 +0200 @@ -16,21 +16,27 @@ unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned 
int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq); u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = __raw_readw(addr); mb(); return ret; } u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } ^ permalink raw reply [flat|nested] 45+ messages in thread
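The ordering the patch establishes can be sketched with stand-in accessors that log the sequence of operations instead of touching hardware. All names below (my_writeb, fake_raw_readb, etc.) are stand-ins, not the real Alpha implementations, and mb() here is a fake that records a character; the point is only that a write followed by a read always has a barrier between the two bus accesses:

```c
#include <assert.h>
#include <string.h>

static char trace[16];
static size_t pos;
static void record(char c) { trace[pos++] = c; }

/* Stubs that log their order instead of touching hardware: */
#define mb() record('B')                                     /* barrier      */
static void fake_raw_writeb(void) { record('W'); }           /* __raw_writeb */
static char fake_raw_readb(void)  { record('R'); return 0; } /* __raw_readb  */

/* Barrier placement with the fix applied: the write keeps its leading
 * barrier, and the read has one before it and one after it. */
static void my_writeb(void)
{
	mb();
	fake_raw_writeb();
}

static char my_readb(void)
{
	char r;
	mb();
	r = fake_raw_readb();
	mb();
	return r;
}
```

Calling my_writeb() and then my_readb() yields the trace "BWBRB": the read's leading mb() supplies the barrier between the write and the read, and its trailing mb() (the one v4 accidentally dropped from readw()) orders the read against whatever follows.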
* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-23 10:26 ` [PATCH v4] alpha: fix memory barriers so that they conform to the specification Mikulas Patocka @ 2020-05-23 15:10 ` Ivan Kokshaysky 2020-05-23 15:34 ` Mikulas Patocka 0 siblings, 1 reply; 45+ messages in thread From: Ivan Kokshaysky @ 2020-05-23 15:10 UTC (permalink / raw) To: Mikulas Patocka Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sat, May 23, 2020 at 06:26:54AM -0400, Mikulas Patocka wrote: > The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti > platform. The patches moved the memory barrier that followed a write to > before the write. The result is that if there's an iowrite followed by an > ioread, there is no barrier between them. > > The Alpha architecture allows reordering of the accesses to the I/O space, > and the missing barrier between write and read causes a hang with the > serial port and real time clock. > > This patch makes the barriers conform to the specification. > > 1. We add mb() before readX_relaxed and writeX_relaxed - > memory-barriers.txt claims that these functions must be ordered w.r.t. > each other. Alpha doesn't order them, so we need an explicit barrier. > 2. We add mb() before reads from the I/O space - so that if there's a > write followed by a read, there is a barrier between them. You missed the second mb() in extern inline u16 readw(). 
Otherwise, Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") > Cc: stable@vger.kernel.org # v4.17+ > > --- > arch/alpha/include/asm/io.h | 43 ++++++++++++++++++++++++++++--------------- > arch/alpha/kernel/io.c | 28 +++++++++++++++++++++------- > 2 files changed, 49 insertions(+), 22 deletions(-) > > Index: linux-stable/arch/alpha/include/asm/io.h > =================================================================== > --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 > +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 > @@ -310,14 +310,18 @@ static inline int __is_mmio(const volati > #if IO_CONCAT(__IO_PREFIX,trivial_io_bw) > extern inline unsigned int ioread8(void __iomem *addr) > { > - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); > + unsigned int ret; > + mb(); > + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); > mb(); > return ret; > } > > extern inline unsigned int ioread16(void __iomem *addr) > { > - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); > + unsigned int ret; > + mb(); > + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); > mb(); > return ret; > } > @@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned > #if IO_CONCAT(__IO_PREFIX,trivial_io_lq) > extern inline unsigned int ioread32(void __iomem *addr) > { > - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); > + unsigned int ret; > + mb(); > + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); > mb(); > return ret; > } > @@ -403,15 +409,18 @@ extern inline void __raw_writew(u16 b, v > > extern inline u8 readb(const volatile void __iomem *addr) > { > - u8 ret = __raw_readb(addr); > + u8 ret; > + mb(); > + ret = __raw_readb(addr); > mb(); > return ret; > } > 
> extern inline u16 readw(const volatile void __iomem *addr) > { > - u16 ret = __raw_readw(addr); > + u16 ret; > mb(); > + ret = __raw_readw(addr); > return ret; > } > > @@ -451,14 +460,18 @@ extern inline void __raw_writeq(u64 b, v > > extern inline u32 readl(const volatile void __iomem *addr) > { > - u32 ret = __raw_readl(addr); > + u32 ret; > + mb(); > + ret = __raw_readl(addr); > mb(); > return ret; > } > > extern inline u64 readq(const volatile void __iomem *addr) > { > - u64 ret = __raw_readq(addr); > + u64 ret; > + mb(); > + ret = __raw_readq(addr); > mb(); > return ret; > } > @@ -487,14 +500,14 @@ extern inline void writeq(u64 b, volatil > #define outb_p outb > #define outw_p outw > #define outl_p outl > -#define readb_relaxed(addr) __raw_readb(addr) > -#define readw_relaxed(addr) __raw_readw(addr) > -#define readl_relaxed(addr) __raw_readl(addr) > -#define readq_relaxed(addr) __raw_readq(addr) > -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) > -#define writew_relaxed(b, addr) __raw_writew(b, addr) > -#define writel_relaxed(b, addr) __raw_writel(b, addr) > -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) > +#define readb_relaxed(addr) (mb(), __raw_readb(addr)) > +#define readw_relaxed(addr) (mb(), __raw_readw(addr)) > +#define readl_relaxed(addr) (mb(), __raw_readl(addr)) > +#define readq_relaxed(addr) (mb(), __raw_readq(addr)) > +#define writeb_relaxed(b, addr) (mb(), __raw_writeb(b, addr)) > +#define writew_relaxed(b, addr) (mb(), __raw_writew(b, addr)) > +#define writel_relaxed(b, addr) (mb(), __raw_writel(b, addr)) > +#define writeq_relaxed(b, addr) (mb(), __raw_writeq(b, addr)) > > /* > * String version of IO memory access ops: > Index: linux-stable/arch/alpha/kernel/io.c > =================================================================== > --- linux-stable.orig/arch/alpha/kernel/io.c 2020-05-23 10:01:22.000000000 +0200 > +++ linux-stable/arch/alpha/kernel/io.c 2020-05-23 10:01:22.000000000 +0200 > @@ -16,21 +16,27 @@ > unsigned int > 
ioread8(void __iomem *addr) > { > - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); > + unsigned int ret; > + mb(); > + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); > mb(); > return ret; > } > > unsigned int ioread16(void __iomem *addr) > { > - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); > + unsigned int ret; > + mb(); > + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); > mb(); > return ret; > } > > unsigned int ioread32(void __iomem *addr) > { > - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); > + unsigned int ret; > + mb(); > + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); > mb(); > return ret; > } > @@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq); > > u8 readb(const volatile void __iomem *addr) > { > - u8 ret = __raw_readb(addr); > + u8 ret; > + mb(); > + ret = __raw_readb(addr); > mb(); > return ret; > } > > u16 readw(const volatile void __iomem *addr) > { > - u16 ret = __raw_readw(addr); > + u16 ret; > + mb(); > + ret = __raw_readw(addr); > mb(); > return ret; > } > > u32 readl(const volatile void __iomem *addr) > { > - u32 ret = __raw_readl(addr); > + u32 ret; > + mb(); > + ret = __raw_readl(addr); > mb(); > return ret; > } > > u64 readq(const volatile void __iomem *addr) > { > - u64 ret = __raw_readq(addr); > + u64 ret; > + mb(); > + ret = __raw_readq(addr); > mb(); > return ret; > } > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-23 15:10 ` Ivan Kokshaysky @ 2020-05-23 15:34 ` Mikulas Patocka 2020-05-23 15:37 ` [PATCH v5] " Mikulas Patocka 2020-05-23 16:44 ` [PATCH v4] " Maciej W. Rozycki 0 siblings, 2 replies; 45+ messages in thread From: Mikulas Patocka @ 2020-05-23 15:34 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sat, 23 May 2020, Ivan Kokshaysky wrote: > On Sat, May 23, 2020 at 06:26:54AM -0400, Mikulas Patocka wrote: > > The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti > > platform. The patches moved the memory barrier that followed a write to > > before the write. The result is that if there's an iowrite followed by an > > ioread, there is no barrier between them. > > > > The Alpha architecture allows reordering of the accesses to the I/O space, > > and the missing barrier between write and read causes a hang with the > > serial port and real time clock. > > > > This patch makes the barriers conform to the specification. > > > > 1. We add mb() before readX_relaxed and writeX_relaxed - > > memory-barriers.txt claims that these functions must be ordered w.r.t. > > each other. Alpha doesn't order them, so we need an explicit barrier. > > 2. We add mb() before reads from the I/O space - so that if there's a > > write followed by a read, there is a barrier between them. > > You missed the second mb() in extern inline u16 readw(). Otherwise, > > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> ... and I also broke the *_relaxed macros and didn't notice it, because they are unused in my config. This won't compile, because mb() is a statement, not a function. > > +#define readb_relaxed(addr) (mb(), __raw_readb(addr)) I'll send a new version of the patch. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
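The reason `(mb(), __raw_readb(addr))` fails to compile is that mb() on Alpha expands to an asm statement, and a statement cannot appear inside a comma expression. A minimal sketch of one working alternative, with the barrier and the raw accessor stubbed out (fake_reg and fake_raw_readb are made-up names; none of this is the kernel's actual code):

```c
#include <stdint.h>

/* Stub barrier: like the kernel's mb(), it expands to a statement, so
 *     #define readb_relaxed(addr) (mb(), fake_raw_readb(addr))
 * would not compile -- a statement cannot sit inside a comma expression. */
#define mb() __asm__ __volatile__("" ::: "memory")

static volatile uint8_t fake_reg = 0x5a;   /* stands in for an I/O register */

static inline uint8_t fake_raw_readb(const volatile void *addr)
{
	return *(const volatile uint8_t *)addr;
}

/* One working alternative is a GCC statement expression, which may
 * contain statements yet still yields a value (the CMOS_READ macro in
 * mc146818rtc.h uses the same construct; v5 instead switched to inline
 * functions, the other valid fix): */
#define readb_relaxed(addr) ({ mb(); fake_raw_readb(addr); })
```

With this, readb_relaxed(&fake_reg) runs the barrier first and still evaluates to the register's value.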
* [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-23 15:34 ` Mikulas Patocka @ 2020-05-23 15:37 ` Mikulas Patocka 2020-05-24 14:54 ` Maciej W. Rozycki 2020-05-23 16:44 ` [PATCH v4] " Maciej W. Rozycki 1 sibling, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-23 15:37 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti platform. The patches moved the memory barrier that followed a write to before the write. The result is that if there's an iowrite followed by an ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between write and read causes a hang with the serial port and real time clock. This patch makes the barriers conform to the specification. 1. We add mb() before readX_relaxed and writeX_relaxed - memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier. 2. We add mb() before reads from the I/O space - so that if there's a write followed by a read, there is a barrier between them. 
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: stable@vger.kernel.org # v4.17+ Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> --- arch/alpha/include/asm/io.h | 87 ++++++++++++++++++++++++++++++++++++-------- arch/alpha/kernel/io.c | 28 ++++++++++---- 2 files changed, 93 insertions(+), 22 deletions(-) Index: linux-stable/arch/alpha/include/asm/io.h =================================================================== --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-23 17:29:22.000000000 +0200 @@ -310,14 +310,18 @@ static inline int __is_mmio(const volati #if IO_CONCAT(__IO_PREFIX,trivial_io_bw) extern inline unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } extern inline unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } @@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned #if IO_CONCAT(__IO_PREFIX,trivial_io_lq) extern inline unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -403,14 +409,18 @@ extern inline void __raw_writew(u16 b, v extern inline u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } extern inline u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = __raw_readw(addr); 
mb(); return ret; } @@ -451,14 +461,18 @@ extern inline void __raw_writeq(u64 b, v extern inline u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } extern inline u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil #define outb_p outb #define outw_p outw #define outl_p outl -#define readb_relaxed(addr) __raw_readb(addr) -#define readw_relaxed(addr) __raw_readw(addr) -#define readl_relaxed(addr) __raw_readl(addr) -#define readq_relaxed(addr) __raw_readq(addr) -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) -#define writew_relaxed(b, addr) __raw_writew(b, addr) -#define writel_relaxed(b, addr) __raw_writel(b, addr) -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) /* + * The _relaxed functions must be ordered w.r.t. each other, but they don't + * have to be ordered w.r.t. other memory accesses. 
+ */ +static inline u8 readb_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readb(addr); +} + +static inline u16 readw_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readw(addr); +} + +static inline u32 readl_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readl(addr); +} + +static inline u64 readq_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readq(addr); +} + +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr) +{ + mb(); + __raw_writeb(b, addr); +} + +static inline void writew_relaxed(u16 b, volatile void __iomem *addr) +{ + mb(); + __raw_writew(b, addr); +} + +static inline void writel_relaxed(u32 b, volatile void __iomem *addr) +{ + mb(); + __raw_writel(b, addr); +} + +static inline void writeq_relaxed(u64 b, volatile void __iomem *addr) +{ + mb(); + __raw_writeq(b, addr); +} +/* * String version of IO memory access ops: */ extern void memcpy_fromio(void *, const volatile void __iomem *, long); Index: linux-stable/arch/alpha/kernel/io.c =================================================================== --- linux-stable.orig/arch/alpha/kernel/io.c 2020-05-23 10:01:22.000000000 +0200 +++ linux-stable/arch/alpha/kernel/io.c 2020-05-23 10:01:22.000000000 +0200 @@ -16,21 +16,27 @@ unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq); u8 readb(const volatile void __iomem *addr) { - 
u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = __raw_readw(addr); mb(); return ret; } u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-23 15:37 ` [PATCH v5] " Mikulas Patocka @ 2020-05-24 14:54 ` Maciej W. Rozycki 2020-05-25 13:56 ` Mikulas Patocka 0 siblings, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-24 14:54 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc Hi Mikulas, > This patch makes the barriers conform to the specification. > > 1. We add mb() before readX_relaxed and writeX_relaxed - > memory-barriers.txt claims that these functions must be ordered w.r.t. > each other. Alpha doesn't order them, so we need an explicit barrier. > 2. We add mb() before reads from the I/O space - so that if there's a > write followed by a read, there is a barrier between them. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") > Cc: stable@vger.kernel.org # v4.17+ > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Thank you for your effort to address this regression. I have looked through your code and the context it is to be applied to. Overall it looks good to me; however, I still have one concern, as detailed below (please accept my apologies if you find it tedious to address all the points raised in the course of this review). > Index: linux-stable/arch/alpha/include/asm/io.h > =================================================================== > --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 > +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-23 17:29:22.000000000 +0200 [...] 
> @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil > #define outb_p outb > #define outw_p outw > #define outl_p outl > -#define readb_relaxed(addr) __raw_readb(addr) > -#define readw_relaxed(addr) __raw_readw(addr) > -#define readl_relaxed(addr) __raw_readl(addr) > -#define readq_relaxed(addr) __raw_readq(addr) > -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) > -#define writew_relaxed(b, addr) __raw_writew(b, addr) > -#define writel_relaxed(b, addr) __raw_writel(b, addr) > -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) > > /* > + * The _relaxed functions must be ordered w.r.t. each other, but they don't > + * have to be ordered w.r.t. other memory accesses. > + */ > +static inline u8 readb_relaxed(const volatile void __iomem *addr) > +{ > + mb(); > + return __raw_readb(addr); > +} [etc.] Please observe that changing the `*_relaxed' entry points from merely aliasing the corresponding `__raw_*' handlers to more elaborate code sequences (though indeed slightly only, but still) makes the situation analogous to one we have with the ordinary MMIO accessor entry points. Those regular entry points have been made `extern inline' and wrapped into: #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1 and: #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1 respectively, with corresponding out-of-line entry points available, so that there is no extra inline code produced where the call to the relevant MMIO accessor is going to end up with an actual function call, as this would not help performance in any way and would expand code unnecessarily at all call sites. Therefore I suggest that your new `static inline' functions follow the pattern, perhaps by grouping them with the corresponding ordinary accessor functions in arch/alpha/include/asm/io.h within the relevant existing #ifdef, and then by making them `extern inline' and providing out-of-line implementations in arch/alpha/kernel/io.c, with the individual symbols exported. 
Within arch/alpha/kernel/io.c the compiler will still inline code as it sees fit as it already does, e.g. `__raw_readq' might get inlined in `readq' if it turns out cheaper than arranging for an actual call, including all the stack frame preparation for `ra' preservation; it's less likely with say `writeq' which probably always ends with a tail call to `__raw_writeq' as no stack frame is required in that case. That for the read accessors. > +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr) > +{ > + mb(); > + __raw_writeb(b, addr); > +} [etc.] Conversely for the write accessors, keeping in mind what I have noted above, I suggest that you just redirect the existing aliases to the ordinary accessors, as there will be no difference anymore between the respective ordinary and relaxed accessors. That is: #define writeb_relaxed(b, addr) writeb(b, addr) etc. Let me know if you have any further questions or comments. Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-24 14:54 ` Maciej W. Rozycki @ 2020-05-25 13:56 ` Mikulas Patocka 2020-05-25 14:07 ` Arnd Bergmann 2020-05-25 14:45 ` Maciej W. Rozycki 0 siblings, 2 replies; 45+ messages in thread From: Mikulas Patocka @ 2020-05-25 13:56 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sun, 24 May 2020, Maciej W. Rozycki wrote: > Hi Mikulas, > > > This patch makes barriers confiorm to the specification. > > > > 1. We add mb() before readX_relaxed and writeX_relaxed - > > memory-barriers.txt claims that these functions must be ordered w.r.t. > > each other. Alpha doesn't order them, so we need an explicit barrier. > > 2. We add mb() before reads from the I/O space - so that if there's a > > write followed by a read, there should be a barrier between them. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") > > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") > > Cc: stable@vger.kernel.org # v4.17+ > > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> > > Thank you for your effort to address this regression. I have looked > through your code and the context it is to be applied to. Overall it > looks good to me, however I still have one concern as detailed below > (please accept my apologies if you find it tedious to address all the > points raised in the course of this review). > > > Index: linux-stable/arch/alpha/include/asm/io.h > > =================================================================== > > --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-23 10:01:22.000000000 +0200 > > +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-23 17:29:22.000000000 +0200 > [...] 
> > @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil > > #define outb_p outb > > #define outw_p outw > > #define outl_p outl > > -#define readb_relaxed(addr) __raw_readb(addr) > > -#define readw_relaxed(addr) __raw_readw(addr) > > -#define readl_relaxed(addr) __raw_readl(addr) > > -#define readq_relaxed(addr) __raw_readq(addr) > > -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) > > -#define writew_relaxed(b, addr) __raw_writew(b, addr) > > -#define writel_relaxed(b, addr) __raw_writel(b, addr) > > -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) > > > > /* > > + * The _relaxed functions must be ordered w.r.t. each other, but they don't > > + * have to be ordered w.r.t. other memory accesses. > > + */ > > +static inline u8 readb_relaxed(const volatile void __iomem *addr) > > +{ > > + mb(); > > + return __raw_readb(addr); > > +} > [etc.] > > Please observe that changing the `*_relaxed' entry points from merely > aliasing the corresponding `__raw_*' handlers to more elaborate code > sequences (though indeed slightly only, but still) makes the situation > analogous to one we have with the ordinary MMIO accessor entry points. > Those regular entry points have been made `extern inline' and wrapped > into: > > #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1 > > and: > > #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1 > > respectively, with corresponding out-of-line entry points available, so > that there is no extra inline code produced where the call to the relevant > MMIO accessor is going to end up with an actual function call, as this > would not help performance in any way and would expand code unnecessarily > at all call sites. 
> > Therefore I suggest that your new `static inline' functions follow the > pattern, perhaps by grouping them with the corresponding ordinary accessor > functions in arch/alpha/include/asm/io.h within the relevant existing > #ifdef, and then by making them `extern inline' and providing out-of-line > implementations in arch/alpha/kernel/io.c, with the individual symbols > exported. Within arch/alpha/kernel/io.c the compiler will still inline > code as it sees fit as it already does, e.g. `__raw_readq' might get > inlined in `readq' if it turns out cheaper than arranging for an actual > call, including all the stack frame preparation for `ra' preservation; > it's less likely with say `writeq' which probably always ends with a tail > call to `__raw_writeq' as no stack frame is required in that case. > > That for the read accessors. I think that making the read*_relaxed functions extern inline just causes source code bloat with no practical gain - if we make them extern inline, we would need two implementations (one in the include file, the other in the C file) - and it is not good practice to duplicate code. The functions __raw_read* are already extern inline, so the compiler will inline/noinline them depending on the macros trivial_io_bw and trivial_io_lq - so we can just call them from read*_relaxed without repeating the extern inline pattern. > > +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr) > > +{ > > + mb(); > > + __raw_writeb(b, addr); > > +} > [etc.] > > Conversely for the write accessors, keeping in mind what I have noted > above, I suggest that you just redirect the existing aliases to the > ordinary accessors, as there will be no difference anymore between the > respective ordinary and relaxed accessors. That is: > > #define writeb_relaxed(b, addr) writeb(b, addr) Yes - that's a good point. > etc. > > Let me know if you have any further questions or comments. 
> > Maciej Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-25 13:56 ` Mikulas Patocka @ 2020-05-25 14:07 ` Arnd Bergmann 2020-05-25 14:45 ` Maciej W. Rozycki 1 sibling, 0 replies; 45+ messages in thread From: Arnd Bergmann @ 2020-05-25 14:07 UTC (permalink / raw) To: Mikulas Patocka Cc: Maciej W. Rozycki, Ivan Kokshaysky, Maciej W. Rozycki, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, May 25, 2020 at 3:56 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > On Sun, 24 May 2020, Maciej W. Rozycki wrote: > > > > respectively, with corresponding out-of-line entry points available, so > > that there is no extra inline code produced where the call to the relevant > > MMIO accessor is going to end up with an actual function call, as this > > would not help performance in any way and would expand code unnecessarily > > at all call sites. > > > > Therefore I suggest that your new `static inline' functions follow the > > pattern, perhaps by grouping them with the corresponding ordinary accessor > > functions in arch/alpha/include/asm/io.h within the relevant existing > > #ifdef, and then by making them `extern inline' and providing out-of-line > > implementations in arch/alpha/kernel/io.c, with the individual symbols > > exported. Within arch/alpha/kernel/io.c the compiler will still inline > > code as it sees fit as it already does, e.g. `__raw_readq' might get > > inlined in `readq' if it turns out cheaper than arranging for an actual > > call, including all the stack frame preparation for `ra' preservation; > > it's less likely with say `writeq' which probably always ends with a tail > > call to `__raw_writeq' as no stack frame is required in that case. > > > > That for the read accessors. 
> > I think that making the read*_relaxed functions extern inline just causes > source code bloat with no practical gain - if we make them extern inline, > we would need two implementations (one in the include file, the other in > the C file) - and it is not good practice to duplicate code. > > The functions __raw_read* are already extern inline, so the compiler will > inline/noinline them depending on the macros trivial_io_bw and > trivial_io_lq - so we can just call them from read*_relaxed without > repeating the extern inline pattern. You could consider using the helpers in include/asm-generic/io.h to provide some of the wrappers and only provide the ones that don't fit in that scheme already. Arnd ^ permalink raw reply [flat|nested] 45+ messages in thread
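Arnd's suggestion builds on the override scheme used by include/asm-generic/io.h: the generic header supplies an accessor only when the architecture has not already claimed that name with a macro. A condensed, self-contained illustration of the mechanism (the function names and the "+100" marker are made up; this is not the real header text):

```c
#include <stdint.h>

static volatile uint8_t fake_reg = 7;

/* "Architecture" side: provide a custom readb and claim the name. */
static inline uint8_t arch_readb(const volatile void *addr)
{
	/* +100 is just a marker so we can tell this version ran */
	return *(const volatile uint8_t *)addr + 100;
}
#define readb arch_readb

/* "asm-generic/io.h" side (heavily condensed): supply readb only if the
 * architecture did not already define it. */
#ifndef readb
static inline uint8_t generic_readb(const volatile void *addr)
{
	return *(const volatile uint8_t *)addr;
}
#define readb generic_readb
#endif
```

Here readb resolves to arch_readb and the #ifndef block compiles to nothing; remove the arch #define and the generic version takes over, which is how an architecture can override only the accessors that need special handling.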
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-25 13:56 ` Mikulas Patocka 2020-05-25 14:07 ` Arnd Bergmann @ 2020-05-25 14:45 ` Maciej W. Rozycki 2020-05-25 15:53 ` [PATCH v6] " Mikulas Patocka 2020-05-25 15:54 ` [PATCH v5] " Mikulas Patocka 1 sibling, 2 replies; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-25 14:45 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, 25 May 2020, Mikulas Patocka wrote: > > Please observe that changing the `*_relaxed' entry points from merely > > aliasing the corresponding `__raw_*' handlers to more elaborate code > > sequences (though indeed slightly only, but still) makes the situation > > analogous to one we have with the ordinary MMIO accessor entry points. > > Those regular entry points have been made `extern inline' and wrapped > > into: > > > > #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1 > > > > and: > > > > #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1 > > > > respectively, with corresponding out-of-line entry points available, so > > that there is no extra inline code produced where the call to the relevant > > MMIO accessor is going to end up with an actual function call, as this > > would not help performance in any way and would expand code unnecessarily > > at all call sites. > > > > Therefore I suggest that your new `static inline' functions follow the > > pattern, perhaps by grouping them with the corresponding ordinary accessor > > functions in arch/alpha/include/asm/io.h within the relevant existing > > #ifdef, and then by making them `extern inline' and providing out-of-line > > implementations in arch/alpha/kernel/io.c, with the individual symbols > > exported. Within arch/alpha/kernel/io.c the compiler will still inline > > code as it sees fit as it already does, e.g. 
`__raw_readq' might get > > inlined in `readq' if it turns out cheaper than arranging for an actual > > call, including all the stack frame preparation for `ra' preservation; > > it's less likely with say `writeq' which probably always ends with a tail > > call to `__raw_writeq' as no stack frame is required in that case. > > > > That for the read accessors. > I think that making the read*_relaxed functions extern inline just causes > source code bloat with no practical gain - if we make them extern inline, > we would need two implementations (one in the include file, the other in > the C file) - and it is not good practice to duplicate code. We do that already with the existing accessors; I agree the duplication is a bit unfortunate, and it could be solved with some macro hackery so that there is a single version in arch/alpha/include/asm/io.h that gets pasted to produce out-of-line copies in arch/alpha/kernel/io.c. That would be good to have across all the accessors, but it is not relevant to the regression discussed here and therefore I do not request that you make such a clean-up as a part of this series. > The functions __raw_read* are already extern inline, so the compiler will > inline/noinline them depending on the macros trivial_io_bw and > trivial_io_lq - so we can just call them from read*_relaxed without > repeating the extern inline pattern. The whole point of this peculiar arrangement for cooked accessors is to avoid having barriers inserted inline around out-of-line calls to raw accessors, and therefore I maintain that we need to have the arrangement applied consistently across all the cooked accessors. Since you're creating new distinct cooked accessors (by making their names no longer alias to existing cooked accessors), they need to follow the pattern. NB this arrangement was added back in 2.6.9-rc3, with: ChangeSet@1.1939.5.8, 2004-09-22 22:40:06-07:00, rth@kanga.twiddle.home [ALPHA] Implement new ioread interface. I believe. 
I don't know if any further details or discussion around that can be chased, but Richard might be able to chime in. Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
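The split Maciej describes (the barrier folded into the cooked accessor, with inline and out-of-line copies kept in step) can be modelled in miniature. The `my_*` names below are invented; `static inline` is used only so the sketch links as a single file, whereas the kernel uses gnu89 `extern inline` in asm/io.h plus one explicit out-of-line copy in arch/alpha/kernel/io.c.

```c
#include <assert.h>

static int barrier_hits;		/* counts simulated mb() calls */
static unsigned char fake_reg = 0x5a;	/* stand-in for a device register */

static void my_mb(void) { barrier_hits++; }	/* stand-in for mb() */

/* raw accessor: no ordering guarantees of its own */
static inline unsigned char my_raw_readb(const volatile void *addr)
{
	return *(const volatile unsigned char *)addr;
}

/* cooked accessor: barrier *before* the read, as in the patch under review */
static inline unsigned char my_readb_relaxed(const volatile void *addr)
{
	my_mb();
	return my_raw_readb(addr);
}
```

Keeping the barrier inside the accessor, rather than expanding it inline around an out-of-line call, is what avoids the code growth at every call site.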
* [PATCH v6] alpha: fix memory barriers so that they conform to the specification 2020-05-25 14:45 ` Maciej W. Rozycki @ 2020-05-25 15:53 ` Mikulas Patocka 2020-05-26 14:47 ` [PATCH v7] " Mikulas Patocka 2020-05-25 15:54 ` [PATCH v5] " Mikulas Patocka 1 sibling, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-25 15:53 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti platform. The patches moved the memory barrier that used to follow a write to before the write. The result is that if there's an iowrite followed by an ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between write and read causes a hang with the serial port and the real time clock. This patch makes barriers conform to the specification. 1. We add mb() before readX_relaxed and writeX_relaxed - memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier. 2. We add mb() before reads from the I/O space - so that if there's a write followed by a read, there should be a barrier between them. 
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: stable@vger.kernel.org # v4.17+ Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> --- arch/alpha/include/asm/io.h | 74 +++++++++++++++++++++++++++++++++++--------- arch/alpha/kernel/io.c | 60 +++++++++++++++++++++++++++++++---- 2 files changed, 112 insertions(+), 22 deletions(-) Index: linux-stable/arch/alpha/include/asm/io.h =================================================================== --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-25 15:36:16.000000000 +0200 +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-25 17:45:52.000000000 +0200 @@ -310,14 +310,18 @@ static inline int __is_mmio(const volati #if IO_CONCAT(__IO_PREFIX,trivial_io_bw) extern inline unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } extern inline unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } @@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned #if IO_CONCAT(__IO_PREFIX,trivial_io_lq) extern inline unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -403,14 +409,18 @@ extern inline void __raw_writew(u16 b, v extern inline u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } extern inline u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = 
__raw_readw(addr); mb(); return ret; } @@ -451,14 +461,18 @@ extern inline void __raw_writeq(u64 b, v extern inline u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } extern inline u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } @@ -487,14 +501,44 @@ extern inline void writeq(u64 b, volatil #define outb_p outb #define outw_p outw #define outl_p outl -#define readb_relaxed(addr) __raw_readb(addr) -#define readw_relaxed(addr) __raw_readw(addr) -#define readl_relaxed(addr) __raw_readl(addr) -#define readq_relaxed(addr) __raw_readq(addr) -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) -#define writew_relaxed(b, addr) __raw_writew(b, addr) -#define writel_relaxed(b, addr) __raw_writel(b, addr) -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) + +extern u8 readb_relaxed(const volatile void __iomem *addr); +extern u16 readw_relaxed(const volatile void __iomem *addr); +extern u32 readl_relaxed(const volatile void __iomem *addr); +extern u64 readq_relaxed(const volatile void __iomem *addr); + +#if IO_CONCAT(__IO_PREFIX,trivial_io_bw) +extern inline u8 readb_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readb(addr); +} + +extern inline u16 readw_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readw(addr); +} +#endif + +#if IO_CONCAT(__IO_PREFIX,trivial_io_lq) +extern inline u32 readl_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readl(addr); +} + +extern inline u64 readq_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readq(addr); +} +#endif + +#define writeb_relaxed writeb +#define writeb_relaxew writew +#define writeb_relaxel writel +#define writeb_relaxeq writeq /* * String version of IO memory access ops: Index: linux-stable/arch/alpha/kernel/io.c 
=================================================================== --- linux-stable.orig/arch/alpha/kernel/io.c 2020-05-25 15:36:16.000000000 +0200 +++ linux-stable/arch/alpha/kernel/io.c 2020-05-25 17:47:02.000000000 +0200 @@ -16,21 +16,27 @@ unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq); u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = __raw_readw(addr); mb(); return ret; } u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } @@ -207,6 +221,38 @@ EXPORT_SYMBOL(writew); EXPORT_SYMBOL(writel); EXPORT_SYMBOL(writeq); +/* + * The _relaxed functions must be ordered w.r.t. each other, but they don't + * have to be ordered w.r.t. other memory accesses. 
+ */ +u8 readb_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readb(addr); +} + +u16 readw_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readw(addr); +} + +u32 readl_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readl(addr); +} + +u64 readq_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readq(addr); +} + +EXPORT_SYMBOL(readb_relaxed); +EXPORT_SYMBOL(readw_relaxed); +EXPORT_SYMBOL(readl_relaxed); +EXPORT_SYMBOL(readq_relaxed); /* * Read COUNT 8-bit bytes from port PORT into memory starting at SRC. ^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v7] alpha: fix memory barriers so that they conform to the specification 2020-05-25 15:53 ` [PATCH v6] " Mikulas Patocka @ 2020-05-26 14:47 ` Mikulas Patocka 2020-05-27 0:18 ` Maciej W. Rozycki 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-26 14:47 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti platform. The patches moved the memory barrier that used to follow a write to before the write. The result is that if there's an iowrite followed by an ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between write and read causes a hang with the serial port and the real time clock. This patch makes barriers conform to the specification. 1. We add mb() before readX_relaxed and writeX_relaxed - memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier. 2. We add mb() before reads from the I/O space - so that if there's a write followed by a read, there should be a barrier between them. 
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: stable@vger.kernel.org # v4.17+ Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> --- arch/alpha/include/asm/io.h | 74 +++++++++++++++++++++++++++++++++++--------- arch/alpha/kernel/io.c | 60 +++++++++++++++++++++++++++++++---- 2 files changed, 112 insertions(+), 22 deletions(-) Index: linux-stable/arch/alpha/include/asm/io.h =================================================================== --- linux-stable.orig/arch/alpha/include/asm/io.h 2020-05-25 15:36:16.000000000 +0200 +++ linux-stable/arch/alpha/include/asm/io.h 2020-05-26 16:32:42.000000000 +0200 @@ -310,14 +310,18 @@ static inline int __is_mmio(const volati #if IO_CONCAT(__IO_PREFIX,trivial_io_bw) extern inline unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } extern inline unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } @@ -358,7 +362,9 @@ extern inline void outw(u16 b, unsigned #if IO_CONCAT(__IO_PREFIX,trivial_io_lq) extern inline unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -403,14 +409,18 @@ extern inline void __raw_writew(u16 b, v extern inline u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } extern inline u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = 
__raw_readw(addr); mb(); return ret; } @@ -451,14 +461,18 @@ extern inline void __raw_writeq(u64 b, v extern inline u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } extern inline u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } @@ -487,14 +501,44 @@ extern inline void writeq(u64 b, volatil #define outb_p outb #define outw_p outw #define outl_p outl -#define readb_relaxed(addr) __raw_readb(addr) -#define readw_relaxed(addr) __raw_readw(addr) -#define readl_relaxed(addr) __raw_readl(addr) -#define readq_relaxed(addr) __raw_readq(addr) -#define writeb_relaxed(b, addr) __raw_writeb(b, addr) -#define writew_relaxed(b, addr) __raw_writew(b, addr) -#define writel_relaxed(b, addr) __raw_writel(b, addr) -#define writeq_relaxed(b, addr) __raw_writeq(b, addr) + +extern u8 readb_relaxed(const volatile void __iomem *addr); +extern u16 readw_relaxed(const volatile void __iomem *addr); +extern u32 readl_relaxed(const volatile void __iomem *addr); +extern u64 readq_relaxed(const volatile void __iomem *addr); + +#if IO_CONCAT(__IO_PREFIX,trivial_io_bw) +extern inline u8 readb_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readb(addr); +} + +extern inline u16 readw_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readw(addr); +} +#endif + +#if IO_CONCAT(__IO_PREFIX,trivial_io_lq) +extern inline u32 readl_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readl(addr); +} + +extern inline u64 readq_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readq(addr); +} +#endif + +#define writeb_relaxed writeb +#define writew_relaxed writew +#define writel_relaxed writel +#define writeq_relaxed writeq /* * String version of IO memory access ops: Index: linux-stable/arch/alpha/kernel/io.c 
=================================================================== --- linux-stable.orig/arch/alpha/kernel/io.c 2020-05-25 15:36:16.000000000 +0200 +++ linux-stable/arch/alpha/kernel/io.c 2020-05-25 17:47:02.000000000 +0200 @@ -16,21 +16,27 @@ unsigned int ioread8(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr); mb(); return ret; } unsigned int ioread16(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr); mb(); return ret; } unsigned int ioread32(void __iomem *addr) { - unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); + unsigned int ret; + mb(); + ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr); mb(); return ret; } @@ -148,28 +154,36 @@ EXPORT_SYMBOL(__raw_writeq); u8 readb(const volatile void __iomem *addr) { - u8 ret = __raw_readb(addr); + u8 ret; + mb(); + ret = __raw_readb(addr); mb(); return ret; } u16 readw(const volatile void __iomem *addr) { - u16 ret = __raw_readw(addr); + u16 ret; + mb(); + ret = __raw_readw(addr); mb(); return ret; } u32 readl(const volatile void __iomem *addr) { - u32 ret = __raw_readl(addr); + u32 ret; + mb(); + ret = __raw_readl(addr); mb(); return ret; } u64 readq(const volatile void __iomem *addr) { - u64 ret = __raw_readq(addr); + u64 ret; + mb(); + ret = __raw_readq(addr); mb(); return ret; } @@ -207,6 +221,38 @@ EXPORT_SYMBOL(writew); EXPORT_SYMBOL(writel); EXPORT_SYMBOL(writeq); +/* + * The _relaxed functions must be ordered w.r.t. each other, but they don't + * have to be ordered w.r.t. other memory accesses. 
+ */ +u8 readb_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readb(addr); +} + +u16 readw_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readw(addr); +} + +u32 readl_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readl(addr); +} + +u64 readq_relaxed(const volatile void __iomem *addr) +{ + mb(); + return __raw_readq(addr); +} + +EXPORT_SYMBOL(readb_relaxed); +EXPORT_SYMBOL(readw_relaxed); +EXPORT_SYMBOL(readl_relaxed); +EXPORT_SYMBOL(readq_relaxed); /* * Read COUNT 8-bit bytes from port PORT into memory starting at SRC. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v7] alpha: fix memory barriers so that they conform to the specification 2020-05-26 14:47 ` [PATCH v7] " Mikulas Patocka @ 2020-05-27 0:18 ` Maciej W. Rozycki 2020-06-08 6:58 ` Mikulas Patocka 0 siblings, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-27 0:18 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Tue, 26 May 2020, Mikulas Patocka wrote: > This patch makes barriers confiorm to the specification. > > 1. We add mb() before readX_relaxed and writeX_relaxed - > memory-barriers.txt claims that these functions must be ordered w.r.t. > each other. Alpha doesn't order them, so we need an explicit barrier. > 2. We add mb() before reads from the I/O space - so that if there's a > write followed by a read, there should be a barrier between them. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") > Cc: stable@vger.kernel.org # v4.17+ > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> LGTM, thanks for persistence! Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v7] alpha: fix memory barriers so that they conform to the specification 2020-05-27 0:18 ` Maciej W. Rozycki @ 2020-06-08 6:58 ` Mikulas Patocka 2020-06-08 23:49 ` Matt Turner 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-06-08 6:58 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Wed, 27 May 2020, Maciej W. Rozycki wrote: > On Tue, 26 May 2020, Mikulas Patocka wrote: > > > This patch makes barriers confiorm to the specification. > > > > 1. We add mb() before readX_relaxed and writeX_relaxed - > > memory-barriers.txt claims that these functions must be ordered w.r.t. > > each other. Alpha doesn't order them, so we need an explicit barrier. > > 2. We add mb() before reads from the I/O space - so that if there's a > > write followed by a read, there should be a barrier between them. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") > > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") > > Cc: stable@vger.kernel.org # v4.17+ > > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> > > LGTM, thanks for persistence! > > Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> > > Maciej Hi Will you submit the patch to Linus' tree in this merge window? Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v7] alpha: fix memory barriers so that they conform to the specification 2020-06-08 6:58 ` Mikulas Patocka @ 2020-06-08 23:49 ` Matt Turner 0 siblings, 0 replies; 45+ messages in thread From: Matt Turner @ 2020-06-08 23:49 UTC (permalink / raw) To: Mikulas Patocka Cc: Maciej W. Rozycki, Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sun, Jun 7, 2020 at 11:58 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > Will you submit the patch to Linus' tree in this merge window? > > Mikulas I will do it. Thank you, Mikulas! ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-25 14:45 ` Maciej W. Rozycki 2020-05-25 15:53 ` [PATCH v6] " Mikulas Patocka @ 2020-05-25 15:54 ` Mikulas Patocka 2020-05-25 16:39 ` Maciej W. Rozycki 1 sibling, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-25 15:54 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, 25 May 2020, Maciej W. Rozycki wrote: > On Mon, 25 May 2020, Mikulas Patocka wrote: > > > The functions __raw_read* are already extern inline, so the compiler will > > inline/noinline them depending on the macros trivial_io_bw and > > trivial_io_lq - so we can just call them from read*_relaxed without > > repeating the extern inline pattern. > > The whole point of this peculiar arrangement for cooked accessors is to > avoid having barriers inserted inline around out-of-line calls to raw > accessors, I see, but why do we want to avoid that? Linux kernel has no binary compatibility, so it doesn't matter if the barriers are inlined in the drivers or not. Anyway, I've sent a next version of the patch that makes read*_relaxed extern inline. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-25 15:54 ` [PATCH v5] " Mikulas Patocka @ 2020-05-25 16:39 ` Maciej W. Rozycki 2020-05-26 14:48 ` Mikulas Patocka 0 siblings, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-25 16:39 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, 25 May 2020, Mikulas Patocka wrote: > > > The functions __raw_read* are already extern inline, so the compiler will > > > inline/noinline them depending on the macros trivial_io_bw and > > > trivial_io_lq - so we can just call them from read*_relaxed without > > > repeating the extern inline pattern. > > > > The whole point of this peculiar arrangement for cooked accessors is to > > avoid having barriers inserted inline around out-of-line calls to raw > > accessors, > > I see, but why do we want to avoid that? Linux kernel has no binary > compatibility, so it doesn't matter if the barriers are inlined in the > drivers or not. It does matter as it expands code unnecessarily (at all call sites), as I noted in the original review. This has nothing to do with compatibility. > Anyway, I've sent a next version of the patch that makes read*_relaxed > extern inline. Thanks, I'll have a look. And now that you have this update, please run `size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the difference is between `read*_relaxed' handlers `static inline' and keyed with `*trivial_rw_*'. Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-25 16:39 ` Maciej W. Rozycki @ 2020-05-26 14:48 ` Mikulas Patocka 2020-05-27 0:23 ` Maciej W. Rozycki 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-26 14:48 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Mon, 25 May 2020, Maciej W. Rozycki wrote: > On Mon, 25 May 2020, Mikulas Patocka wrote: > > > > > The functions __raw_read* are already extern inline, so the compiler will > > > > inline/noinline them depending on the macros trivial_io_bw and > > > > trivial_io_lq - so we can just call them from read*_relaxed without > > > > repeating the extern inline pattern. > > > > > > The whole point of this peculiar arrangement for cooked accessors is to > > > avoid having barriers inserted inline around out-of-line calls to raw > > > accessors, > > > > I see, but why do we want to avoid that? Linux kernel has no binary > > compatibility, so it doesn't matter if the barriers are inlined in the > > drivers or not. > > It does matter as it expands code unnecessarily (at all call sites), as I > noted in the original review. This has nothing to do with compatibility. > > > Anyway, I've sent a next version of the patch that makes read*_relaxed > > extern inline. > > Thanks, I'll have a look. And now that you have this update, please run > `size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the > difference is between `read*_relaxed' handlers `static inline' and keyed > with `*trivial_rw_*'. 
> > Maciej The patch with static inline: text data bss dec hex filename 124207762 75953010 5426432 205587204 c410304 vmlinux The patch with extern inline: text data bss dec hex filename 124184422 75953474 5426432 205564328 c40a9a8 vmlinux (I've sent version 7 of the patch because I misnamed the "write_relaxed" function in version 6). Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5] alpha: fix memory barriers so that they conform to the specification 2020-05-26 14:48 ` Mikulas Patocka @ 2020-05-27 0:23 ` Maciej W. Rozycki 0 siblings, 0 replies; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-27 0:23 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Tue, 26 May 2020, Mikulas Patocka wrote: > > Thanks, I'll have a look. And now that you have this update, please run > > `size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the > > difference is between `read*_relaxed' handlers `static inline' and keyed > > with `*trivial_rw_*'. > > > > Maciej > > The patch with static inline: > text data bss dec hex filename > 124207762 75953010 5426432 205587204 c410304 vmlinux > > The patch with extern inline: > text data bss dec hex filename > 124184422 75953474 5426432 205564328 c40a9a8 vmlinux I'm not sure why data grew, but still the gain in size is maybe small, yet it is there. > (I've sent version 7 of the patch because I misnamed the "write_relaxed" > function in version 6). I have reviewed v7 now, thanks for your effort! Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-23 15:34 ` Mikulas Patocka 2020-05-23 15:37 ` [PATCH v5] " Mikulas Patocka @ 2020-05-23 16:44 ` Maciej W. Rozycki 2020-05-23 17:09 ` Mikulas Patocka 1 sibling, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-23 16:44 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sat, 23 May 2020, Mikulas Patocka wrote: > ... and I also broke the *_relaxed macros and didn't notice it, because > they are unused in my config. This won't compile, because mb() is a > statement, not a function. > > > > +#define readb_relaxed(addr) (mb(), __raw_readb(addr)) A statement expression would do though, e.g.: #define readb_relaxed(addr) ({ mb(); __raw_readb(addr); }) and might be preferable for code brevity to adding a zillion of inline functions. Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
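The distinction Maciej raises can be seen in a small stand-alone sketch (invented `my_*` names; `my_mb()` here models only the fact that mb() expands to a statement on Alpha). A comma expression such as `(my_mb(), my_raw_readb(addr))` needs expressions on both sides and so would not compile, whereas a GNU C statement expression accepts statements and yields the value of its last expression.

```c
#include <assert.h>

static int barrier_hits;		/* counts simulated barrier calls */
static unsigned char fake_reg = 0x12;	/* stand-in for a device register */

/* a statement, like mb() on Alpha, hence unusable in a comma expression */
#define my_mb() do { barrier_hits++; } while (0)

static unsigned char my_raw_readb(const volatile void *addr)
{
	return *(const volatile unsigned char *)addr;
}

/* GNU C statement expression: runs the barrier, evaluates to the read */
#define my_readb_relaxed(addr) ({ my_mb(); my_raw_readb(addr); })
```

Statement expressions are a GCC extension, but one the kernel relies on throughout, so using them here would be in keeping with the codebase.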
* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-23 16:44 ` [PATCH v4] " Maciej W. Rozycki @ 2020-05-23 17:09 ` Mikulas Patocka 2020-05-23 19:27 ` Maciej W. Rozycki 0 siblings, 1 reply; 45+ messages in thread From: Mikulas Patocka @ 2020-05-23 17:09 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sat, 23 May 2020, Maciej W. Rozycki wrote: > On Sat, 23 May 2020, Mikulas Patocka wrote: > > > ... and I also broke the *_relaxed macros and didn't notice it, because > > they are unused in my config. This won't compile, because mb() is a > > statement, not a function. > > > > > > +#define readb_relaxed(addr) (mb(), __raw_readb(addr)) > > A statement expression would do though, e.g.: > > #define readb_relaxed(addr) ({ mb(); __raw_readb(addr); }) > > and might be preferable for code brevity to adding a zillion of inline > functions. > > Maciej I know, but that file uses inline functions everywhere else, so I wanted to make it consistent. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-23 17:09 ` Mikulas Patocka @ 2020-05-23 19:27 ` Maciej W. Rozycki 2020-05-23 20:11 ` Mikulas Patocka 0 siblings, 1 reply; 45+ messages in thread From: Maciej W. Rozycki @ 2020-05-23 19:27 UTC (permalink / raw) To: Mikulas Patocka Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sat, 23 May 2020, Mikulas Patocka wrote: > > A statement expression would do though, e.g.: > > > > #define readb_relaxed(addr) ({ mb(); __raw_readb(addr); }) > > > > and might be preferable for code brevity to adding a zillion of inline > > functions. > > > > Maciej > > I know, but that file uses inline functions everywhere else, so I wanted > to make it consistent. Fair enough, fine with me. I still can't access my Alpha system, have you verified your latest version at run time? Maciej ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v4] alpha: fix memory barriers so that they conform to the specification 2020-05-23 19:27 ` Maciej W. Rozycki @ 2020-05-23 20:11 ` Mikulas Patocka 0 siblings, 0 replies; 45+ messages in thread From: Mikulas Patocka @ 2020-05-23 20:11 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Maciej W. Rozycki, Arnd Bergmann, Richard Henderson, Matt Turner, Greg Kroah-Hartman, alpha, linux-serial, linux-rtc On Sat, 23 May 2020, Maciej W. Rozycki wrote: > On Sat, 23 May 2020, Mikulas Patocka wrote: > > > > A statement expression would do though, e.g.: > > > > > > #define readb_relaxed(addr) ({ mb(); __raw_readb(addr); }) > > > > > > and might be preferable for code brevity to adding a zillion of inline > > > functions. > > > > > > Maciej > > > > I know, but that file uses inline functions everywhere else, so I wanted > > to make it consistent. > > Fair enough, fine with me. I still can't access my Alpha system, have > you verified your latest version at run time? > > Maciej Yes - it runs without any hang. Mikulas ^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2020-06-08 23:49 UTC | newest] Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-06 11:21 [PATCH 1/2] alpha: add a delay between RTC port write and read Mikulas Patocka 2020-05-06 14:20 ` Arnd Bergmann 2020-05-06 17:12 ` [PATCH 1/2 v2] alpha: add a delay to inb_p, inb_w and inb_l Mikulas Patocka 2020-05-07 8:06 ` [PATCH 1/2 v3] " Mikulas Patocka 2020-05-07 8:20 ` Greg Kroah-Hartman 2020-05-07 10:53 ` Mikulas Patocka 2020-05-07 13:30 ` Arnd Bergmann 2020-05-07 14:09 ` Mikulas Patocka 2020-05-07 15:08 ` Arnd Bergmann 2020-05-07 15:45 ` Mikulas Patocka 2020-05-07 15:46 ` [PATCH v4] alpha: add a barrier after outb, outw and outl Mikulas Patocka 2020-05-07 19:12 ` Arnd Bergmann 2020-05-10 1:27 ` Maciej W. Rozycki 2020-05-10 1:25 ` [PATCH 1/2 v3] alpha: add a delay to inb_p, inb_w and inb_l Maciej W. Rozycki 2020-05-10 18:50 ` Mikulas Patocka 2020-05-11 14:58 ` Maciej W. Rozycki 2020-05-12 19:35 ` Mikulas Patocka 2020-05-13 14:41 ` Ivan Kokshaysky 2020-05-13 16:13 ` Greg Kroah-Hartman 2020-05-13 17:17 ` Maciej W. Rozycki 2020-05-22 13:03 ` Mikulas Patocka 2020-05-22 13:37 ` Maciej W. Rozycki 2020-05-22 13:26 ` Mikulas Patocka 2020-05-22 20:00 ` Mikulas Patocka 2020-05-23 10:26 ` [PATCH v4] alpha: fix memory barriers so that they conform to the specification Mikulas Patocka 2020-05-23 15:10 ` Ivan Kokshaysky 2020-05-23 15:34 ` Mikulas Patocka 2020-05-23 15:37 ` [PATCH v5] " Mikulas Patocka 2020-05-24 14:54 ` Maciej W. Rozycki 2020-05-25 13:56 ` Mikulas Patocka 2020-05-25 14:07 ` Arnd Bergmann 2020-05-25 14:45 ` Maciej W. Rozycki 2020-05-25 15:53 ` [PATCH v6] " Mikulas Patocka 2020-05-26 14:47 ` [PATCH v7] " Mikulas Patocka 2020-05-27 0:18 ` Maciej W. Rozycki 2020-06-08 6:58 ` Mikulas Patocka 2020-06-08 23:49 ` Matt Turner 2020-05-25 15:54 ` [PATCH v5] " Mikulas Patocka 2020-05-25 16:39 ` Maciej W. 
Rozycki 2020-05-26 14:48 ` Mikulas Patocka 2020-05-27 0:23 ` Maciej W. Rozycki 2020-05-23 16:44 ` [PATCH v4] " Maciej W. Rozycki 2020-05-23 17:09 ` Mikulas Patocka 2020-05-23 19:27 ` Maciej W. Rozycki 2020-05-23 20:11 ` Mikulas Patocka