[1/2] asm-generic/io: Pass result on inX() accessor to __io_par()
diff mbox series

Message ID 20190211174544.4302-2-will.deacon@arm.com
State Superseded
Headers show
Series
  • Ensure inX() is ordered wrt delay() routines
Related show

Commit Message

Will Deacon Feb. 11, 2019, 5:45 p.m. UTC
The inX() I/O accessors must enforce ordering against subsequent calls
to the delay() routines, so that a read-back from a device can be used
to postpone a subsequent write to the same device.

On some architectures, including arm64, this ordering can only be
achieved by creating a dependency on the value returned by the inX()
operation, so we need to pass the value we read to the __io_par()
macro in this case.

Reported-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/io.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Arnd Bergmann Feb. 12, 2019, 11:55 a.m. UTC | #1
On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <will.deacon@arm.com> wrote:
>
> The inX() I/O accessors must enforce ordering against subsequent calls
> to the delay() routines, so that a read-back from a device can be used
> to postpone a subsequent write to the same device.
>
> On some architectures, including arm64, this ordering can only be
> achieved by creating a dependency on the value returned by the inX()
> operation, so we need to pass the value we read to the __io_par()
> macro in this case.
>
> Reported-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  include/asm-generic/io.h | 8 ++++----

For changing the asm-generic file in the arm64 tree,

Acked-by: Arnd Bergmann <arnd@arndb.de>

For all I can see, this should not conflict with the usage of the
same macros on RISC-V, though it does make add a significant
difference, so I'd like to see an Ack from the RISC-V folks as
well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
to do a corresponding change.

       Arnd
Will Deacon Feb. 13, 2019, 5:46 p.m. UTC | #2
On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
> On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <will.deacon@arm.com> wrote:
> >
> > The inX() I/O accessors must enforce ordering against subsequent calls
> > to the delay() routines, so that a read-back from a device can be used
> > to postpone a subsequent write to the same device.
> >
> > On some architectures, including arm64, this ordering can only be
> > achieved by creating a dependency on the value returned by the inX()
> > operation, so we need to pass the value we read to the __io_par()
> > macro in this case.
> >
> > Reported-by: Andrew Murray <andrew.murray@arm.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  include/asm-generic/io.h | 8 ++++----
> 
> For changing the asm-generic file in the arm64 tree,
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Thanks, Arnd.

> For all I can see, this should not conflict with the usage of the
> same macros on RISC-V, though it does make add a significant
> difference, so I'd like to see an Ack from the RISC-V folks as
> well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
> to do a corresponding change.

There's already a comment in that header which says that the accesses are
ordered wrt timer reads, so I don't think anything needs to change there.
For consistency with the macro arguments, I could augment their __io_par to
take the read value as an unused argument, if that's what you mean?

Will
Arnd Bergmann Feb. 13, 2019, 8:59 p.m. UTC | #3
On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote:

> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
>
> > For all I can see, this should not conflict with the usage of the
> > same macros on RISC-V, though it does make add a significant
> > difference, so I'd like to see an Ack from the RISC-V folks as
> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
> > to do a corresponding change.
>
> There's already a comment in that header which says that the accesses are
> ordered wrt timer reads, so I don't think anything needs to change there.
> For consistency with the macro arguments, I could augment their __io_par to
> take the read value as an unused argument, if that's what you mean?

Yes, that's what I meant, I should have been clearer there.

     Arnd
Palmer Dabbelt Feb. 13, 2019, 9:57 p.m. UTC | #4
On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote:
> On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote:
>
>> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
>>
>> > For all I can see, this should not conflict with the usage of the
>> > same macros on RISC-V, though it does make add a significant
>> > difference, so I'd like to see an Ack from the RISC-V folks as
>> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
>> > to do a corresponding change.

Thanks, the original patches didn't make it through my filters.

>> There's already a comment in that header which says that the accesses are
>> ordered wrt timer reads, so I don't think anything needs to change there.
>> For consistency with the macro arguments, I could augment their __io_par to
>> take the read value as an unused argument, if that's what you mean?

FWIW, we don't really have a way to mandate this in the ISA yet as there's no 
formal model for either CSR orderings or the IO memory space.  

> Yes, that's what I meant, I should have been clearer there.

That sounds reasonable to me.  It looks like we can also go ahead and delete a 
bunch of arch/riscv/include/asm/io.h now that this stuff is in asm-generic, 
which would cause us to actually start using these things.  I didn't know this 
had all been moved to asm-generic otherwise I would have cleaned this up 
earlier.

I think this should do it, but this does bring up a bit of an issue: the RISC-V 
versions of reads and friends put barriers outside the loop, while the 
asm-generic version don't.  What are these actually supposed to do?

Either way that resolves, feel free to consider something like

diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
index b269451e7e85..378975f180a7 100644
--- a/arch/riscv/include/asm/io.h
+++ b/arch/riscv/include/asm/io.h
@@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
  * writes.
  */
 #define __io_pbr()	__asm__ __volatile__ ("fence io,i"  : : : "memory");
-#define __io_par()	__asm__ __volatile__ ("fence i,ior" : : : "memory");
+#define __io_par(v)	__asm__ __volatile__ ("fence i,ior" : : : "memory");
 #define __io_pbw()	__asm__ __volatile__ ("fence iow,o" : : : "memory");
 #define __io_paw()	__asm__ __volatile__ ("fence o,io"  : : : "memory");

-#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
-#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
-#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
+#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
+#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
+#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })

 #define outb(v,c)	({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
 #define outw(v,c)	({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
 #define outl(v,c)	({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })

 #ifdef CONFIG_64BIT
-#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; })
+#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; })
 #define outq(v,c)	({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); })
 #endif

@@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar())
 #define readsw(addr, buffer, count) __readsw(addr, buffer, count)
 #define readsl(addr, buffer, count) __readsl(addr, buffer, count)

-__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par())
-__io_reads_ins(ins, u16, w, __io_pbr(), __io_par())
-__io_reads_ins(ins, u32, l, __io_pbr(), __io_par())
+__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par(addr))
+__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
+__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
 #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count)
 #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count)
 #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count)

as

Revewied-by: Palmer Dabbelt <palmer@sifive.com>

when included along with the other diff.  That way we can at least keep the 
macro signatures matching, the cleanup can come later...

Thanks!
Will Deacon Feb. 18, 2019, 3:56 p.m. UTC | #5
On Wed, Feb 13, 2019 at 01:57:50PM -0800, Palmer Dabbelt wrote:
> On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote:
> > On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote:
> > 
> > > On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote:
> > > 
> > > > For all I can see, this should not conflict with the usage of the
> > > > same macros on RISC-V, though it does make add a significant
> > > > difference, so I'd like to see an Ack from the RISC-V folks as
> > > > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h
> > > > to do a corresponding change.
> 
> Thanks, the original patches didn't make it through my filters.
> 
> > > There's already a comment in that header which says that the accesses are
> > > ordered wrt timer reads, so I don't think anything needs to change there.
> > > For consistency with the macro arguments, I could augment their __io_par to
> > > take the read value as an unused argument, if that's what you mean?
> 
> FWIW, we don't really have a way to mandate this in the ISA yet as there's
> no formal model for either CSR orderings or the IO memory space.

Ah, so you may end up needing the dependency trick too, depending on where
you land with the ISA.

> > Yes, that's what I meant, I should have been clearer there.
> 
> That sounds reasonable to me.  It looks like we can also go ahead and delete
> a bunch of arch/riscv/include/asm/io.h now that this stuff is in
> asm-generic, which would cause us to actually start using these things.  I
> didn't know this had all been moved to asm-generic otherwise I would have
> cleaned this up earlier.
> 
> I think this should do it, but this does bring up a bit of an issue: the
> RISC-V versions of reads and friends put barriers outside the loop, while
> the asm-generic version don't.  What are these actually supposed to do?

You're referring to the string accessors (e.g. insb() and readsw()), right?
arm and arm64 don't provide barriers here either, and I don't think they
should have to given that these routines are usually used to poll data
register-based FIFOs and therefore don't need to provide ordering guarantees
against DMA operations. However, this is woefully undocumented and I shall
try to address this in the next version of my memory-barriers.txt patch
relating to this area [1].

> Either way that resolves, feel free to consider something like
> 
> diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h
> index b269451e7e85..378975f180a7 100644
> --- a/arch/riscv/include/asm/io.h
> +++ b/arch/riscv/include/asm/io.h
> @@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
>  * writes.
>  */
> #define __io_pbr()	__asm__ __volatile__ ("fence io,i"  : : : "memory");
> -#define __io_par()	__asm__ __volatile__ ("fence i,ior" : : : "memory");
> +#define __io_par(v)	__asm__ __volatile__ ("fence i,ior" : : : "memory");
> #define __io_pbw()	__asm__ __volatile__ ("fence iow,o" : : : "memory");
> #define __io_paw()	__asm__ __volatile__ ("fence o,io"  : : : "memory");
> 
> -#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
> -#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
> -#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; })
> +#define inb(c)		({ u8  __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
> +#define inw(c)		({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
> +#define inl(c)		({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
> 
> #define outb(v,c)	({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
> #define outw(v,c)	({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
> #define outl(v,c)	({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
> 
> #ifdef CONFIG_64BIT
> -#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; })
> +#define inq(c)		({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; })
> #define outq(v,c)	({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); })
> #endif
> 
> @@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar())
> #define readsw(addr, buffer, count) __readsw(addr, buffer, count)
> #define readsl(addr, buffer, count) __readsl(addr, buffer, count)
> 
> -__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par())
> -__io_reads_ins(ins, u16, w, __io_pbr(), __io_par())
> -__io_reads_ins(ins, u32, l, __io_pbr(), __io_par())
> +__io_reads_ins(ins,  u8, b, __io_pbr(), __io_par(addr))
> +__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr))
> +__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr))
> #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count)
> #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count)
> #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count)
> 
> as
> 
> Revewied-by: Palmer Dabbelt <palmer@sifive.com>
> 
> when included along with the other diff.  That way we can at least keep the
> macro signatures matching, the cleanup can come later...

Thanks, Palmer! I'll send a v2 of this patch, updated to fix up insq() as
well as the readX() macros too, since they're likely to suffer the exact
same issues as inX() in this regard.

Will

[1] https://lkml.org/lkml/2019/2/11/1803

Patch
diff mbox series

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index d356f802945a..b5737c0d8316 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -65,7 +65,7 @@ 
 #endif
 
 #ifndef __io_par
-#define __io_par()     __io_ar()
+#define __io_par(v)     __io_ar()
 #endif
 
 
@@ -471,7 +471,7 @@  static inline u8 inb(unsigned long addr)
 
 	__io_pbr();
 	val = __raw_readb(PCI_IOBASE + addr);
-	__io_par();
+	__io_par(val);
 	return val;
 }
 #endif
@@ -484,7 +484,7 @@  static inline u16 inw(unsigned long addr)
 
 	__io_pbr();
 	val = __le16_to_cpu(__raw_readw(PCI_IOBASE + addr));
-	__io_par();
+	__io_par(val);
 	return val;
 }
 #endif
@@ -497,7 +497,7 @@  static inline u32 inl(unsigned long addr)
 
 	__io_pbr();
 	val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
-	__io_par();
+	__io_par(val);
 	return val;
 }
 #endif