linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging
@ 2021-10-15  9:05 Xiongfeng Wang
  2021-10-15 17:59 ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: Xiongfeng Wang @ 2021-10-15  9:05 UTC (permalink / raw)
  To: mark.rutland, catalin.marinas, will
  Cc: linux-arm-kernel, linux-kernel, moyufeng, wangxiongfeng2

DGH (Data Gathering Hint) prohibits merging memory accesses with
Normal-NC or Device-GRE attributes before the hint instruction with any
memory accesses appearing after the hint instruction. It can be used for
performance benefit to prevent the prior access waiting too long just to
be merged. This patch provide a macro to expose it to the arch code.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>

---
v1->v2:
    extract this single patch from the origin patchset.
    remove the macro in the assemble code.
    add comments for macro dgh().
    modify the commit message a little bit.
---
 arch/arm64/include/asm/barrier.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 451e11e5fd23..d71a7457d619 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -18,6 +18,14 @@
 #define wfe()		asm volatile("wfe" : : : "memory")
 #define wfi()		asm volatile("wfi" : : : "memory")
 
+/*
+ * Data Gathering Hint:
+ * This instruction prohibits merging memory accesses with Normal-NC or
+ * Device-GRE attributes before the hint instruction with any memory accesses
+ * appearing after the hint instruction.
+ */
+#define dgh()		asm volatile("hint #6" : : : "memory")
+
 #define isb()		asm volatile("isb" : : : "memory")
 #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
 #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging
  2021-10-15  9:05 [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging Xiongfeng Wang
@ 2021-10-15 17:59 ` Catalin Marinas
  2021-10-22  1:51   ` Xiongfeng Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2021-10-15 17:59 UTC (permalink / raw)
  To: Xiongfeng Wang
  Cc: mark.rutland, will, linux-arm-kernel, linux-kernel, moyufeng

On Fri, Oct 15, 2021 at 05:05:11PM +0800, Xiongfeng Wang wrote:
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 451e11e5fd23..d71a7457d619 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -18,6 +18,14 @@
>  #define wfe()		asm volatile("wfe" : : : "memory")
>  #define wfi()		asm volatile("wfi" : : : "memory")
>  
> +/*
> + * Data Gathering Hint:
> + * This instruction prohibits merging memory accesses with Normal-NC or
> + * Device-GRE attributes before the hint instruction with any memory accesses
> + * appearing after the hint instruction.
> + */
> +#define dgh()		asm volatile("hint #6" : : : "memory")

On its own, this patch doesn't do anything. It's more interesting to see
how it will be used and maybe come up with a common name that other
architectures would share (or just implement as no-opp). I'm not sure
there was any conclusion last time we discussed this.

-- 
Catalin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging
  2021-10-15 17:59 ` Catalin Marinas
@ 2021-10-22  1:51   ` Xiongfeng Wang
  2021-12-03 12:47     ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: Xiongfeng Wang @ 2021-10-22  1:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: mark.rutland, will, linux-arm-kernel, linux-kernel, moyufeng

Hi Catalin,

On 2021/10/16 1:59, Catalin Marinas wrote:
> On Fri, Oct 15, 2021 at 05:05:11PM +0800, Xiongfeng Wang wrote:
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 451e11e5fd23..d71a7457d619 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -18,6 +18,14 @@
>>  #define wfe()		asm volatile("wfe" : : : "memory")
>>  #define wfi()		asm volatile("wfi" : : : "memory")
>>  
>> +/*
>> + * Data Gathering Hint:
>> + * This instruction prohibits merging memory accesses with Normal-NC or
>> + * Device-GRE attributes before the hint instruction with any memory accesses
>> + * appearing after the hint instruction.
>> + */
>> +#define dgh()		asm volatile("hint #6" : : : "memory")
> 
> On its own, this patch doesn't do anything. It's more interesting to see
> how it will be used and maybe come up with a common name that other
> architectures would share (or just implement as no-opp). I'm not sure
> there was any conclusion last time we discussed this.

In the last mail, I was suggested to investigate the code in other architecture
to find if there exists similar interface. I searched 'merg' in the code and
didn't find similar interface.

The only thing similar I found is in Intel Software Developer's Manual. It says
"Write Combining (WC) ... Writes may be delayed and combined in the write
combining buffer (WC buffer) to reduce memory accesses. If the WC buffer is
partially filled, the writes may be delayed until the next occurrence of a
serializing event; such as an SFENCE or MFENCE instruction, CPUID or other
serializing instruction, a read or write to uncached memory, an interrupt
occurrence, or an execution of a LOCK instruction (including one with an
XACQUIRE or XRELEASE prefix)."
Maybe this is more like the write combine buffer flushing, not prevent merging.
Sorry I still didn't understand the difference clearly.

How about a common name called 'merge_prohibit_hint()'? Could you give me some
suggestions ?

Thanks,
Xiongfeng

> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging
  2021-10-22  1:51   ` Xiongfeng Wang
@ 2021-12-03 12:47     ` Catalin Marinas
  2021-12-14 10:46       ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2021-12-03 12:47 UTC (permalink / raw)
  To: Xiongfeng Wang
  Cc: mark.rutland, will, linux-arm-kernel, linux-kernel, moyufeng

Hi Xiongfeng,

On Fri, Oct 22, 2021 at 09:51:40AM +0800, Xiongfeng Wang wrote:
> On 2021/10/16 1:59, Catalin Marinas wrote:
> > On Fri, Oct 15, 2021 at 05:05:11PM +0800, Xiongfeng Wang wrote:
> >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> >> index 451e11e5fd23..d71a7457d619 100644
> >> --- a/arch/arm64/include/asm/barrier.h
> >> +++ b/arch/arm64/include/asm/barrier.h
> >> @@ -18,6 +18,14 @@
> >>  #define wfe()		asm volatile("wfe" : : : "memory")
> >>  #define wfi()		asm volatile("wfi" : : : "memory")
> >>  
> >> +/*
> >> + * Data Gathering Hint:
> >> + * This instruction prohibits merging memory accesses with Normal-NC or
> >> + * Device-GRE attributes before the hint instruction with any memory accesses
> >> + * appearing after the hint instruction.
> >> + */
> >> +#define dgh()		asm volatile("hint #6" : : : "memory")
> > 
> > On its own, this patch doesn't do anything. It's more interesting to see
> > how it will be used and maybe come up with a common name that other
> > architectures would share (or just implement as no-opp). I'm not sure
> > there was any conclusion last time we discussed this.
> 
> In the last mail, I was suggested to investigate the code in other architecture
> to find if there exists similar interface. I searched 'merg' in the code and
> didn't find similar interface.

Maybe no other architecture has such hint. They have write buffer
draining but that's more expensive.

> The only thing similar I found is in Intel Software Developer's Manual. It says
> "Write Combining (WC) ... Writes may be delayed and combined in the write
> combining buffer (WC buffer) to reduce memory accesses. If the WC buffer is
> partially filled, the writes may be delayed until the next occurrence of a
> serializing event; such as an SFENCE or MFENCE instruction, CPUID or other
> serializing instruction, a read or write to uncached memory, an interrupt
> occurrence, or an execution of a LOCK instruction (including one with an
> XACQUIRE or XRELEASE prefix)."
> Maybe this is more like the write combine buffer flushing, not prevent merging.
> Sorry I still didn't understand the difference clearly.

IIUC those drivers on x86 just rely on the microarchitecture aspects of
the draining (e.g. fill 64 bytes). On Arm we don't have such guarantee
as there's a wide variation in implementations, hence the DGH
instruction.

> How about a common name called 'merge_prohibit_hint()'? Could you give me some
> suggestions ?

I think "prohibit" looks more like not allowing any write-buffer merge.
Maybe stop_merge_hint(), stop_write_buffer_merge(),
stop_write_combining() (any other ideas?). It would be a NOP on all
other architectures.

-- 
Catalin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging
  2021-12-03 12:47     ` Catalin Marinas
@ 2021-12-14 10:46       ` Will Deacon
  2021-12-14 11:20         ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2021-12-14 10:46 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Xiongfeng Wang, mark.rutland, linux-arm-kernel, linux-kernel, moyufeng

On Fri, Dec 03, 2021 at 12:47:47PM +0000, Catalin Marinas wrote:
> On Fri, Oct 22, 2021 at 09:51:40AM +0800, Xiongfeng Wang wrote:
> > On 2021/10/16 1:59, Catalin Marinas wrote:
> > > On Fri, Oct 15, 2021 at 05:05:11PM +0800, Xiongfeng Wang wrote:
> > >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> > >> index 451e11e5fd23..d71a7457d619 100644
> > >> --- a/arch/arm64/include/asm/barrier.h
> > >> +++ b/arch/arm64/include/asm/barrier.h
> > >> @@ -18,6 +18,14 @@
> > >>  #define wfe()		asm volatile("wfe" : : : "memory")
> > >>  #define wfi()		asm volatile("wfi" : : : "memory")
> > >>  
> > >> +/*
> > >> + * Data Gathering Hint:
> > >> + * This instruction prohibits merging memory accesses with Normal-NC or
> > >> + * Device-GRE attributes before the hint instruction with any memory accesses
> > >> + * appearing after the hint instruction.
> > >> + */
> > >> +#define dgh()		asm volatile("hint #6" : : : "memory")
> > > 
> > > On its own, this patch doesn't do anything. It's more interesting to see
> > > how it will be used and maybe come up with a common name that other
> > > architectures would share (or just implement as no-opp). I'm not sure
> > > there was any conclusion last time we discussed this.
> > 
> > In the last mail, I was suggested to investigate the code in other architecture
> > to find if there exists similar interface. I searched 'merg' in the code and
> > didn't find similar interface.
> 
> Maybe no other architecture has such hint. They have write buffer
> draining but that's more expensive.
> 
> > The only thing similar I found is in Intel Software Developer's Manual. It says
> > "Write Combining (WC) ... Writes may be delayed and combined in the write
> > combining buffer (WC buffer) to reduce memory accesses. If the WC buffer is
> > partially filled, the writes may be delayed until the next occurrence of a
> > serializing event; such as an SFENCE or MFENCE instruction, CPUID or other
> > serializing instruction, a read or write to uncached memory, an interrupt
> > occurrence, or an execution of a LOCK instruction (including one with an
> > XACQUIRE or XRELEASE prefix)."
> > Maybe this is more like the write combine buffer flushing, not prevent merging.
> > Sorry I still didn't understand the difference clearly.
> 
> IIUC those drivers on x86 just rely on the microarchitecture aspects of
> the draining (e.g. fill 64 bytes). On Arm we don't have such guarantee
> as there's a wide variation in implementations, hence the DGH
> instruction.
> 
> > How about a common name called 'merge_prohibit_hint()'? Could you give me some
> > suggestions ?
> 
> I think "prohibit" looks more like not allowing any write-buffer merge.
> Maybe stop_merge_hint(), stop_write_buffer_merge(),
> stop_write_combining() (any other ideas?). It would be a NOP on all
> other architectures.

Given that DGH only affects prior memory accesses to Normal-NC or Device-GRE
memory, we probably want to avoid making this too broad. We don't use GRE in
Linux and I think Normal-NC is only exposed as an explicit part of the I/O
accessors in ioremap_wc(). So I'd be inclined to add something like:

	io_stop_wc()

with corresponding documentation in Documentation/memory-barriers.txt.

Will

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging
  2021-12-14 10:46       ` Will Deacon
@ 2021-12-14 11:20         ` Catalin Marinas
  0 siblings, 0 replies; 6+ messages in thread
From: Catalin Marinas @ 2021-12-14 11:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: Xiongfeng Wang, mark.rutland, linux-arm-kernel, linux-kernel, moyufeng

On Tue, Dec 14, 2021 at 10:46:20AM +0000, Will Deacon wrote:
> On Fri, Dec 03, 2021 at 12:47:47PM +0000, Catalin Marinas wrote:
> > On Fri, Oct 22, 2021 at 09:51:40AM +0800, Xiongfeng Wang wrote:
> > > On 2021/10/16 1:59, Catalin Marinas wrote:
> > > > On Fri, Oct 15, 2021 at 05:05:11PM +0800, Xiongfeng Wang wrote:
> > > >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> > > >> index 451e11e5fd23..d71a7457d619 100644
> > > >> --- a/arch/arm64/include/asm/barrier.h
> > > >> +++ b/arch/arm64/include/asm/barrier.h
> > > >> @@ -18,6 +18,14 @@
> > > >>  #define wfe()		asm volatile("wfe" : : : "memory")
> > > >>  #define wfi()		asm volatile("wfi" : : : "memory")
> > > >>  
> > > >> +/*
> > > >> + * Data Gathering Hint:
> > > >> + * This instruction prohibits merging memory accesses with Normal-NC or
> > > >> + * Device-GRE attributes before the hint instruction with any memory accesses
> > > >> + * appearing after the hint instruction.
> > > >> + */
> > > >> +#define dgh()		asm volatile("hint #6" : : : "memory")
> > > > 
> > > > On its own, this patch doesn't do anything. It's more interesting to see
> > > > how it will be used and maybe come up with a common name that other
> > > > architectures would share (or just implement as no-opp). I'm not sure
> > > > there was any conclusion last time we discussed this.
> > > 
> > > In the last mail, I was suggested to investigate the code in other architecture
> > > to find if there exists similar interface. I searched 'merg' in the code and
> > > didn't find similar interface.
> > 
> > Maybe no other architecture has such hint. They have write buffer
> > draining but that's more expensive.
> > 
> > > The only thing similar I found is in Intel Software Developer's Manual. It says
> > > "Write Combining (WC) ... Writes may be delayed and combined in the write
> > > combining buffer (WC buffer) to reduce memory accesses. If the WC buffer is
> > > partially filled, the writes may be delayed until the next occurrence of a
> > > serializing event; such as an SFENCE or MFENCE instruction, CPUID or other
> > > serializing instruction, a read or write to uncached memory, an interrupt
> > > occurrence, or an execution of a LOCK instruction (including one with an
> > > XACQUIRE or XRELEASE prefix)."
> > > Maybe this is more like the write combine buffer flushing, not prevent merging.
> > > Sorry I still didn't understand the difference clearly.
> > 
> > IIUC those drivers on x86 just rely on the microarchitecture aspects of
> > the draining (e.g. fill 64 bytes). On Arm we don't have such guarantee
> > as there's a wide variation in implementations, hence the DGH
> > instruction.
> > 
> > > How about a common name called 'merge_prohibit_hint()'? Could you give me some
> > > suggestions ?
> > 
> > I think "prohibit" looks more like not allowing any write-buffer merge.
> > Maybe stop_merge_hint(), stop_write_buffer_merge(),
> > stop_write_combining() (any other ideas?). It would be a NOP on all
> > other architectures.
> 
> Given that DGH only affects prior memory accesses to Normal-NC or Device-GRE
> memory, we probably want to avoid making this too broad. We don't use GRE in
> Linux and I think Normal-NC is only exposed as an explicit part of the I/O
> accessors in ioremap_wc(). So I'd be inclined to add something like:
> 
> 	io_stop_wc()
> 
> with corresponding documentation in Documentation/memory-barriers.txt.

Looks fine to me.

-- 
Catalin

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-12-14 11:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-15  9:05 [RFC PATCH v2] arm64: barrier: add macro dgh() to control memory accesses merging Xiongfeng Wang
2021-10-15 17:59 ` Catalin Marinas
2021-10-22  1:51   ` Xiongfeng Wang
2021-12-03 12:47     ` Catalin Marinas
2021-12-14 10:46       ` Will Deacon
2021-12-14 11:20         ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).