linux-kernel.vger.kernel.org archive mirror
* [PATCH] LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
@ 2024-03-15  2:45 Huacai Chen
  2024-03-20 10:27 ` Xi Ruoyao
  0 siblings, 1 reply; 3+ messages in thread
From: Huacai Chen @ 2024-03-15  2:45 UTC (permalink / raw)
  To: Arnd Bergmann, Huacai Chen
  Cc: loongarch, linux-arch, Xuefeng Li, Guo Ren, Xuerui Wang,
	Jiaxun Yang, linux-kernel, loongson-kernel, Huacai Chen, stable,
	Xiaotian Wu, Miao Wang, Xing Li, Hongchen Zhang, Rui Wang

Since GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
global regs in asms"), global registers are no longer forcibly added to
the def-use chain. As a result, current_thread_info(),
current_stack_pointer and __my_cpu_offset may be hoisted out of loops
because they are no longer treated as "volatile variables".

This optimization is still correct for the current_thread_info() and
current_stack_pointer usages because they are associated with a thread.
However, it is wrong for __my_cpu_offset because it is associated with a
CPU rather than a thread: if the thread migrates to a different CPU
inside the loop, __my_cpu_offset must change accordingly.

Change the __my_cpu_offset definition so that it is treated as a
"volatile variable", in order to avoid such a mis-optimization.
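
For illustration only (not part of this patch), here is a minimal C
sketch of the failure mode and of the barrier idiom used below. The
helper names and the summing loop are hypothetical; the sketch assumes
the per-CPU base lives in a global register variable ($r21, as on
LoongArch), so it only builds for that target.

/* Hypothetical sketch, not kernel code. */
register unsigned long my_cpu_off __asm__("$r21");

/*
 * Hoistable variant: after the cited GCC change, a plain read of the
 * global register is no longer treated as "volatile", so the compiler
 * may read $r21 once and reuse the value across loop iterations.
 */
#define my_cpu_off_hoistable	(my_cpu_off)

/*
 * Barrier variant (the idiom used below): the empty asm with a "+r"
 * constraint claims to read and write the register, so every expansion
 * re-reads the current value and it cannot be cached across a possible
 * CPU migration point.
 */
#define my_cpu_off_fresh				\
({							\
	__asm__ __volatile__("" : "+r"(my_cpu_off));	\
	my_cpu_off;					\
})

unsigned long sum_percpu_counter(unsigned long base, int iters)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < iters; i++) {
		/* Must use the offset of the CPU we are running on now. */
		unsigned long *counter = (unsigned long *)(base + my_cpu_off_fresh);
		sum += *counter;
	}
	return sum;
}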

Cc: stable@vger.kernel.org
Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reported-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 arch/loongarch/include/asm/percpu.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
index 9b36ac003f89..03b98491d301 100644
--- a/arch/loongarch/include/asm/percpu.h
+++ b/arch/loongarch/include/asm/percpu.h
@@ -29,7 +29,12 @@ static inline void set_my_cpu_offset(unsigned long off)
 	__my_cpu_offset = off;
 	csr_write64(off, PERCPU_BASE_KS);
 }
-#define __my_cpu_offset __my_cpu_offset
+
+#define __my_cpu_offset					\
+({							\
+	__asm__ __volatile__("":"+r"(__my_cpu_offset));	\
+	__my_cpu_offset;				\
+})
 
 #define PERCPU_OP(op, asm_op, c_op)					\
 static __always_inline unsigned long __percpu_##op(void *ptr,		\
-- 
2.43.0



* Re: [PATCH] LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
  2024-03-15  2:45 [PATCH] LoongArch: Change __my_cpu_offset definition to avoid mis-optimization Huacai Chen
@ 2024-03-20 10:27 ` Xi Ruoyao
  2024-03-20 13:49   ` Huacai Chen
  0 siblings, 1 reply; 3+ messages in thread
From: Xi Ruoyao @ 2024-03-20 10:27 UTC (permalink / raw)
  To: Huacai Chen, Arnd Bergmann, Huacai Chen
  Cc: loongarch, linux-arch, Xuefeng Li, Guo Ren, Xuerui Wang,
	Jiaxun Yang, linux-kernel, loongson-kernel, stable, Xiaotian Wu,
	Miao Wang, Xing Li, Hongchen Zhang, Rui Wang

On Fri, 2024-03-15 at 10:45 +0800, Huacai Chen wrote:
> Since GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
> global regs in asms"), global registers are no longer forcibly added to
> the def-use chain. As a result, current_thread_info(),
> current_stack_pointer and __my_cpu_offset may be hoisted out of loops
> because they are no longer treated as "volatile variables".

Ooops...  I'm wondering why this issue has not blown up our systems
before.  The referenced GCC commit long predates the tape-out of
LoongArch CPUs.

> This optimization is still correct for the current_thread_info() and
> current_stack_pointer usages because they are associated with a thread.
> However, it is wrong for __my_cpu_offset because it is associated with a
> CPU rather than a thread: if the thread migrates to a different CPU
> inside the loop, __my_cpu_offset must change accordingly.
> 
> Change the __my_cpu_offset definition so that it is treated as a
> "volatile variable", in order to avoid such a mis-optimization.
> 
> Cc: stable@vger.kernel.org

I suppose we should add Fixes: 5b0b14e550a0 ("LoongArch: Add
atomic/locking header") here.

> Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
> Reported-by: Miao Wang <shankerwangmiao@gmail.com>
> Signed-off-by: Xing Li <lixing@loongson.cn>
> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> Signed-off-by: Rui Wang <wangrui@loongson.cn>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  arch/loongarch/include/asm/percpu.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
> index 9b36ac003f89..03b98491d301 100644
> --- a/arch/loongarch/include/asm/percpu.h
> +++ b/arch/loongarch/include/asm/percpu.h
> @@ -29,7 +29,12 @@ static inline void set_my_cpu_offset(unsigned long off)
>  	__my_cpu_offset = off;
>  	csr_write64(off, PERCPU_BASE_KS);
>  }
> -#define __my_cpu_offset __my_cpu_offset
> +
> +#define __my_cpu_offset					\
> +({							\
> +	__asm__ __volatile__("":"+r"(__my_cpu_offset));	\
> +	__my_cpu_offset;				\
> +})
>  
>  #define PERCPU_OP(op, asm_op, c_op)					\
>  static __always_inline unsigned long __percpu_##op(void *ptr,		\

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH] LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
  2024-03-20 10:27 ` Xi Ruoyao
@ 2024-03-20 13:49   ` Huacai Chen
  0 siblings, 0 replies; 3+ messages in thread
From: Huacai Chen @ 2024-03-20 13:49 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Huacai Chen, Arnd Bergmann, loongarch, linux-arch, Xuefeng Li,
	Guo Ren, Xuerui Wang, Jiaxun Yang, linux-kernel, loongson-kernel,
	stable, Xiaotian Wu, Miao Wang, Xing Li, Hongchen Zhang,
	Rui Wang

Hi, Ruoyao,

On Wed, Mar 20, 2024 at 6:27 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Fri, 2024-03-15 at 10:45 +0800, Huacai Chen wrote:
> > Since GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
> > global regs in asms"), global registers are no longer forcibly added to
> > the def-use chain. As a result, current_thread_info(),
> > current_stack_pointer and __my_cpu_offset may be hoisted out of loops
> > because they are no longer treated as "volatile variables".
>
> Ooops...  I'm wondering why this issue has not blown up our systems
> before.  The referenced GCC commit long predates the tape-out of
> LoongArch CPUs.
>
> > This optimization is still correct for the current_thread_info() and
> > current_stack_pointer usages because they are associated with a thread.
> > However, it is wrong for __my_cpu_offset because it is associated with a
> > CPU rather than a thread: if the thread migrates to a different CPU
> > inside the loop, __my_cpu_offset must change accordingly.
> >
> > Change the __my_cpu_offset definition so that it is treated as a
> > "volatile variable", in order to avoid such a mis-optimization.
> >
> > Cc: stable@vger.kernel.org
>
> I suppose we should add Fixes: 5b0b14e550a0 ("LoongArch: Add
> atomic/locking header") here.
You are right, but since this patch is already in loongarch-next for the
pull request, I don't want to change it at this point.

Huacai

>
> > Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
> > Reported-by: Miao Wang <shankerwangmiao@gmail.com>
> > Signed-off-by: Xing Li <lixing@loongson.cn>
> > Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> > Signed-off-by: Rui Wang <wangrui@loongson.cn>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  arch/loongarch/include/asm/percpu.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
> > index 9b36ac003f89..03b98491d301 100644
> > --- a/arch/loongarch/include/asm/percpu.h
> > +++ b/arch/loongarch/include/asm/percpu.h
> > @@ -29,7 +29,12 @@ static inline void set_my_cpu_offset(unsigned long off)
> >       __my_cpu_offset = off;
> >       csr_write64(off, PERCPU_BASE_KS);
> >  }
> > -#define __my_cpu_offset __my_cpu_offset
> > +
> > +#define __my_cpu_offset                                      \
> > +({                                                   \
> > +     __asm__ __volatile__("":"+r"(__my_cpu_offset)); \
> > +     __my_cpu_offset;                                \
> > +})
> >
> >  #define PERCPU_OP(op, asm_op, c_op)                                  \
> >  static __always_inline unsigned long __percpu_##op(void *ptr,                \
>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University

