All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
@ 2022-07-04 15:36 Qi Hu
  2022-07-05  8:46 ` WANG Xuerui
  0 siblings, 1 reply; 16+ messages in thread
From: Qi Hu @ 2022-07-04 15:36 UTC (permalink / raw)
  To: Huacai Chen, WANG Xuerui, Jiaxun Yang; +Cc: loongarch, linux-kernel

The `vcsr` is not used anymore. Remove this member from `loongarch_fpu`.

From 3A5000(LoongArch), `vcsr` is removed in hardware. FP and LSX/LASX
both use `fcsr` as their csr.

Particularly, fcsr from $r16 to $r31 are reserved for LSX/LASX, and
using the registers in this area will cause SXD/ASXD if LSX/LASX is
not enabled.

Signed-off-by: Qi Hu <huqi@loongson.cn>
---
V2:
- Add more details in the commit message.
---
 arch/loongarch/include/asm/fpregdef.h  |  1 -
 arch/loongarch/include/asm/processor.h |  2 --
 arch/loongarch/kernel/asm-offsets.c    |  1 -
 arch/loongarch/kernel/fpu.S            | 10 ----------
 4 files changed, 14 deletions(-)

diff --git a/arch/loongarch/include/asm/fpregdef.h b/arch/loongarch/include/asm/fpregdef.h
index adb16e4b4..b6be52783 100644
--- a/arch/loongarch/include/asm/fpregdef.h
+++ b/arch/loongarch/include/asm/fpregdef.h
@@ -48,6 +48,5 @@
 #define fcsr1	$r1
 #define fcsr2	$r2
 #define fcsr3	$r3
-#define vcsr16	$r16
 
 #endif /* _ASM_FPREGDEF_H */
diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
index 1d63c934b..57ec45aa0 100644
--- a/arch/loongarch/include/asm/processor.h
+++ b/arch/loongarch/include/asm/processor.h
@@ -80,7 +80,6 @@ BUILD_FPR_ACCESS(64)
 
 struct loongarch_fpu {
 	unsigned int	fcsr;
-	unsigned int	vcsr;
 	uint64_t	fcc;	/* 8x8 */
 	union fpureg	fpr[NUM_FPU_REGS];
 };
@@ -161,7 +160,6 @@ struct thread_struct {
 	 */							\
 	.fpu			= {				\
 		.fcsr		= 0,				\
-		.vcsr		= 0,				\
 		.fcc		= 0,				\
 		.fpr		= {{{0,},},},			\
 	},							\
diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
index bfb65eb28..20cd9e16a 100644
--- a/arch/loongarch/kernel/asm-offsets.c
+++ b/arch/loongarch/kernel/asm-offsets.c
@@ -166,7 +166,6 @@ void output_thread_fpu_defines(void)
 
 	OFFSET(THREAD_FCSR, loongarch_fpu, fcsr);
 	OFFSET(THREAD_FCC,  loongarch_fpu, fcc);
-	OFFSET(THREAD_VCSR, loongarch_fpu, vcsr);
 	BLANK();
 }
 
diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S
index 75c6ce068..a631a7137 100644
--- a/arch/loongarch/kernel/fpu.S
+++ b/arch/loongarch/kernel/fpu.S
@@ -146,16 +146,6 @@
 	movgr2fcsr	fcsr0, \tmp0
 	.endm
 
-	.macro sc_save_vcsr base, tmp0
-	movfcsr2gr	\tmp0, vcsr16
-	EX	st.w \tmp0, \base, 0
-	.endm
-
-	.macro sc_restore_vcsr base, tmp0
-	EX	ld.w \tmp0, \base, 0
-	movgr2fcsr	vcsr16, \tmp0
-	.endm
-
 /*
  * Save a thread's fp context.
  */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-04 15:36 [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu Qi Hu
@ 2022-07-05  8:46 ` WANG Xuerui
  2022-07-05  9:49   ` Xi Ruoyao
  0 siblings, 1 reply; 16+ messages in thread
From: WANG Xuerui @ 2022-07-05  8:46 UTC (permalink / raw)
  To: Qi Hu, Huacai Chen, WANG Xuerui, Jiaxun Yang; +Cc: loongarch, linux-kernel

Hi,

I just noticed that the subject line is in Chinglish too. It should 
probably be just "LoongArch: remove mentions of vcsr" and without the 
trailing period.

On 2022/7/4 23:36, Qi Hu wrote:
> The `vcsr` is not used anymore. Remove this member from `loongarch_fpu`.
>
>  From 3A5000(LoongArch), `vcsr` is removed in hardware. FP and LSX/LASX
> both use `fcsr` as their csr.
>
> Particularly, fcsr from $r16 to $r31 are reserved for LSX/LASX, and
> using the registers in this area will cause SXD/ASXD if LSX/LASX is
> not enabled.

Actually I'm still not very satisfied with the explanation; the code 
must be written with *something* in mind, since GS464V/LA464 is the only 
LoongArch implementation so far, it must have a VCSR to begin with. And 
you can't magically melt the VCSR on the tens of thousands of 
3A5000/3C5000's already shipped, because the old-world kernel obviously 
comes with LSX/LASX and it most likely utilizes the VCSR. In addition, 
you didn't mention what will happen if LSX/LASX *is* enabled on this 
new-world kernel, *and* fcsr16 is being accessed.

I think maybe you just want to remove the mentions of VCSR since they 
are dead code right now, as I don't believe it's gone in shipped 
hardware, as explained above. Except if there's magically a way to 
implement LSX/LASX without touching the said-to-have-disappeared VCSR, 
which I don't know of, and can't know because the LSX/LASX ISA manual is 
still not publicly accessible; so I don't feel comfortable approving 
this patch.

(BTW, the $rXX-style reference to the FCSRs is obviously an 
implementation wart of the toolchain, the FCSR is nothing like GPR so it 
obviously shouldn't be referred to as one. Binutils patch will be out 
shortly.)

>
> Signed-off-by: Qi Hu <huqi@loongson.cn>
> ---
> V2:
> - Add more details in the commit message.
> ---
>   arch/loongarch/include/asm/fpregdef.h  |  1 -
>   arch/loongarch/include/asm/processor.h |  2 --
>   arch/loongarch/kernel/asm-offsets.c    |  1 -
>   arch/loongarch/kernel/fpu.S            | 10 ----------
>   4 files changed, 14 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/fpregdef.h b/arch/loongarch/include/asm/fpregdef.h
> index adb16e4b4..b6be52783 100644
> --- a/arch/loongarch/include/asm/fpregdef.h
> +++ b/arch/loongarch/include/asm/fpregdef.h
> @@ -48,6 +48,5 @@
>   #define fcsr1	$r1
>   #define fcsr2	$r2
>   #define fcsr3	$r3
> -#define vcsr16	$r16
>   
>   #endif /* _ASM_FPREGDEF_H */
> diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
> index 1d63c934b..57ec45aa0 100644
> --- a/arch/loongarch/include/asm/processor.h
> +++ b/arch/loongarch/include/asm/processor.h
> @@ -80,7 +80,6 @@ BUILD_FPR_ACCESS(64)
>   
>   struct loongarch_fpu {
>   	unsigned int	fcsr;
> -	unsigned int	vcsr;
>   	uint64_t	fcc;	/* 8x8 */
>   	union fpureg	fpr[NUM_FPU_REGS];
>   };
> @@ -161,7 +160,6 @@ struct thread_struct {
>   	 */							\
>   	.fpu			= {				\
>   		.fcsr		= 0,				\
> -		.vcsr		= 0,				\
>   		.fcc		= 0,				\
>   		.fpr		= {{{0,},},},			\
>   	},							\
> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
> index bfb65eb28..20cd9e16a 100644
> --- a/arch/loongarch/kernel/asm-offsets.c
> +++ b/arch/loongarch/kernel/asm-offsets.c
> @@ -166,7 +166,6 @@ void output_thread_fpu_defines(void)
>   
>   	OFFSET(THREAD_FCSR, loongarch_fpu, fcsr);
>   	OFFSET(THREAD_FCC,  loongarch_fpu, fcc);
> -	OFFSET(THREAD_VCSR, loongarch_fpu, vcsr);
>   	BLANK();
>   }
>   
> diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S
> index 75c6ce068..a631a7137 100644
> --- a/arch/loongarch/kernel/fpu.S
> +++ b/arch/loongarch/kernel/fpu.S
> @@ -146,16 +146,6 @@
>   	movgr2fcsr	fcsr0, \tmp0
>   	.endm
>   
> -	.macro sc_save_vcsr base, tmp0
> -	movfcsr2gr	\tmp0, vcsr16
> -	EX	st.w \tmp0, \base, 0
> -	.endm
> -
> -	.macro sc_restore_vcsr base, tmp0
> -	EX	ld.w \tmp0, \base, 0
> -	movgr2fcsr	vcsr16, \tmp0
> -	.endm
> -
>   /*
>    * Save a thread's fp context.
>    */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-05  8:46 ` WANG Xuerui
@ 2022-07-05  9:49   ` Xi Ruoyao
  2022-07-05 10:00     ` Xi Ruoyao
  0 siblings, 1 reply; 16+ messages in thread
From: Xi Ruoyao @ 2022-07-05  9:49 UTC (permalink / raw)
  To: WANG Xuerui, Qi Hu, Huacai Chen, Jiaxun Yang; +Cc: loongarch, linux-kernel

On Tue, 2022-07-05 at 16:46 +0800, WANG Xuerui wrote:

> Actually I'm still not very satisfied with the explanation; the code 
> must be written with *something* in mind, since GS464V/LA464 is the only 
> LoongArch implementation so far, it must have a VCSR to begin with. And 
> you can't magically melt the VCSR on the tens of thousands of 
> 3A5000/3C5000's already shipped, because the old-world kernel obviously 
> comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
> you didn't mention what will happen if LSX/LASX *is* enabled on this 
> new-world kernel, *and* fcsr16 is being accessed.
> 
> I think maybe you just want to remove the mentions of VCSR since they 
> are dead code right now, as I don't believe it's gone in shipped 
> hardware, as explained above. Except if there's magically a way to 
> implement LSX/LASX without touching the said-to-have-disappeared VCSR,
> which I don't know of, and can't know because the LSX/LASX ISA manual is 
> still not publicly accessible; so I don't feel comfortable approving 
> this patch.

Let me make some bold guess here.  In the MIPS-compatible 3A4000 we had
"MSACSR" register.  According to the MSA manual, only the 24-th bit of
this register is used:

"Some MSA floating point instructions might not handle subnormal input
operands or compute tiny non-zero results. Such instructions may signal
the Unimplemented Operation Exception and let the software emulation
finalize the operation. If software emulation is not needed or desired,
MSACSR FS bit could be set to replace every tiny non-zero result and
subnormal input operand with zero of the same sign."

And, it says:

"Should the alternate exception handling attributes of the IEEE Standard
for Floating-Point Arithmetic 754-2008, Section 8 be desired, the MSACSR
FS bit should be zero, the Underflow Exception be enabled and a trap
handler be provided to carry out the execution of the alternate
exception handling attributes."

We can see Loongson has been trying to make 3A processors more IEEE-754
compatible.  For example, 3A4000 is the only non-R6 MIPS-compatible
processor using IEEE-754-2008 NaN encoding.  And LoongArch manual
directly refers to IEEE-754-2008 manual in many places.  So I guess this
change means Loongson won't set this bit to 1 for 3A5000 at all, and any
attempt by the user to set this bit could be considered undefined
behavior (causing inconsistent behavior on 3A5000 as the kernel doesn't
save/restore VCSR at context switch, and SIGILL on 3A6000 as the
register is removed).
-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-05  9:49   ` Xi Ruoyao
@ 2022-07-05 10:00     ` Xi Ruoyao
  2022-07-06  2:35       ` Huacai Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Xi Ruoyao @ 2022-07-05 10:00 UTC (permalink / raw)
  To: WANG Xuerui, Qi Hu, Huacai Chen, Jiaxun Yang; +Cc: loongarch, linux-kernel

On Tue, 2022-07-05 at 17:49 +0800, Xi Ruoyao wrote:
> On Tue, 2022-07-05 at 16:46 +0800, WANG Xuerui wrote:
> 
> > Actually I'm still not very satisfied with the explanation; the code
> > must be written with *something* in mind, since GS464V/LA464 is the only 
> > LoongArch implementation so far, it must have a VCSR to begin with. And 
> > you can't magically melt the VCSR on the tens of thousands of 
> > 3A5000/3C5000's already shipped, because the old-world kernel obviously 
> > comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
> > you didn't mention what will happen if LSX/LASX *is* enabled on this
> > new-world kernel, *and* fcsr16 is being accessed.
> > 
> > I think maybe you just want to remove the mentions of VCSR since they 
> > are dead code right now, as I don't believe it's gone in shipped 
> > hardware, as explained above. Except if there's magically a way to 
> > implement LSX/LASX without touching the said-to-have-disappeared VCSR,
> > which I don't know of, and can't know because the LSX/LASX ISA manual is 
> > still not publicly accessible; so I don't feel comfortable approving
> > this patch.
> 
> Let me make some bold guess here.  In the MIPS-compatible 3A4000 we had
> "MSACSR" register.  According to the MSA manual, only the 24-th bit of
> this register is used:
> 
> "Some MSA floating point instructions might not handle subnormal input
> operands or compute tiny non-zero results. Such instructions may signal
> the Unimplemented Operation Exception and let the software emulation
> finalize the operation. If software emulation is not needed or desired,
> MSACSR FS bit could be set to replace every tiny non-zero result and
> subnormal input operand with zero of the same sign."
> 
> And, it says:
> 
> "Should the alternate exception handling attributes of the IEEE Standard
> for Floating-Point Arithmetic 754-2008, Section 8 be desired, the MSACSR
> FS bit should be zero, the Underflow Exception be enabled and a trap
> handler be provided to carry out the execution of the alternate
> exception handling attributes."
> 
> We can see Loongson has been trying to make 3A processors more IEEE-754
> compatible.  For example, 3A4000 is the only non-R6 MIPS-compatible
> processor using IEEE-754-2008 NaN encoding.  And LoongArch manual
> directly refers to IEEE-754-2008 manual in many places.  So I guess this
> change means Loongson won't set this bit to 1 for 3A5000 at all, and any
> attempt by the user to set this bit could be considered undefined
> behavior (causing inconsistent behavior on 3A5000 as the kernel doesn't
> save/restore VCSR at context switch, and SIGILL on 3A6000 as the
> register is removed).

P. S. I'm just playing an "intelligence game" writing the reply so maybe
I'm completely wrong.  And, it does not indicate any position of me or
my affiliation about some "controversial topics".  And, even if my guess
is correct, it's still better to add something like "the use of VCSR is
optional in LA464 and we've decided to remove it in future
architectures, so it's not supported to use it even on LA464" into the
commit message.
-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-05 10:00     ` Xi Ruoyao
@ 2022-07-06  2:35       ` Huacai Chen
  2022-07-06  2:51         ` Xi Ruoyao
  0 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2022-07-06  2:35 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: WANG Xuerui, Qi Hu, Jiaxun Yang, loongarch, LKML

Hi, all,

On Tue, Jul 5, 2022 at 6:01 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Tue, 2022-07-05 at 17:49 +0800, Xi Ruoyao wrote:
> > On Tue, 2022-07-05 at 16:46 +0800, WANG Xuerui wrote:
> >
> > > Actually I'm still not very satisfied with the explanation; the code
> > > must be written with *something* in mind, since GS464V/LA464 is the only
> > > LoongArch implementation so far, it must have a VCSR to begin with. And
> > > you can't magically melt the VCSR on the tens of thousands of
> > > 3A5000/3C5000's already shipped, because the old-world kernel obviously
> > > comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
> > > you didn't mention what will happen if LSX/LASX *is* enabled on this
> > > new-world kernel, *and* fcsr16 is being accessed.
> > >
> > > I think maybe you just want to remove the mentions of VCSR since they
> > > are dead code right now, as I don't believe it's gone in shipped
> > > hardware, as explained above. Except if there's magically a way to
> > > implement LSX/LASX without touching the said-to-have-disappeared VCSR,
> > > which I don't know of, and can't know because the LSX/LASX ISA manual is
> > > still not publicly accessible; so I don't feel comfortable approving
> > > this patch.
> >
> > Let me make some bold guess here.  In the MIPS-compatible 3A4000 we had
> > "MSACSR" register.  According to the MSA manual, only the 24-th bit of
> > this register is used:
> >
> > "Some MSA floating point instructions might not handle subnormal input
> > operands or compute tiny non-zero results. Such instructions may signal
> > the Unimplemented Operation Exception and let the software emulation
> > finalize the operation. If software emulation is not needed or desired,
> > MSACSR FS bit could be set to replace every tiny non-zero result and
> > subnormal input operand with zero of the same sign."
> >
> > And, it says:
> >
> > "Should the alternate exception handling attributes of the IEEE Standard
> > for Floating-Point Arithmetic 754-2008, Section 8 be desired, the MSACSR
> > FS bit should be zero, the Underflow Exception be enabled and a trap
> > handler be provided to carry out the execution of the alternate
> > exception handling attributes."
> >
> > We can see Loongson has been trying to make 3A processors more IEEE-754
> > compatible.  For example, 3A4000 is the only non-R6 MIPS-compatible
> > processor using IEEE-754-2008 NaN encoding.  And LoongArch manual
> > directly refers to IEEE-754-2008 manual in many places.  So I guess this
> > change means Loongson won't set this bit to 1 for 3A5000 at all, and any
> > attempt by the user to set this bit could be considered undefined
> > behavior (causing inconsistent behavior on 3A5000 as the kernel doesn't
> > save/restore VCSR at context switch, and SIGILL on 3A6000 as the
> > register is removed).
>
> P. S. I'm just playing an "intelligence game" writing the reply so maybe
> I'm completely wrong.  And, it does not indicate any position of me or
> my affiliation about some "controversial topics".  And, even if my guess
> is correct, it's still better to add something like "the use of VCSR is
> optional in LA464 and we've decided to remove it in future
> architectures, so it's not supported to use it even on LA464" into the
> commit message.

Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
surely be upstream, this has nothing to do with cleanup VCSR16.
Because FP/LSX/LASX share the same control bits in FCSR now. Am I
right? Qi Hu?

If I'm right, Qi Hu, please tell us what control bits were in VCSR16,
and where their replacements in FCSR are. Thanks.

Huacai

> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-06  2:35       ` Huacai Chen
@ 2022-07-06  2:51         ` Xi Ruoyao
  2022-07-06  4:00           ` Qi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Xi Ruoyao @ 2022-07-06  2:51 UTC (permalink / raw)
  To: Huacai Chen; +Cc: WANG Xuerui, Qi Hu, Jiaxun Yang, loongarch, LKML

On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:

> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
> surely be upstream, this has nothing to do with cleanup VCSR16.
> Because FP/LSX/LASX share the same control bits in FCSR now.

My guess:

Almost all behavior of vector unit is controlled by FCSR (for example,
the rounding of both FPU and vector unit should be controlled by FCSR
altogether), except one bit similar to the bit 24 of MSACSR ("flush to
zero") is in VCSR [^1].  And "flush to zero" is not really useful so it
will be removed in 3A6000, and we'll not use it for 3A5000.

[^1]: A more bold guess: the hardware engineers could have just said
"let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
LA464, maybe it will be useful and who knows?"  But now in practice it's
not useful.

Am I correct?


-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-06  2:51         ` Xi Ruoyao
@ 2022-07-06  4:00           ` Qi Hu
  2022-07-06 20:49             ` Jiaxun Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Qi Hu @ 2022-07-06  4:00 UTC (permalink / raw)
  To: Xi Ruoyao, Huacai Chen; +Cc: WANG Xuerui, Jiaxun Yang, loongarch, LKML


On 2022/7/6 10:51, Xi Ruoyao wrote:
> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>
>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>> surely be upstream, this has nothing to do with cleanup VCSR16.
>> Because FP/LSX/LASX share the same control bits in FCSR now.
> My guess:
>
> Almost all behavior of vector unit is controlled by FCSR (for example,
> the rounding of both FPU and vector unit should be controlled by FCSR
> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
> zero") is in VCSR [^1].  And "flush to zero" is not really useful so it
> will be removed in 3A6000, and we'll not use it for 3A5000.
Actually, flush to zero has been removed in 3A5000.
>
> [^1]: A more bold guess: the hardware engineers could have just said
> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
> LA464, maybe it will be useful and who knows?"  But now in practice it's
> not useful.
>
> Am I correct?
The hardware(LA464) has removed the vcsr("has but not use" is 
incorrect), and here are some details:

- For all FP operations, including LSX/LASX, they are controlled by 
fcsr0/1/2/3.

- For LSX/LASX other operations, they are *not* controlled by any other 
CSR now. And fcsr16 to fcsr31 are reserved to control these operations 
(now they are *undefined*).

- Flush to zero(MSACSR.FS) is removed and not supported.

- If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-06  4:00           ` Qi Hu
@ 2022-07-06 20:49             ` Jiaxun Yang
  2022-07-07  1:29               ` Qi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Jiaxun Yang @ 2022-07-06 20:49 UTC (permalink / raw)
  To: Qi Hu, Xi Ruoyao, Huacai Chen; +Cc: Xuerui Wang, loongarch, LKML



在2022年7月6日七月 上午5:00,Qi Hu写道:
> On 2022/7/6 10:51, Xi Ruoyao wrote:
>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>
>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>> My guess:
>>
>> Almost all behavior of vector unit is controlled by FCSR (for example,
>> the rounding of both FPU and vector unit should be controlled by FCSR
>> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
>> zero") is in VCSR [^1].  And "flush to zero" is not really useful so it
>> will be removed in 3A6000, and we'll not use it for 3A5000.
> Actually, flush to zero has been removed in 3A5000.
>>
>> [^1]: A more bold guess: the hardware engineers could have just said
>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>> LA464, maybe it will be useful and who knows?"  But now in practice it's
>> not useful.
>>
>> Am I correct?
> The hardware(LA464) has removed the vcsr("has but not use" is 
> incorrect), and here are some details:
>
> - For all FP operations, including LSX/LASX, they are controlled by 
> fcsr0/1/2/3.
>
> - For LSX/LASX other operations, they are *not* controlled by any other 
> CSR now. And fcsr16 to fcsr31 are reserved to control these operations 
> (now they are *undefined*).
Sorry but what do you meant by “these” here?
If it means LSX/LASX, are you trying to say that future chip’s LSX/LASX won’t be
compatible with present 3A5000? As your said fcsr16 and fcsr31 are undefined
for now.

Thanks
-

>
> - Flush to zero(MSACSR.FS) is removed and not supported.
>
> - If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.

-- 
- Jiaxun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-06 20:49             ` Jiaxun Yang
@ 2022-07-07  1:29               ` Qi Hu
  2022-07-07  3:05                 ` WANG Xuerui
  0 siblings, 1 reply; 16+ messages in thread
From: Qi Hu @ 2022-07-07  1:29 UTC (permalink / raw)
  To: Jiaxun Yang, Xi Ruoyao, Huacai Chen; +Cc: Xuerui Wang, loongarch, LKML


On 2022/7/7 04:49, Jiaxun Yang wrote:
>
> 在2022年7月6日七月 上午5:00,Qi Hu写道:
>> On 2022/7/6 10:51, Xi Ruoyao wrote:
>>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>>
>>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>>> My guess:
>>>
>>> Almost all behavior of vector unit is controlled by FCSR (for example,
>>> the rounding of both FPU and vector unit should be controlled by FCSR
>>> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
>>> zero") is in VCSR [^1].  And "flush to zero" is not really useful so it
>>> will be removed in 3A6000, and we'll not use it for 3A5000.
>> Actually, flush to zero has been removed in 3A5000.
>>> [^1]: A more bold guess: the hardware engineers could have just said
>>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>>> LA464, maybe it will be useful and who knows?"  But now in practice it's
>>> not useful.
>>>
>>> Am I correct?
>> The hardware(LA464) has removed the vcsr("has but not use" is
>> incorrect), and here are some details:
>>
>> - For all FP operations, including LSX/LASX, they are controlled by
>> fcsr0/1/2/3.
>>
>> - For LSX/LASX other operations, they are *not* controlled by any other
>> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
>> (now they are *undefined*).
> Sorry but what do you meant by “these” here?
"These operations" means "LSX/LASX other operations", except its 
floating-point operations.
> If it means LSX/LASX, are you trying to say that future chip’s LSX/LASX won’t be
> compatible with present 3A5000? As your said fcsr16 and fcsr31 are undefined
> for now.
>
> Thanks
> -

"not compatible" is incorrect. If future chips add new features to 
define and use these registers, some bits in CPUCFG should be set, like 
CPUID in X86.

And at that time, if the applications do not use these new features(like 
some old apps), they can run at *default* state which is the same as 
current 3A5000.

So, "reserved" is just to prepare for adding something, not "incompatible".

Thanks.

>> - Flush to zero(MSACSR.FS) is removed and not supported.
>>
>> - If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  1:29               ` Qi Hu
@ 2022-07-07  3:05                 ` WANG Xuerui
  2022-07-07  4:04                   ` Xi Ruoyao
  2022-07-07  5:42                   ` Qi Hu
  0 siblings, 2 replies; 16+ messages in thread
From: WANG Xuerui @ 2022-07-07  3:05 UTC (permalink / raw)
  To: Qi Hu, Jiaxun Yang, Xi Ruoyao, Huacai Chen; +Cc: Xuerui Wang, loongarch, LKML

On 2022/7/7 09:29, Qi Hu wrote:
>
> On 2022/7/7 04:49, Jiaxun Yang wrote:
>>
>> 在2022年7月6日七月 上午5:00,Qi Hu写道:
>>> On 2022/7/6 10:51, Xi Ruoyao wrote:
>>>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>>>
>>>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>>>> My guess:
>>>>
>>>> Almost all behavior of vector unit is controlled by FCSR (for example,
>>>> the rounding of both FPU and vector unit should be controlled by FCSR
>>>> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
>>>> zero") is in VCSR [^1].  And "flush to zero" is not really useful 
>>>> so it
>>>> will be removed in 3A6000, and we'll not use it for 3A5000.
>>> Actually, flush to zero has been removed in 3A5000.
>>>> [^1]: A more bold guess: the hardware engineers could have just said
>>>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>>>> LA464, maybe it will be useful and who knows?"  But now in practice 
>>>> it's
>>>> not useful.
>>>>
>>>> Am I correct?
>>> The hardware(LA464) has removed the vcsr("has but not use" is
>>> incorrect), and here are some details:
>>>
>>> - For all FP operations, including LSX/LASX, they are controlled by
>>> fcsr0/1/2/3.
>>>
>>> - For LSX/LASX other operations, they are *not* controlled by any other
>>> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
>>> (now they are *undefined*).
>> Sorry but what do you meant by “these” here?
> "These operations" means "LSX/LASX other operations", except its 
> floating-point operations.

This is getting hard to follow. Assuming the expression "LSX/LASX other 
operations" is Chinglish (it's certainly not proper English), I think 
you mean "the non-FP operations belonging to LSX/LASX" here right?

And it's strange, that these ops do exist in LSX/LASX, hence also 
present in 3A5000, but the control bits are undefined. How come this is 
even possible?

>> If it means LSX/LASX, are you trying to say that future chip’s 
>> LSX/LASX won’t be
>> compatible with present 3A5000? As your said fcsr16 and fcsr31 are 
>> undefined
>> for now.
>>
>> Thanks
>> -
>
> "not compatible" is incorrect. If future chips add new features to 
> define and use these registers, some bits in CPUCFG should be set, 
> like CPUID in X86.
>
> And at that time, if the applications do not use these new 
> features(like some old apps), they can run at *default* state which is 
> the same as current 3A5000.
>
> So, "reserved" is just to prepare for adding something, not 
> "incompatible".

To be frank, at this point I think you're trying to hide something. 
(This is not your fault, blame someone else of course because they told 
you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is 
certainly being saved/restored, and there's apparently no harm in doing 
so. And if the contents are indeed "undefined", why are the code there 
in the first place? Certainly the bits *are* meaningful, only that for 
some reason you aren't revealing the semantics and pretending that they 
are "undefined" and probably "do nothing externally observable" if 
accessed in the first place.

Then, either

(1) I am wrong and being hyper-aggressive here (and if it turns out to 
be the case, I'd immediately apologize for that), and the VCSR is indeed 
only meaningful for some early development steppings of 3A5000, and does 
nothing on the 3A5000 we all have now;

or, (2) the VCSR is still relevant for the "other" applicable LSX/LASX 
ops on 3A5000, but this patch is taking the VCSR out thus rendering such 
ops unreliable/incorrect in case they depend on the additional state.

Either way, the new-world kernel is handling LSX/LASX differently from 
the old-world, and I fear this would create unnecessary churn both in 
kernel and downstream projects already carrying LSX/LASX assembly, in 
case they actually used said ops. I can't rule out such possibility 
because at this time all LSX/LASX assembly was written by Loongson 
people with access to the manuals, and tested on old-world systems, so 
they might not be aware that some non-FP ops are going to magically 
change behavior on new-world systems because VCSR is suddenly not being 
saved/restored.

(Actually, people outside of Loongson might never know if (2) is the 
case, because the insns dependent on VCSR could be simply deleted from 
manuals, so they "don't exist" for outsiders, and "compatibility" of 
course doesn't suffer.)

>
> Thanks.
>
>>> - Flush to zero(MSACSR.FS) is removed and not supported.
>>>
>>> - If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  3:05                 ` WANG Xuerui
@ 2022-07-07  4:04                   ` Xi Ruoyao
  2022-07-07  4:22                     ` WANG Xuerui
  2022-07-07  5:42                   ` Qi Hu
  1 sibling, 1 reply; 16+ messages in thread
From: Xi Ruoyao @ 2022-07-07  4:04 UTC (permalink / raw)
  To: WANG Xuerui, Qi Hu, Jiaxun Yang, Huacai Chen; +Cc: loongarch, LKML

On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:

> To be frank, at this point I think you're trying to hide something. 
> (This is not your fault, blame someone else of course because they told 
> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is 
> certainly being saved/restored, and there's apparently no harm in doing 
> so. And if the contents are indeed "undefined", why are the code there
> in the first place? Certainly the bits *are* meaningful, only that for
> some reason you aren't revealing the semantics and pretending that they 
> are "undefined" and probably "do nothing externally observable" if 
> accessed in the first place.

On a 3A5000LL, I did an experiment via a kernel module, which enables
LSX/LASX and tries to write and read fcsr16.  I tried each bit (1, 2, 4,
8, ..., 1 << 31) one by one.  The result: no matter which bit I wrote
into fcsr16, I always read out 0.

And I've objdump'ed a kernel shipped in an early Loongnix release.  It
seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
instruction.
-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  4:04                   ` Xi Ruoyao
@ 2022-07-07  4:22                     ` WANG Xuerui
  2022-07-07  6:09                       ` Qi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: WANG Xuerui @ 2022-07-07  4:22 UTC (permalink / raw)
  To: Xi Ruoyao, WANG Xuerui, Qi Hu, Jiaxun Yang, Huacai Chen; +Cc: loongarch, LKML

On 2022/7/7 12:04, Xi Ruoyao wrote:
> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>
>> To be frank, at this point I think you're trying to hide something.
>> (This is not your fault, blame someone else of course because they told
>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>> certainly being saved/restored, and there's apparently no harm in doing
>> so. And if the contents are indeed "undefined", why are the code there
>> in the first place? Certainly the bits *are* meaningful, only that for
>> some reason you aren't revealing the semantics and pretending that they
>> are "undefined" and probably "do nothing externally observable" if
>> accessed in the first place.
> On a 3A5000LL, I did an experiment via a kernel module, which enables
> LSX/LASX and tries to write and read fcsr16.  I tried each bit (1, 2, 4,
> 8, ..., 1 << 31) one by one.  The result: no matter which bit I wrote
> into fcsr16, I always read out 0.
>
> And I've objdump'ed a kernel shipped in an early Loongnix release.  It
> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
> instruction.

Hmm this is weird. I can't understand why the vcsr code was there in the 
first place then... I'd like to check a few Loongnix/Kylin/UOS kernels 
but currently I don't have the time.

If this is the case, indeed all vcsr-related code should be removed. 
Although I'm still not sure how to best word the commit message.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  3:05                 ` WANG Xuerui
  2022-07-07  4:04                   ` Xi Ruoyao
@ 2022-07-07  5:42                   ` Qi Hu
  1 sibling, 0 replies; 16+ messages in thread
From: Qi Hu @ 2022-07-07  5:42 UTC (permalink / raw)
  To: WANG Xuerui, Jiaxun Yang, Xi Ruoyao, Huacai Chen; +Cc: loongarch, LKML


On 2022/7/7 11:05, WANG Xuerui wrote:
> On 2022/7/7 09:29, Qi Hu wrote:
>>
>> On 2022/7/7 04:49, Jiaxun Yang wrote:
>>>
>>> 在2022年7月6日七月 上午5:00,Qi Hu写道:
>>>> On 2022/7/6 10:51, Xi Ruoyao wrote:
>>>>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>>>>
>>>>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>>>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>>>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>>>>> My guess:
>>>>>
>>>>> Almost all behavior of vector unit is controlled by FCSR (for 
>>>>> example,
>>>>> the rounding of both FPU and vector unit should be controlled by FCSR
>>>>> altogether), except one bit similar to the bit 24 of MSACSR 
>>>>> ("flush to
>>>>> zero") is in VCSR [^1].  And "flush to zero" is not really useful 
>>>>> so it
>>>>> will be removed in 3A6000, and we'll not use it for 3A5000.
>>>> Actually, flush to zero has been removed in 3A5000.
>>>>> [^1]: A more bold guess: the hardware engineers could have just said
>>>>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>>>>> LA464, maybe it will be useful and who knows?"  But now in 
>>>>> practice it's
>>>>> not useful.
>>>>>
>>>>> Am I correct?
>>>> The hardware(LA464) has removed the vcsr("has but not use" is
>>>> incorrect), and here are some details:
>>>>
>>>> - For all FP operations, including LSX/LASX, they are controlled by
>>>> fcsr0/1/2/3.
>>>>
>>>> - For LSX/LASX other operations, they are *not* controlled by any 
>>>> other
>>>> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
>>>> (now they are *undefined*).
>>> Sorry but what do you meant by “these” here?
>> "These operations" means "LSX/LASX other operations", except its 
>> floating-point operations.
>
> This is getting hard to follow. Assuming the expression "LSX/LASX 
> other operations" is Chinglish (it's certainly not proper English), I 
> think you mean "the non-FP operations belonging to LSX/LASX" here right?
That's right.
>
> And it's strange, that these ops do exist in LSX/LASX, hence also 
> present in 3A5000, but the control bits are undefined. How come this 
> is even possible?
The code is redundant, actually. Reading or writing the fcsr16 do not 
have any effect on LSX/LASX.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  4:22                     ` WANG Xuerui
@ 2022-07-07  6:09                       ` Qi Hu
  2022-07-07  6:24                         ` WANG Xuerui
  0 siblings, 1 reply; 16+ messages in thread
From: Qi Hu @ 2022-07-07  6:09 UTC (permalink / raw)
  To: WANG Xuerui, Xi Ruoyao, Jiaxun Yang, Huacai Chen; +Cc: loongarch, LKML


On 2022/7/7 12:22, WANG Xuerui wrote:
> On 2022/7/7 12:04, Xi Ruoyao wrote:
>> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>>
>>> To be frank, at this point I think you're trying to hide something.
>>> (This is not your fault, blame someone else of course because they told
>>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>>> certainly being saved/restored, and there's apparently no harm in doing
>>> so. And if the contents are indeed "undefined", why are the code there
>>> in the first place? Certainly the bits *are* meaningful, only that for
>>> some reason you aren't revealing the semantics and pretending that they
>>> are "undefined" and probably "do nothing externally observable" if
>>> accessed in the first place.
>> On a 3A5000LL, I did an experiment via a kernel module, which enables
>> LSX/LASX and tries to write and read fcsr16.  I tried each bit (1, 2, 4,
>> 8, ..., 1 << 31) one by one.  The result: no matter which bit I wrote
>> into fcsr16, I always read out 0.
>>
>> And I've objdump'ed a kernel shipped in an early Loongnix release.  It
>> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
>> instruction.
>
> Hmm this is weird. I can't understand why the vcsr code was there in 
> the first place then... I'd like to check a few Loongnix/Kylin/UOS 
> kernels but currently I don't have the time.
>
> If this is the case, indeed all vcsr-related code should be removed. 
> Although I'm still not sure how to best word the commit message.
>
Thanks for the Ruoyao's experiment.

Removing the vcsr is the first step to trying to support LBT and 
LSX/LASX in Kernel. In my opinion, the vcsr relevant code may be used 
for debugging and forgot to remove.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  6:09                       ` Qi Hu
@ 2022-07-07  6:24                         ` WANG Xuerui
  2022-07-07  6:39                           ` Qi Hu
  0 siblings, 1 reply; 16+ messages in thread
From: WANG Xuerui @ 2022-07-07  6:24 UTC (permalink / raw)
  To: Qi Hu, WANG Xuerui, Xi Ruoyao, Jiaxun Yang, Huacai Chen; +Cc: loongarch, LKML


On 2022/7/7 14:09, Qi Hu wrote:
>
> On 2022/7/7 12:22, WANG Xuerui wrote:
>> On 2022/7/7 12:04, Xi Ruoyao wrote:
>>> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>>>
>>>> To be frank, at this point I think you're trying to hide something.
>>>> (This is not your fault, blame someone else of course because they 
>>>> told
>>>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>>>> certainly being saved/restored, and there's apparently no harm in 
>>>> doing
>>>> so. And if the contents are indeed "undefined", why are the code there
>>>> in the first place? Certainly the bits *are* meaningful, only that for
>>>> some reason you aren't revealing the semantics and pretending that 
>>>> they
>>>> are "undefined" and probably "do nothing externally observable" if
>>>> accessed in the first place.
>>> On a 3A5000LL, I did an experiment via a kernel module, which enables
>>> LSX/LASX and tries to write and read fcsr16.  I tried each bit (1, 
>>> 2, 4,
>>> 8, ..., 1 << 31) one by one.  The result: no matter which bit I wrote
>>> into fcsr16, I always read out 0.
>>>
>>> And I've objdump'ed a kernel shipped in an early Loongnix release.  It
>>> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
>>> instruction.
>>
>> Hmm this is weird. I can't understand why the vcsr code was there in 
>> the first place then... I'd like to check a few Loongnix/Kylin/UOS 
>> kernels but currently I don't have the time.
>>
>> If this is the case, indeed all vcsr-related code should be removed. 
>> Although I'm still not sure how to best word the commit message.
>>
> Thanks for the Ruoyao's experiment.
>
> Removing the vcsr is the first step to trying to support LBT and 
> LSX/LASX in Kernel. In my opinion, the vcsr relevant code may be used 
> for debugging and forgot to remove.
>
Thinking about it harder, it actually makes sense. Given that access to 
the LSX/LASX manuals is currently restricted, outsiders can never know 
whether the code in question is really needed, so one has to err on the 
conservative side. Thanks for the clarification, and my apologies for 
being harsh in the previous reply.

I think the commit message could be reworded like:

"The VCSR (also known as FCSR16) is not present on retail steppings of 
3A5000. FCSR16 through FCSR31 are reserved for non-floating-point 
LSX/LASX operations, but on 3A5000 all writes to them are no-op and all 
reads return zero. FP LSX/LASX operations reuse the FCSR0 as their CSR.

Remove VCSR handling that is probably leftover debugging code for an 
earlier not-for-retail stepping."

(And remove the trailing period in your patch title, while at it; the 
Linux kernel doesn't use a trailing period for majority of its commits.)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.
  2022-07-07  6:24                         ` WANG Xuerui
@ 2022-07-07  6:39                           ` Qi Hu
  0 siblings, 0 replies; 16+ messages in thread
From: Qi Hu @ 2022-07-07  6:39 UTC (permalink / raw)
  To: WANG Xuerui, Xi Ruoyao, Jiaxun Yang, Huacai Chen; +Cc: loongarch, LKML


On 2022/7/7 14:24, WANG Xuerui wrote:
>
> On 2022/7/7 14:09, Qi Hu wrote:
>>
>> On 2022/7/7 12:22, WANG Xuerui wrote:
>>> On 2022/7/7 12:04, Xi Ruoyao wrote:
>>>> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>>>>
>>>>> To be frank, at this point I think you're trying to hide something.
>>>>> (This is not your fault, blame someone else of course because they 
>>>>> told
>>>>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>>>>> certainly being saved/restored, and there's apparently no harm in 
>>>>> doing
>>>>> so. And if the contents are indeed "undefined", why are the code 
>>>>> there
>>>>> in the first place? Certainly the bits *are* meaningful, only that 
>>>>> for
>>>>> some reason you aren't revealing the semantics and pretending that 
>>>>> they
>>>>> are "undefined" and probably "do nothing externally observable" if
>>>>> accessed in the first place.
>>>> On a 3A5000LL, I did an experiment via a kernel module, which enables
>>>> LSX/LASX and tries to write and read fcsr16.  I tried each bit (1, 
>>>> 2, 4,
>>>> 8, ..., 1 << 31) one by one.  The result: no matter which bit I wrote
>>>> into fcsr16, I always read out 0.
>>>>
>>>> And I've objdump'ed a kernel shipped in an early Loongnix release.  It
>>>> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
>>>> instruction.
>>>
>>> Hmm this is weird. I can't understand why the vcsr code was there in 
>>> the first place then... I'd like to check a few Loongnix/Kylin/UOS 
>>> kernels but currently I don't have the time.
>>>
>>> If this is the case, indeed all vcsr-related code should be removed. 
>>> Although I'm still not sure how to best word the commit message.
>>>
>> Thanks for the Ruoyao's experiment.
>>
>> Removing the vcsr is the first step to trying to support LBT and 
>> LSX/LASX in Kernel. In my opinion, the vcsr relevant code may be used 
>> for debugging and forgot to remove.
>>
> Thinking about it harder, it actually makes sense. Given that access 
> to the LSX/LASX manuals is currently restricted, outsiders can never 
> know whether the code in question is really needed, so one has to err 
> on the conservative side. Thanks for the clarification, and my 
> apologies for being harsh in the previous reply.
>
> I think the commit message could be reworded like:
>
> "The VCSR (also known as FCSR16) is not present on retail steppings of 
> 3A5000. FCSR16 through FCSR31 are reserved for non-floating-point 
> LSX/LASX operations, but on 3A5000 all writes to them are no-op and 
> all reads return zero. FP LSX/LASX operations reuse the FCSR0 as their 
> CSR.
>
> Remove VCSR handling that is probably leftover debugging code for an 
> earlier not-for-retail stepping."
>
> (And remove the trailing period in your patch title, while at it; the 
> Linux kernel doesn't use a trailing period for majority of its commits.)
>
Thanks for reminding me about this. It is my first time committing the 
patch to Linux Kernel, and the commit message will be modified at V4 patch.

-

Qi


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-07-07  6:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-04 15:36 [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu Qi Hu
2022-07-05  8:46 ` WANG Xuerui
2022-07-05  9:49   ` Xi Ruoyao
2022-07-05 10:00     ` Xi Ruoyao
2022-07-06  2:35       ` Huacai Chen
2022-07-06  2:51         ` Xi Ruoyao
2022-07-06  4:00           ` Qi Hu
2022-07-06 20:49             ` Jiaxun Yang
2022-07-07  1:29               ` Qi Hu
2022-07-07  3:05                 ` WANG Xuerui
2022-07-07  4:04                   ` Xi Ruoyao
2022-07-07  4:22                     ` WANG Xuerui
2022-07-07  6:09                       ` Qi Hu
2022-07-07  6:24                         ` WANG Xuerui
2022-07-07  6:39                           ` Qi Hu
2022-07-07  5:42                   ` Qi Hu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.