* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc @ 2019-01-11 12:40 徐成华 2019-01-11 12:45 ` huangpei ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread
From: 徐成华 @ 2019-01-11 12:40 UTC (permalink / raw)
To: paul.burton
Cc: ysu, pburton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips, 黄沛

Hi Paul Burton,

For Loongson 3A1000 and 3A3000, when a memory access instruction (load, store, or prefetch) executes between the execution of LL and SC, the success or failure of the SC is not predictable. Although a programmer would not insert memory access instructions between LL and SC, memory instructions that precede LL in program order may be dynamically executed between LL and SC, so a memory fence (SYNC) is needed before LL/LLD to avoid this situation.

Since the 3A3000, we have improved our hardware design to handle this case. But we later deduced a rare circumstance in which some memory instructions, speculatively executed between LL and SC due to branch misprediction, still fall into the above case, so a memory fence (SYNC) at the branch target (if the target is not between LL and SC) is needed for the 3A1000 and 3A3000.

Our processors are continually evolving, and we aim to remove all of these workaround SYNCs around LL/SC in upcoming processors.

Building No. 2, Loongson Industrial Park, Zhongguancun Environmental Protection Technology Demonstration Park, Haidian District, Beijing 100095. Tel: +86 (10) 62546668 Fax: +86 (10) 62600826 www.loongson.cn

This email and its attachments contain confidential information from Loongson Technology Corporation Limited, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction or dissemination) by persons other than the intended recipient(s) is prohibited.
If you receive this email in error, please notify the sender by phone or email immediately and delete it.
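The SYNC placement described in the message above can be sketched for a typical LL/SC retry loop. This is illustrative MIPS assembly only; the registers, labels, and slow-path layout are hypothetical and not taken from the patches under discussion:

```asm
	# atomic_add-style sequence with the two Loongson 3 workarounds applied
	sync			# workaround 1: fence memory ops older than the LL
1:	ll	$t0, 0($a0)	# load-linked from the shared variable
	addu	$t0, $t0, $a1	# modify
	sc	$t0, 0($a0)	# store-conditional; $t0 = 1 on success
	beqz	$t0, 2f		# on failure, branch out of the LL/SC region
	nop
	# ... success path continues here ...
	j	3f
	nop
2:	sync			# workaround 2: SYNC at a branch target that
	b	1b		#   lies outside the LL/SC region, then retry
	nop
3:
```

Both the initial entry and the retry path pass through a SYNC before reaching the LL, which is the property the workaround needs.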
* Re: Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-11 12:40 [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 徐成华 @ 2019-01-11 12:45 ` huangpei 2019-01-11 19:00 ` Paul Burton 2019-01-12 3:25 ` huangpei 2 siblings, 0 replies; 13+ messages in thread
From: huangpei @ 2019-01-11 12:45 UTC (permalink / raw)
To: 徐成华
Cc: paul.burton, ysu, pburton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips

Hi all,

I will submit a new version of the patch, which fixes this bug *sufficiently and exactly*.

> -----Original Message-----
> From: "徐成华" <xuchenghua@loongson.cn>
> Sent: 2019-01-11 20:40:49 (Friday)
> To: paul.burton@mips.com
> Cc: ysu@wavecomp.com, pburton@wavecomp.com, linux-mips@vger.kernel.org, chenhc@lemote.com, zhangfx@lemote.com, wuzhangjin@gmail.com, linux-mips@linux-mips.org, "黄沛" <huangpei@loongson.cn>
> Subject: Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc
>
> Hi Paul Burton,
>
> For Loongson 3A1000 and 3A3000, when a memory access instruction (load, store, or prefetch) executes between the execution of LL and SC, the success or failure of the SC is not predictable. Although a programmer would not insert memory access instructions between LL and SC, memory instructions that precede LL in program order may be dynamically executed between LL and SC, so a memory fence (SYNC) is needed before LL/LLD to avoid this situation.
>
> Since the 3A3000, we have improved our hardware design to handle this case. But we later deduced a rare circumstance in which some memory instructions, speculatively executed between LL and SC due to branch misprediction, still fall into the above case, so a memory fence (SYNC) at the branch target (if the target is not between LL and SC) is needed for the 3A1000 and 3A3000.
>
> Our processors are continually evolving, and we aim to remove all of these workaround SYNCs around LL/SC in upcoming processors.
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-11 12:40 [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 徐成华 2019-01-11 12:45 ` huangpei @ 2019-01-11 19:00 ` Paul Burton 2019-01-12 8:02 ` 徐成华 2019-01-12 3:25 ` huangpei 2 siblings, 1 reply; 13+ messages in thread
From: Paul Burton @ 2019-01-11 19:00 UTC (permalink / raw)
To: 徐成华
Cc: Yunqiang Su, Paul Burton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips, 黄沛

Hello,

On Fri, Jan 11, 2019 at 08:40:49PM +0800, 徐成华 wrote:
> For Loongson 3A1000 and 3A3000, when a memory access instruction
> (load, store, or prefetch) executes between the execution of LL and
> SC, the success or failure of the SC is not predictable. Although a
> programmer would not insert memory access instructions between LL and
> SC, memory instructions that precede LL in program order may be
> dynamically executed between LL and SC, so a memory fence (SYNC) is
> needed before LL/LLD to avoid this situation.
>
> Since the 3A3000, we have improved our hardware design to handle this
> case. But we later deduced a rare circumstance in which some memory
> instructions, speculatively executed between LL and SC due to branch
> misprediction, still fall into the above case, so a memory fence
> (SYNC) at the branch target (if the target is not between LL and SC)
> is needed for the 3A1000 and 3A3000.

Thank you - that description is really helpful.

I have a few follow-up questions if you don't mind:

1) Is it correct to say that the only consequence of the bug is that an SC might fail when it ought to have succeeded?

2) Does that mean placing a sync before the LL is purely a performance optimization? ie. if we don't have the sync & the SC fails then we'll retry the LL/SC anyway, and this time not have the reordered instruction from before the LL to cause a problem.

3) In the speculative execution case would it also work to place a sync before the branch instruction, instead of at the branch target?
In some cases this might be nicer since the workaround would be contained within the LL/SC loop, but I guess it could potentially add more overhead if the branch is conditional & not taken.

4) When we talk about branches here, is it really just branch instructions that are affected or will the CPU speculate past jump instructions too?

I just want to be sure that we work around this properly, and document it in the kernel so that it's clear to developers why the workaround exists & how to avoid introducing bugs for these CPUs in future.

> Our processors are continually evolving, and we aim to remove all of
> these workaround SYNCs around LL/SC in upcoming processors.

I'm very glad to hear that :)

I hope one day I can get my hands on a nice Loongson laptop to test with.

Thanks,
Paul
* Re: Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-11 19:00 ` Paul Burton @ 2019-01-12 8:02 ` 徐成华 2019-01-12 8:19 ` huangpei 0 siblings, 1 reply; 13+ messages in thread
From: 徐成华 @ 2019-01-12 8:02 UTC (permalink / raw)
To: Paul Burton
Cc: Yunqiang Su, Paul Burton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips, 黄沛

> > For Loongson 3A1000 and 3A3000, when a memory access instruction
> > (load, store, or prefetch) executes between the execution of LL and
> > SC, the success or failure of the SC is not predictable. Although a
> > programmer would not insert memory access instructions between LL
> > and SC, memory instructions that precede LL in program order may be
> > dynamically executed between LL and SC, so a memory fence (SYNC) is
> > needed before LL/LLD to avoid this situation.
> >
> > Since the 3A3000, we have improved our hardware design to handle
> > this case. But we later deduced a rare circumstance in which some
> > memory instructions, speculatively executed between LL and SC due to
> > branch misprediction, still fall into the above case, so a memory
> > fence (SYNC) at the branch target (if the target is not between LL
> > and SC) is needed for the 3A1000 and 3A3000.
>
> Thank you - that description is really helpful.
>
> I have a few follow-up questions if you don't mind:
>
> 1) Is it correct to say that the only consequence of the bug is that an
> SC might fail when it ought to have succeeded?

Unfortunately, no: the SC may succeed when it should have failed, and that causes a functional error.

> 2) Does that mean placing a sync before the LL is purely a performance
> optimization? ie. if we don't have the sync & the SC fails then
> we'll retry the LL/SC anyway, and this time not have the reordered
> instruction from before the LL to cause a problem.

It's a functional bug, not a performance issue.

> 3) In the speculative execution case would it also work to place a sync
> before the branch instruction, instead of at the branch target?
> In some cases this might be nicer since the workaround would be
> contained within the LL/SC loop, but I guess it could potentially
> add more overhead if the branch is conditional & not taken.

Yes, that adds more overhead, so we don't use it.

> 4) When we talk about branches here, is it really just branch
> instructions that are affected or will the CPU speculate past jump
> instructions too?

No, the bug is only exposed when the real program order still falls within the LL/SC sequence; an unconditional branch or jump does not really stay within LL/SC, so it is not affected.

> I just want to be sure that we work around this properly, and document
> it in the kernel so that it's clear to developers why the workaround
> exists & how to avoid introducing bugs for these CPUs in future.
>
> > Our processors are continually evolving, and we aim to remove all of
> > these workaround SYNCs around LL/SC in upcoming processors.
>
> I'm very glad to hear that :)
>
> I hope one day I can get my hands on a nice Loongson laptop to test
> with.

We can ship one to you as a gift when the laptop is stable.

> Thanks,
> Paul

--
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-12 8:02 ` 徐成华 @ 2019-01-12 8:19 ` huangpei 0 siblings, 0 replies; 13+ messages in thread
From: huangpei @ 2019-01-12 8:19 UTC (permalink / raw)
To: 徐成华
Cc: Paul Burton, Yunqiang Su, Paul Burton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips

On Sat, 12 Jan 2019 16:02:40 +0800 (GMT+08:00) 徐成华 <xuchenghua@loongson.cn> wrote:

> > > For Loongson 3A1000 and 3A3000, when a memory access instruction
> > > (load, store, or prefetch) executes between the execution of LL
> > > and SC, the success or failure of the SC is not predictable.
> > > Although a programmer would not insert memory access instructions
> > > between LL and SC, memory instructions that precede LL in program
> > > order may be dynamically executed between LL and SC, so a memory
> > > fence (SYNC) is needed before LL/LLD to avoid this situation.
> > >
> > > Since the 3A3000, we have improved our hardware design to handle
> > > this case. But we later deduced a rare circumstance in which some
> > > memory instructions, speculatively executed between LL and SC due
> > > to branch misprediction, still fall into the above case, so a
> > > memory fence (SYNC) at the branch target (if the target is not
> > > between LL and SC) is needed for the 3A1000 and 3A3000.
> >
> > Thank you - that description is really helpful.
> >
> > I have a few follow-up questions if you don't mind:
> >
> > 1) Is it correct to say that the only consequence of the bug is
> > that an SC might fail when it ought to have succeeded?

Here is an example: cpu1 and cpu2 simultaneously run atomic_add by 1 on the same variable. This bug causes the SC in both CPUs' atomic_add to succeed at the same time (both SCs return 1), and the variable is only incremented by 1, which is wrong and unacceptable (it should be incremented by 2). I think the SC does the wrong thing, rather than merely failing to do it.

> Unfortunately, no: the SC may succeed when it should have failed, and
> that causes a functional error.
> > 2) Does that mean placing a sync before the LL is purely a
> > performance optimization? ie. if we don't have the sync & the SC
> > fails then we'll retry the LL/SC anyway, and this time not have the
> > reordered instruction from before the LL to cause a problem.
>
> It's a functional bug, not a performance issue.
>
> > 3) In the speculative execution case would it also work to place a
> > sync before the branch instruction, instead of at the branch
> > target? In some cases this might be nicer since the workaround
> > would be contained within the LL/SC loop, but I guess it could
> > potentially add more overhead if the branch is conditional & not
> > taken.
>
> Yes, that adds more overhead, so we don't use it.
>
> > 4) When we talk about branches here, is it really just branch
> > instructions that are affected or will the CPU speculate past
> > jump instructions too?
>
> No, the bug is only exposed when the real program order still falls
> within the LL/SC sequence; an unconditional branch or jump does not
> really stay within LL/SC, so it is not affected.
>
> > I just want to be sure that we work around this properly, and
> > document it in the kernel so that it's clear to developers why the
> > workaround exists & how to avoid introducing bugs for these CPUs in
> > future.
> > > Our processors are continually evolving, and we aim to remove all
> > > of these workaround SYNCs around LL/SC in upcoming processors.
> >
> > I'm very glad to hear that :)
> >
> > I hope one day I can get my hands on a nice Loongson laptop to test
> > with.
>
> We can ship one to you as a gift when the laptop is stable.
>
> > Thanks,
> > Paul
>
> --
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-11 12:40 [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 徐成华 2019-01-11 12:45 ` huangpei 2019-01-11 19:00 ` Paul Burton @ 2019-01-12 3:25 ` huangpei 2019-01-12 3:41 ` Yunqiang Su 2 siblings, 1 reply; 13+ messages in thread
From: huangpei @ 2019-01-12 3:25 UTC (permalink / raw)
To: 徐成华
Cc: paul.burton, ysu, pburton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips

[-- Attachment #1: Type: text/plain, Size: 2533 bytes --]

Hi, this is the patch for the ll/sc bug in Loongson 3, based on Linux 4.20 (8fe28cb58bcb235034b64cbbb7550a8a43fd88be).

+. it covers all Loongson 3 CPUs;

+. to fix the ll/sc bug *sufficiently and exactly*, this patch shows how many places need to be touched

+. it builds OK on Loongson 3 and Cavium/Octeon; the old version was tested under high-pressure testing

On Fri, 11 Jan 2019 20:40:49 +0800 (GMT+08:00) 徐成华 <xuchenghua@loongson.cn> wrote:

> Hi Paul Burton,
>
> For Loongson 3A1000 and 3A3000, when a memory access instruction
> (load, store, or prefetch) executes between the execution of LL and
> SC, the success or failure of the SC is not predictable. Although a
> programmer would not insert memory access instructions between LL and
> SC, memory instructions that precede LL in program order may be
> dynamically executed between LL and SC, so a memory fence (SYNC) is
> needed before LL/LLD to avoid this situation.
>
> Since the 3A3000, we have improved our hardware design to handle this
> case. But we later deduced a rare circumstance in which some memory
> instructions, speculatively executed between LL and SC due to branch
> misprediction, still fall into the above case, so a memory fence
> (SYNC) at the branch target (if the target is not between LL and SC)
> is needed for the 3A1000 and 3A3000.
>
> Our processors are continually evolving, and we aim to remove all of
> these workaround SYNCs around LL/SC in upcoming processors.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-loongson64-add-helper-for-ll-sc-bugfix-in-loongson3.patch --] [-- Type: text/x-patch, Size: 1331 bytes --] From 510d8c6cce97c7fb62ee2bf81c1856438583c328 Mon Sep 17 00:00:00 2001 From: Huang Pei <huangpei@loongson.cn> Date: Sat, 12 Jan 2019 09:37:18 +0800 Subject: [PATCH 1/3] loongson64: add helper for ll/sc bugfix in loongson3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit there is a bug in ll/sc operation on loongson 3, that it causes two concurrent ll/sc on same variable both succeed, which is unacceptable clearly Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/include/asm/barrier.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h index a5eb1bb..fc21eb5 100644 --- a/arch/mips/include/asm/barrier.h +++ b/arch/mips/include/asm/barrier.h @@ -203,6 +203,16 @@ #define __WEAK_LLSC_MB " \n" #endif +#if defined(CONFIG_CPU_LOONGSON3) +#define __LS3A_WAR_LLSC " .set mips64r2\nsynci 0\n.set mips0\n" +#define __ls3a_war_llsc() __asm__ __volatile__("synci 0" : : :"memory") +#define __LS_WAR_LLSC " .set mips3\nsync\n.set
mips0\n" +#else +#define __LS3A_WAR_LLSC +#define __ls3a_war_llsc() +#define __LS_WAR_LLSC +#endif + #define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory") #ifdef CONFIG_CPU_CAVIUM_OCTEON -- 2.7.4 [-- Attachment #3: 0002-loongson64-fix-ll-sc-bug-of-loongson3-in-inline-asm.patch --] [-- Type: text/x-patch, Size: 8604 bytes --] From ebb19370348b0b3f66baeec314b330abc879b91e Mon Sep 17 00:00:00 2001 From: Huang Pei <huangpei@loongson.cn> Date: Sat, 12 Jan 2019 09:40:31 +0800 Subject: [PATCH 2/3] loongson64: fix ll/sc bug of loongson3 in inline asm +. without __LS3A_WAR_LLSC before ll, and __LS_WAR_LLSC before target from branch ins between ll and sc, two ll/sc operation on same variable can success both, which is clearly wrong. +. __LS3A_WAR_LLSC is needed for Loongson 3 CPU before 3A2000(NOT including 3A2000) +. __LS_WAR_LLSC is needed all Looongson 3 CPU +. old patch fix cmpxchg.h, but now smp_mb__before_llsc and smp_llsc_mb in cmpxchg.h is enought +. change __WEAK_LLSC_MB in futex.h to support same function as __LS_WAR_LLSC Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/include/asm/atomic.h | 6 ++++++ arch/mips/include/asm/bitops.h | 6 ++++++ arch/mips/include/asm/edac.h | 1 + arch/mips/include/asm/futex.h | 4 +++- arch/mips/include/asm/local.h | 2 ++ arch/mips/include/asm/pgtable.h | 2 ++ arch/mips/kernel/syscall.c | 1 + 7 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h index d4ea7a5..ba48a50 100644 --- a/arch/mips/include/asm/atomic.h +++ b/arch/mips/include/asm/atomic.h @@ -59,6 +59,7 @@ static __inline__ void atomic_##op(int i, atomic_t * v) \ int temp; \ \ __asm__ __volatile__( \ + __LS3A_WAR_LLSC \ " .set "MIPS_ISA_LEVEL" \n" \ "1: ll %0, %1 # atomic_" #op " \n" \ " " #asm_op " %0, %2 \n" \ @@ -86,6 +87,7 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3A_WAR_LLSC 
\ "1: ll %1, %2 # atomic_" #op "_return \n" \ " " #asm_op " %0, %1, %3 \n" \ " sc %0, %2 \n" \ @@ -118,6 +120,7 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3A_WAR_LLSC \ "1: ll %1, %2 # atomic_fetch_" #op " \n" \ " " #asm_op " %0, %1, %3 \n" \ " sc %0, %2 \n" \ @@ -253,6 +256,7 @@ static __inline__ void atomic64_##op(long i, atomic64_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3A_WAR_LLSC \ "1: lld %0, %1 # atomic64_" #op " \n" \ " " #asm_op " %0, %2 \n" \ " scd %0, %1 \n" \ @@ -279,6 +283,7 @@ static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3A_WAR_LLSC \ "1: lld %1, %2 # atomic64_" #op "_return\n" \ " " #asm_op " %0, %1, %3 \n" \ " scd %0, %2 \n" \ @@ -311,6 +316,7 @@ static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3A_WAR_LLSC \ "1: lld %1, %2 # atomic64_fetch_" #op "\n" \ " " #asm_op " %0, %1, %3 \n" \ " scd %0, %2 \n" \ diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h index da1b8718..ba50277 100644 --- a/arch/mips/include/asm/bitops.h +++ b/arch/mips/include/asm/bitops.h @@ -68,6 +68,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr) : "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)); #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) } else if (kernel_uses_llsc && __builtin_constant_p(bit)) { + __ls3a_war_llsc(); do { __asm__ __volatile__( " " __LL "%0, %1 # set_bit \n" @@ -78,6 +79,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr) } while (unlikely(!temp)); #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */ } else if (kernel_uses_llsc) { + __ls3a_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" @@ -120,6 +122,7 @@ static inline void 
clear_bit(unsigned long nr, volatile unsigned long *addr) : "ir" (~(1UL << bit))); #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) } else if (kernel_uses_llsc && __builtin_constant_p(bit)) { + __ls3a_war_llsc(); do { __asm__ __volatile__( " " __LL "%0, %1 # clear_bit \n" @@ -130,6 +133,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr) } while (unlikely(!temp)); #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */ } else if (kernel_uses_llsc) { + __ls3a_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" @@ -188,6 +192,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr) unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG); unsigned long temp; + __ls3a_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" @@ -291,6 +296,7 @@ static inline int test_and_set_bit_lock(unsigned long nr, unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG); unsigned long temp; + __ls3a_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" diff --git a/arch/mips/include/asm/edac.h b/arch/mips/include/asm/edac.h index fc46776..9141fa2 100644 --- a/arch/mips/include/asm/edac.h +++ b/arch/mips/include/asm/edac.h @@ -22,6 +22,7 @@ static inline void edac_atomic_scrub(void *va, u32 size) __asm__ __volatile__ ( " .set mips2 \n" + __LS3A_WAR_LLSC "1: ll %0, %1 # edac_atomic_scrub \n" " addu %0, $0 \n" " sc %0, %1 \n" diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h index a9e61ea..0706ac3 100644 --- a/arch/mips/include/asm/futex.h +++ b/arch/mips/include/asm/futex.h @@ -54,6 +54,7 @@ " .set push \n" \ " .set noat \n" \ " .set "MIPS_ISA_ARCH_LEVEL" \n" \ + __LS3A_WAR_LLSC \ "1: "user_ll("%1", "%4")" # __futex_atomic_op\n" \ " .set mips0 \n" \ " " insn " \n" \ @@ -167,6 +168,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, " .set push \n" " .set noat \n" " .set "MIPS_ISA_ARCH_LEVEL" \n" + __LS3A_WAR_LLSC 
"1: "user_ll("%1", "%3")" \n" " bne %1, %z4, 3f \n" " .set mips0 \n" @@ -174,8 +176,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, " .set "MIPS_ISA_ARCH_LEVEL" \n" "2: "user_sc("$1", "%2")" \n" " beqz $1, 1b \n" - __WEAK_LLSC_MB "3: \n" + __WEAK_LLSC_MB " .insn \n" " .set pop \n" " .section .fixup,\"ax\" \n" diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h index ac8264e..afc64b2 100644 --- a/arch/mips/include/asm/local.h +++ b/arch/mips/include/asm/local.h @@ -50,6 +50,7 @@ static __inline__ long local_add_return(long i, local_t * l) __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" + __LS3A_WAR_LLSC "1:" __LL "%1, %2 # local_add_return \n" " addu %0, %1, %3 \n" __SC "%0, %2 \n" @@ -95,6 +96,7 @@ static __inline__ long local_sub_return(long i, local_t * l) __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" + __LS3A_WAR_LLSC "1:" __LL "%1, %2 # local_sub_return \n" " subu %0, %1, %3 \n" __SC "%0, %2 \n" diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h index 129e032..12a8217 100644 --- a/arch/mips/include/asm/pgtable.h +++ b/arch/mips/include/asm/pgtable.h @@ -233,6 +233,7 @@ static inline void set_pte(pte_t *ptep, pte_t pteval) " .set "MIPS_ISA_ARCH_LEVEL" \n" " .set push \n" " .set noreorder \n" + __LS3A_WAR_LLSC "1:" __LL "%[tmp], %[buddy] \n" " bnez %[tmp], 2f \n" " or %[tmp], %[tmp], %[global] \n" @@ -240,6 +241,7 @@ static inline void set_pte(pte_t *ptep, pte_t pteval) " beqz %[tmp], 1b \n" " nop \n" "2: \n" + __LS_WAR_LLSC " .set pop \n" " .set mips0 \n" : [buddy] "+m" (buddy->pte), [tmp] "=&r" (tmp) diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c index 69c17b5..b1a0fd3 100644 --- a/arch/mips/kernel/syscall.c +++ b/arch/mips/kernel/syscall.c @@ -135,6 +135,7 @@ static inline int mips_atomic_set(unsigned long addr, unsigned long new) " .set "MIPS_ISA_ARCH_LEVEL" \n" " li %[err], 0 \n" "1: \n" + __LS3A_WAR_LLSC user_ll("%[old]", "(%[addr])") " move %[tmp], 
%[new] \n" "2: \n" -- 2.7.4 [-- Attachment #4: 0003-loongson64-fix-ll-sc-bug-of-Loongson-3-in-handle_tlb.patch --] [-- Type: text/x-patch, Size: 4147 bytes --] From 724bbfa3b00accf55d64b19172569fd87c959802 Mon Sep 17 00:00:00 2001 From: Huang Pei <huangpei@loongson.cn> Date: Sat, 12 Jan 2019 11:01:55 +0800 Subject: [PATCH 3/3] loongson64: fix ll/sc bug of Loongson 3 in handle_tlb{m,s,l} Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/include/asm/mach-cavium-octeon/war.h | 1 + arch/mips/include/asm/mach-generic/war.h | 1 + arch/mips/include/asm/mach-loongson64/war.h | 26 ++++++++++++++++++++++++++ arch/mips/mm/tlbex.c | 13 +++++++++++++ 4 files changed, 41 insertions(+) create mode 100644 arch/mips/include/asm/mach-loongson64/war.h diff --git a/arch/mips/include/asm/mach-cavium-octeon/war.h b/arch/mips/include/asm/mach-cavium-octeon/war.h index 35c80be..1c43fb2 100644 --- a/arch/mips/include/asm/mach-cavium-octeon/war.h +++ b/arch/mips/include/asm/mach-cavium-octeon/war.h @@ -20,6 +20,7 @@ #define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 #define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 0 #define MIPS34K_MISSED_ITLB_WAR 0 #define CAVIUM_OCTEON_DCACHE_PREFETCH_WAR \ diff --git a/arch/mips/include/asm/mach-generic/war.h b/arch/mips/include/asm/mach-generic/war.h index a1bc2e7..2dd9bf5 100644 --- a/arch/mips/include/asm/mach-generic/war.h +++ b/arch/mips/include/asm/mach-generic/war.h @@ -19,6 +19,7 @@ #define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 #define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 0 #define MIPS34K_MISSED_ITLB_WAR 0 #endif /* __ASM_MACH_GENERIC_WAR_H */ diff --git a/arch/mips/include/asm/mach-loongson64/war.h b/arch/mips/include/asm/mach-loongson64/war.h new file mode 100644 index 0000000..9801760 --- /dev/null +++ b/arch/mips/include/asm/mach-loongson64/war.h @@ -0,0 +1,26 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. 
See the file "COPYING" in the main directory of this archive + * for more details. + * + * + * Copyright (C) 2019, by Huang Pei <huangpei@loongson.cn> + */ +#ifndef __ASM_LOONGSON64_MACH_WAR_H +#define __ASM_LOONGSON64_MACH_WAR_H + +#define R4600_V1_INDEX_ICACHEOP_WAR 0 +#define R4600_V1_HIT_CACHEOP_WAR 0 +#define R4600_V2_HIT_CACHEOP_WAR 0 +#define R5432_CP0_INTERRUPT_WAR 0 +#define BCM1250_M3_WAR 0 +#define SIBYTE_1956_WAR 0 +#define MIPS4K_ICACHE_REFILL_WAR 0 +#define MIPS_CACHE_SYNC_WAR 0 +#define TX49XX_ICACHE_INDEX_INV_WAR 0 +#define ICACHE_REFILLS_WORKAROUND_WAR 0 +#define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 1 +#define MIPS34K_MISSED_ITLB_WAR 0 + +#endif /* __ASM_LOONGSON64_MACH_WAR_H */ diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c index 0677142..51926ea 100644 --- a/arch/mips/mm/tlbex.c +++ b/arch/mips/mm/tlbex.c @@ -93,6 +93,11 @@ static inline int __maybe_unused r10000_llsc_war(void) return R10000_LLSC_WAR; } +static inline int __maybe_unused loongson_llsc_war(void) +{ + return LOONGSON_LLSC_WAR; +} + static int use_bbit_insns(void) { switch (current_cpu_type()) { @@ -1645,6 +1650,8 @@ static void iPTE_LW(u32 **p, unsigned int pte, unsigned int ptr) { #ifdef CONFIG_SMP + if (loongson_llsc_war()) + uasm_i_sync(p, STYPE_SYNC); # ifdef CONFIG_PHYS_ADDR_T_64BIT if (cpu_has_64bits) uasm_i_lld(p, pte, 0, ptr); @@ -2258,6 +2265,8 @@ static void build_r4000_tlb_load_handler(void) #endif uasm_l_nopage_tlbl(&l, p); + if (loongson_llsc_war()) + uasm_i_sync(&p, STYPE_SYNC); build_restore_work_registers(&p); #ifdef CONFIG_CPU_MICROMIPS if ((unsigned long)tlb_do_page_fault_0 & 1) { @@ -2312,6 +2321,8 @@ static void build_r4000_tlb_store_handler(void) #endif uasm_l_nopage_tlbs(&l, p); + if (loongson_llsc_war()) + uasm_i_sync(&p, STYPE_SYNC); build_restore_work_registers(&p); #ifdef CONFIG_CPU_MICROMIPS if ((unsigned long)tlb_do_page_fault_1 & 1) { @@ -2367,6 +2378,8 @@ static void build_r4000_tlb_modify_handler(void) #endif uasm_l_nopage_tlbm(&l, 
p); + if (loongson_llsc_war()) + uasm_i_sync(&p, STYPE_SYNC); build_restore_work_registers(&p); #ifdef CONFIG_CPU_MICROMIPS if ((unsigned long)tlb_do_page_fault_1 & 1) { -- 2.7.4
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-12 3:25 ` huangpei @ 2019-01-12 3:41 ` Yunqiang Su 2019-01-12 6:21 ` huangpei 0 siblings, 1 reply; 13+ messages in thread
From: Yunqiang Su @ 2019-01-12 3:41 UTC (permalink / raw)
To: huangpei
Cc: 徐成华, Paul Burton, Paul Burton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips

>+#define __LS3A_WAR_LLSC " .set mips64r2\nsynci 0\n.set mips0\n"
>+#define __ls3a_war_llsc() __asm__ __volatile__("synci 0" : : :"memory")

It looks like this is only used for the 3A1000, so I think the name should be __ls3x1k or something similar. Are the two leading underscores needed?

> smp_llsc_mb in cmpxchg.h is enought

"enought" is a typo.

- __WEAK_LLSC_MB "3: \n" + __WEAK_LLSC_MB

Could this hurt performance on other CPUs?

#define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 #define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 0 #define MIPS34K_MISSED_ITLB_WAR 0

Shouldn't this be behind some CONFIG_ option? After all, future chips will very likely not have this problem.

> On 2019-01-12, at 11:25, huangpei <huangpei@loongson.cn> wrote:
>
> hi, this is the patch for the ll/sc bug in Loongson 3, based on Linux 4.20
> (8fe28cb58bcb235034b64cbbb7550a8a43fd88be)
>
> +. it covers all Loongson 3 CPUs;
>
> +. to fix the ll/sc bug *sufficiently and exactly*, this patch shows
> how many places need to be touched
>
> +. it builds OK on Loongson 3 and Cavium/Octeon; the old version was
> tested under high-pressure testing
>
> On Fri, 11 Jan 2019 20:40:49 +0800 (GMT+08:00)
> 徐成华 <xuchenghua@loongson.cn> wrote:
>
>> Hi Paul Burton,
>>
>> For Loongson 3A1000 and 3A3000, when a memory access instruction
>> (load, store, or prefetch) executes between the execution of LL and
>> SC, the success or failure of the SC is not predictable. Although a
>> programmer would not insert memory access instructions between LL
>> and SC, memory instructions that precede LL in program order may be
>> dynamically executed between LL and SC, so a memory fence (SYNC) is
>> needed before LL/LLD to avoid this situation.
>>
>> Since the 3A3000, we have improved our hardware design to handle this case.
>> But we later deduced a rare circumstance in which some speculatively
>> executed memory instructions, due to branch misprediction between
>> LL/SC, still fall into the above case, so a memory fence (SYNC) at
>> the branch target (if that target is not between the LL/SC) is
>> needed for 3A1000 and 3A3000.
>>
>> Our processor is continually evolving and we aim to remove all
>> these workaround SYNCs around LL/SC for future processors.

> <0001-loongson64-add-helper-for-ll-sc-bugfix-in-loongson3.patch><0002-loongson64-fix-ll-sc-bug-of-loongson3-in-inline-asm.patch><0003-loongson64-fix-ll-sc-bug-of-Loongson-3-in-handle_tlb.patch>

^ permalink raw reply	[flat|nested] 13+ messages in thread
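The workaround macros discussed in this thread work by pasting extra assembly text ahead of the `ll` in each atomic template, so the fence retires before the load-linked. A host-side C sketch of that string-pasting mechanism (the macro value and the template below are illustrative stand-ins, not the exact kernel inline asm):

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for the workaround macro: on a Loongson 3 kernel
 * build it pastes a SYNC into the inline-asm template before the ll. */
#define __LS3_WAR_LLSC "\t.set mips3\n\tsync\n\t.set mips0\n"

/* Sketch of an atomic_add asm template with the workaround prepended;
 * adjacent string literals concatenate, exactly as in the kernel macros. */
const char *atomic_add_template(void)
{
	return __LS3_WAR_LLSC
	       "1:\tll\t%0, %1\t# atomic_add\n"
	       "\taddu\t%0, %2\n"
	       "\tsc\t%0, %1\n"
	       "\tbeqz\t%0, 1b\n";
}

/* returns 1 if a "sync" appears before the first "ll" in the template */
int sync_precedes_ll(const char *tmpl)
{
	const char *sync = strstr(tmpl, "sync");
	const char *ll = strstr(tmpl, "ll\t");

	return sync && ll && sync < ll;
}
```

The point of the sketch is only ordering: whatever the exact mnemonics, the fence text must precede the `ll` text in the emitted template.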
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 2019-01-12 3:41 ` Yunqiang Su @ 2019-01-12 6:21 ` huangpei 0 siblings, 0 replies; 13+ messages in thread From: huangpei @ 2019-01-12 6:21 UTC (permalink / raw) To: Yunqiang Su Cc: 徐成华, Paul Burton, linux-mips, chenhc, zhangfx, wuzhangjin, linux-mips [-- Attachment #1: Type: text/plain, Size: 18703 bytes --] this patch serial is meant to explain what need to do to fix this bug *sufficient and exactly*, which let us understand previous explanation about this bug better. From 9639d49b88d6b3e96b52ba23507819c7a790a330 Mon Sep 17 00:00:00 2001 From: Huang Pei <huangpei@loongson.cn> Date: Sat, 12 Jan 2019 11:57:33 +0800 Subject: [PATCH 1/3] loongson64: add helper for ll/sc bugfix in Loongson 3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit there is a bug in ll/sc operation on Loongson 3, that it causes two concurrent ll/sc on same variable both succeed, which is unacceptable clearly Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/include/asm/barrier.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h index a5eb1bb..04b9e21 100644 --- a/arch/mips/include/asm/barrier.h +++ b/arch/mips/include/asm/barrier.h @@ -203,6 +203,16 @@ #define __WEAK_LLSC_MB " \n" #endif +#if defined(CONFIG_CPU_LOONGSON3) +#define __LS3_WAR_LLSC " .set mips3\nsync\n.set mips0\n" +#define __ls3_war_llsc() __asm__ __volatile__("sync" : : :"memory") +#define __LS_WAR_LLSC " .set mips3\nsync\n.set mips0\n" +#else +#define __LS3_WAR_LLSC +#define __ls3_war_llsc() +#define __LS_WAR_LLSC +#endif + #define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory") #ifdef CONFIG_CPU_CAVIUM_OCTEON -- 2.7.4 From 5bc7601982195c899fd8e3a5cf9a2ea1e8a326af Mon Sep 17 00:00:00 2001 From: Huang Pei <huangpei@loongson.cn> Date: Sat, 12 Jan 2019 09:40:31 +0800 Subject: [PATCH 2/3] loongson64: fix 
ll/sc bug of Loongson 3 in inline asm +. without __LS3_WAR_LLSC before ll, and __LS_WAR_LLSC before target from branch ins between ll and sc, two ll/sc operation on same variable can success both, which is clearly wrong. +. __LS3_WAR_LLSC is needed for Loongson 3 CPU before 3A2000(NOT including 3A2000) +. __LS_WAR_LLSC is needed all Looongson 3 CPU +. old patch fix cmpxchg.h, but now smp_mb__before_llsc and smp_llsc_mb in cmpxchg.h is enough +. change __WEAK_LLSC_MB in futex.h to support same function as __LS_WAR_LLSC Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/include/asm/atomic.h | 6 ++++++ arch/mips/include/asm/bitops.h | 6 ++++++ arch/mips/include/asm/edac.h | 1 + arch/mips/include/asm/futex.h | 4 +++- arch/mips/include/asm/local.h | 2 ++ arch/mips/include/asm/pgtable.h | 2 ++ arch/mips/kernel/syscall.c | 1 + 7 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h index d4ea7a5..29068ad 100644 --- a/arch/mips/include/asm/atomic.h +++ b/arch/mips/include/asm/atomic.h @@ -59,6 +59,7 @@ static __inline__ void atomic_##op(int i, atomic_t * v) \ int temp; \ \ __asm__ __volatile__( \ + __LS3_WAR_LLSC \ " .set "MIPS_ISA_LEVEL" \n" \ "1: ll %0, %1 # atomic_" #op " \n" \ " " #asm_op " %0, %2 \n" \ @@ -86,6 +87,7 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3_WAR_LLSC \ "1: ll %1, %2 # atomic_" #op "_return \n" \ " " #asm_op " %0, %1, %3 \n" \ " sc %0, %2 \n" \ @@ -118,6 +120,7 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3_WAR_LLSC \ "1: ll %1, %2 # atomic_fetch_" #op " \n" \ " " #asm_op " %0, %1, %3 \n" \ " sc %0, %2 \n" \ @@ -253,6 +256,7 @@ static __inline__ void atomic64_##op(long i, atomic64_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3_WAR_LLSC \ "1: lld %0, %1 # 
atomic64_" #op " \n" \ " " #asm_op " %0, %2 \n" \ " scd %0, %1 \n" \ @@ -279,6 +283,7 @@ static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3_WAR_LLSC \ "1: lld %1, %2 # atomic64_" #op "_return\n" \ " " #asm_op " %0, %1, %3 \n" \ " scd %0, %2 \n" \ @@ -311,6 +316,7 @@ static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v) \ \ __asm__ __volatile__( \ " .set "MIPS_ISA_LEVEL" \n" \ + __LS3_WAR_LLSC \ "1: lld %1, %2 # atomic64_fetch_" #op "\n" \ " " #asm_op " %0, %1, %3 \n" \ " scd %0, %2 \n" \ diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h index da1b8718..075fc52 100644 --- a/arch/mips/include/asm/bitops.h +++ b/arch/mips/include/asm/bitops.h @@ -68,6 +68,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr) : "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)); #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) } else if (kernel_uses_llsc && __builtin_constant_p(bit)) { + __ls3_war_llsc(); do { __asm__ __volatile__( " " __LL "%0, %1 # set_bit \n" @@ -78,6 +79,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr) } while (unlikely(!temp)); #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */ } else if (kernel_uses_llsc) { + __ls3_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" @@ -120,6 +122,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr) : "ir" (~(1UL << bit))); #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) } else if (kernel_uses_llsc && __builtin_constant_p(bit)) { + __ls3_war_llsc(); do { __asm__ __volatile__( " " __LL "%0, %1 # clear_bit \n" @@ -130,6 +133,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr) } while (unlikely(!temp)); #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */ } else if (kernel_uses_llsc) { + __ls3_war_llsc(); do { __asm__ __volatile__( " .set 
"MIPS_ISA_ARCH_LEVEL" \n" @@ -188,6 +192,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr) unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG); unsigned long temp; + __ls3_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" @@ -291,6 +296,7 @@ static inline int test_and_set_bit_lock(unsigned long nr, unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG); unsigned long temp; + __ls3_war_llsc(); do { __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" diff --git a/arch/mips/include/asm/edac.h b/arch/mips/include/asm/edac.h index fc46776..6cf3f3e 100644 --- a/arch/mips/include/asm/edac.h +++ b/arch/mips/include/asm/edac.h @@ -22,6 +22,7 @@ static inline void edac_atomic_scrub(void *va, u32 size) __asm__ __volatile__ ( " .set mips2 \n" + __LS3_WAR_LLSC "1: ll %0, %1 # edac_atomic_scrub \n" " addu %0, $0 \n" " sc %0, %1 \n" diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h index a9e61ea..e390c68 100644 --- a/arch/mips/include/asm/futex.h +++ b/arch/mips/include/asm/futex.h @@ -54,6 +54,7 @@ " .set push \n" \ " .set noat \n" \ " .set "MIPS_ISA_ARCH_LEVEL" \n" \ + __LS3_WAR_LLSC \ "1: "user_ll("%1", "%4")" # __futex_atomic_op\n" \ " .set mips0 \n" \ " " insn " \n" \ @@ -167,6 +168,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, " .set push \n" " .set noat \n" " .set "MIPS_ISA_ARCH_LEVEL" \n" + __LS3_WAR_LLSC "1: "user_ll("%1", "%3")" \n" " bne %1, %z4, 3f \n" " .set mips0 \n" @@ -174,8 +176,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, " .set "MIPS_ISA_ARCH_LEVEL" \n" "2: "user_sc("$1", "%2")" \n" " beqz $1, 1b \n" - __WEAK_LLSC_MB "3: \n" + __WEAK_LLSC_MB " .insn \n" " .set pop \n" " .section .fixup,\"ax\" \n" diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h index ac8264e..dea04b5 100644 --- a/arch/mips/include/asm/local.h +++ b/arch/mips/include/asm/local.h @@ -50,6 +50,7 @@ static __inline__ long 
local_add_return(long i, local_t * l) __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" + __LS3_WAR_LLSC "1:" __LL "%1, %2 # local_add_return \n" " addu %0, %1, %3 \n" __SC "%0, %2 \n" @@ -95,6 +96,7 @@ static __inline__ long local_sub_return(long i, local_t * l) __asm__ __volatile__( " .set "MIPS_ISA_ARCH_LEVEL" \n" + __LS3_WAR_LLSC "1:" __LL "%1, %2 # local_sub_return \n" " subu %0, %1, %3 \n" __SC "%0, %2 \n" diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h index 129e032..6ceb49b 100644 --- a/arch/mips/include/asm/pgtable.h +++ b/arch/mips/include/asm/pgtable.h @@ -233,6 +233,7 @@ static inline void set_pte(pte_t *ptep, pte_t pteval) " .set "MIPS_ISA_ARCH_LEVEL" \n" " .set push \n" " .set noreorder \n" + __LS3_WAR_LLSC "1:" __LL "%[tmp], %[buddy] \n" " bnez %[tmp], 2f \n" " or %[tmp], %[tmp], %[global] \n" @@ -240,6 +241,7 @@ static inline void set_pte(pte_t *ptep, pte_t pteval) " beqz %[tmp], 1b \n" " nop \n" "2: \n" + __LS_WAR_LLSC " .set pop \n" " .set mips0 \n" : [buddy] "+m" (buddy->pte), [tmp] "=&r" (tmp) diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c index 69c17b5..25fad03 100644 --- a/arch/mips/kernel/syscall.c +++ b/arch/mips/kernel/syscall.c @@ -135,6 +135,7 @@ static inline int mips_atomic_set(unsigned long addr, unsigned long new) " .set "MIPS_ISA_ARCH_LEVEL" \n" " li %[err], 0 \n" "1: \n" + __LS3_WAR_LLSC user_ll("%[old]", "(%[addr])") " move %[tmp], %[new] \n" "2: \n" -- 2.7.4 From 3bc856aede2c9d1c495ae5c082c2a526ce7238db Mon Sep 17 00:00:00 2001 From: Huang Pei <huangpei@loongson.cn> Date: Sat, 12 Jan 2019 11:01:55 +0800 Subject: [PATCH 3/3] loongson64: fix ll/sc bug of Loongson 3 in handle_tlb{m,s,l} Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/include/asm/mach-cavium-octeon/war.h | 1 + arch/mips/include/asm/mach-generic/war.h | 1 + arch/mips/include/asm/mach-loongson64/war.h | 26 ++++++++++++++++++++++++++ arch/mips/mm/tlbex.c | 13 +++++++++++++ 4 files changed, 41 
insertions(+) create mode 100644 arch/mips/include/asm/mach-loongson64/war.h diff --git a/arch/mips/include/asm/mach-cavium-octeon/war.h b/arch/mips/include/asm/mach-cavium-octeon/war.h index 35c80be..1c43fb2 100644 --- a/arch/mips/include/asm/mach-cavium-octeon/war.h +++ b/arch/mips/include/asm/mach-cavium-octeon/war.h @@ -20,6 +20,7 @@ #define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 #define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 0 #define MIPS34K_MISSED_ITLB_WAR 0 #define CAVIUM_OCTEON_DCACHE_PREFETCH_WAR \ diff --git a/arch/mips/include/asm/mach-generic/war.h b/arch/mips/include/asm/mach-generic/war.h index a1bc2e7..2dd9bf5 100644 --- a/arch/mips/include/asm/mach-generic/war.h +++ b/arch/mips/include/asm/mach-generic/war.h @@ -19,6 +19,7 @@ #define TX49XX_ICACHE_INDEX_INV_WAR 0 #define ICACHE_REFILLS_WORKAROUND_WAR 0 #define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 0 #define MIPS34K_MISSED_ITLB_WAR 0 #endif /* __ASM_MACH_GENERIC_WAR_H */ diff --git a/arch/mips/include/asm/mach-loongson64/war.h b/arch/mips/include/asm/mach-loongson64/war.h new file mode 100644 index 0000000..4eb57f6 --- /dev/null +++ b/arch/mips/include/asm/mach-loongson64/war.h @@ -0,0 +1,26 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. 
+ * + * + * Copyright (C) 2019, by Huang Pei <huangpei@loongson.cn> + */ +#ifndef __ASM_LOONGSON64_MACH_WAR_H +#define __ASM_LOONGSON64_MACH_WAR_H + +#define R4600_V1_INDEX_ICACHEOP_WAR 0 +#define R4600_V1_HIT_CACHEOP_WAR 0 +#define R4600_V2_HIT_CACHEOP_WAR 0 +#define R5432_CP0_INTERRUPT_WAR 0 +#define BCM1250_M3_WAR 0 +#define SIBYTE_1956_WAR 0 +#define MIPS4K_ICACHE_REFILL_WAR 0 +#define MIPS_CACHE_SYNC_WAR 0 +#define TX49XX_ICACHE_INDEX_INV_WAR 0 +#define ICACHE_REFILLS_WORKAROUND_WAR 0 +#define R10000_LLSC_WAR 0 +#define LOONGSON_LLSC_WAR 1 +#define MIPS34K_MISSED_ITLB_WAR 0 + +#endif /* __ASM_LOONGSON64_MACH_WAR_H */ diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c index 0677142..51926ea 100644 --- a/arch/mips/mm/tlbex.c +++ b/arch/mips/mm/tlbex.c @@ -93,6 +93,11 @@ static inline int __maybe_unused r10000_llsc_war(void) return R10000_LLSC_WAR; } +static inline int __maybe_unused loongson_llsc_war(void) +{ + return LOONGSON_LLSC_WAR; +} + static int use_bbit_insns(void) { switch (current_cpu_type()) { @@ -1645,6 +1650,8 @@ static void iPTE_LW(u32 **p, unsigned int pte, unsigned int ptr) { #ifdef CONFIG_SMP + if (loongson_llsc_war()) + uasm_i_sync(p, STYPE_SYNC); # ifdef CONFIG_PHYS_ADDR_T_64BIT if (cpu_has_64bits) uasm_i_lld(p, pte, 0, ptr); @@ -2258,6 +2265,8 @@ static void build_r4000_tlb_load_handler(void) #endif uasm_l_nopage_tlbl(&l, p); + if (loongson_llsc_war()) + uasm_i_sync(&p, STYPE_SYNC); build_restore_work_registers(&p); #ifdef CONFIG_CPU_MICROMIPS if ((unsigned long)tlb_do_page_fault_0 & 1) { @@ -2312,6 +2321,8 @@ static void build_r4000_tlb_store_handler(void) #endif uasm_l_nopage_tlbs(&l, p); + if (loongson_llsc_war()) + uasm_i_sync(&p, STYPE_SYNC); build_restore_work_registers(&p); #ifdef CONFIG_CPU_MICROMIPS if ((unsigned long)tlb_do_page_fault_1 & 1) { @@ -2367,6 +2378,8 @@ static void build_r4000_tlb_modify_handler(void) #endif uasm_l_nopage_tlbm(&l, p); + if (loongson_llsc_war()) + uasm_i_sync(&p, STYPE_SYNC); 
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {
-- 
2.7.4

On Sat, 12 Jan 2019 03:41:56 +0000
Yunqiang Su <ysu@wavecomp.com> wrote:

> > +#define __LS3A_WAR_LLSC		"	.set mips64r2\nsynci 0\n.set mips0\n"
> > +#define __ls3a_war_llsc()	__asm__ __volatile__("synci 0" : : :"memory")
>
> It looks like this is only used for the 3A1000, so I think the name
> should be __ls3x1k or something similar. Are the double leading
> underscores needed?

fix it with __LS3_WAR_LLSC/__ls3_war_llsc(); only Loongson 3 CPUs before
the 3A2000 need this. Loongson 2K1000 does *not* need this, so use
__LS3*, same as __WEAK_LLSC_MB

> > smp_llsc_mb in cmpxchg.h is enought
>
> "enought" is a typo ("enough").

fixed

> > -	__WEAK_LLSC_MB
> > "3:	\n"
> > +	__WEAK_LLSC_MB
>
> Could this hurt the performance of other CPUs?

it is not the point, see commit msg

> > #define TX49XX_ICACHE_INDEX_INV_WAR	0
> > #define ICACHE_REFILLS_WORKAROUND_WAR	0
> > #define R10000_LLSC_WAR		0
> > +#define LOONGSON_LLSC_WAR		0
> > #define MIPS34K_MISSED_ITLB_WAR	0
>
> This should probably be guarded by some CONFIG_* option, since future
> chips will most likely not have this problem.

got it, but let's see any other suggestion.

> > On Jan 12, 2019, at 11:25 AM, huangpei <huangpei@loongson.cn> wrote:
> >
> > hi, this is the patch for ll/sc bug in Loongson3 based on Linux-4.20
> > (8fe28cb58bcb235034b64cbbb7550a8a43fd88be)
> >
> > +. it covers all Loongson3 CPUs;
> >
> > +. to fix the ll/sc bug *sufficiently and exactly*, this patch shows
> > how many places need to be touched
> >
> > +. it builds OK on Loongson3 and Cavium/Octeon; an old version was
> > tested under high-pressure testing
> >
> >
> > On Fri, 11 Jan 2019 20:40:49 +0800 (GMT+08:00)
> > 徐成华 <xuchenghua@loongson.cn> wrote:
> >
> >> Hi Paul Burton,
> >>
> >> For Loongson 3A1000 and 3A3000, when a memory access instruction
> >> (load, store, or prefetch) executes between the execution of LL
> >> and SC, the success or failure of the SC is not predictable.
> >> Although a programmer would not insert memory access instructions
> >> between LL and SC, memory instructions that precede the LL in
> >> program order may be dynamically executed between the LL and the
> >> SC, so a memory fence (SYNC) is needed before LL/LLD to avoid
> >> this situation.
> >>
> >> Since the 3A3000, we have improved our hardware design to handle
> >> this case. But we later deduced a rare circumstance in which some
> >> speculatively executed memory instructions, due to branch
> >> misprediction between LL/SC, still fall into the above case, so a
> >> memory fence (SYNC) at the branch target (if that target is not
> >> between the LL/SC) is needed for 3A1000 and 3A3000.
> >>
> >> Our processor is continually evolving and we aim to remove all
> >> these workaround SYNCs around LL/SC for future processors.
> > <0001-loongson64-add-helper-for-ll-sc-bugfix-in-loongson3.patch><0002-loongson64-fix-ll-sc-bug-of-loongson3-in-inline-asm.patch><0003-loongson64-fix-ll-sc-bug-of-Loongson-3-in-handle_tlb.patch>

[-- Attachment #2: 0001-loongson64-add-helper-for-ll-sc-bugfix-in-Loongson-3.patch --]
[-- Attachment #3: 0002-loongson64-fix-ll-sc-bug-of-Loongson-3-in-inline-asm.patch --]
[-- Attachment #4: 0003-loongson64-fix-ll-sc-bug-of-Loongson-3-in-handle_tlb.patch --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread
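Every tlbex.c hunk in patch 3/3 follows one pattern: emit a SYNC at a branch target only when the machine's war.h sets LOONGSON_LLSC_WAR, so other platforms pay nothing. A minimal host-side sketch of that gating, with uasm_i_sync stubbed out and the constant hard-wired to 1 purely for illustration (on real kernels the value comes from the per-machine war.h):

```c
#include <assert.h>

#define LOONGSON_LLSC_WAR 1	/* 1 in mach-loongson64/war.h, 0 elsewhere */
#define STYPE_SYNC 0

static int nsync;	/* counts the sync instructions "emitted" */

/* stub of the kernel's uasm helper; the real one writes a SYNC opcode
 * at *p and advances the pointer */
void uasm_i_sync(unsigned int **p, unsigned int stype)
{
	(void)p;
	(void)stype;
	nsync++;
}

int loongson_llsc_war(void)
{
	return LOONGSON_LLSC_WAR;
}

/* the shape of every hunk in patch 3/3: a mispredicted branch out of an
 * ll/sc block lands on a fence before anything else runs; returns the
 * running sync count for inspection */
int emit_nopage_target(unsigned int **p)
{
	if (loongson_llsc_war())
		uasm_i_sync(p, STYPE_SYNC);
	return nsync;
}
```

Because LOONGSON_LLSC_WAR is a compile-time constant, the compiler folds the branch away entirely on platforms where it is 0, which is why the helper costs nothing outside loongson64 builds.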
* [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc @ 2019-01-05 15:00 YunQiang Su 2019-01-09 22:08 ` Paul Burton 0 siblings, 1 reply; 13+ messages in thread From: YunQiang Su @ 2019-01-05 15:00 UTC (permalink / raw) To: pburton, linux-mips Cc: chehc, syq, zhangfx, wuzhangjin, linux-mips, YunQiang Su From: YunQiang Su <ysu@wavecomp.com> Loongson 2G/2H/3A/3B is quite weak sync'ed. If there is a branch, and the target is not in the scope of ll/sc or lld/scd, a sync is needed at the postion of target. Loongson doesn't plan to fix this problem in future, so we add the sync here for any condition. This is based on the patch from Chen Huacai. Signed-off-by: YunQiang Su <ysu@wavecomp.com> --- arch/mips/mm/tlbex.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c index 37b1cb246..08a9a66ef 100644 --- a/arch/mips/mm/tlbex.c +++ b/arch/mips/mm/tlbex.c @@ -932,6 +932,8 @@ build_get_pgd_vmalloc64(u32 **p, struct uasm_label **l, struct uasm_reloc **r, * to mimic that here by taking a load/istream page * fault. 
	 */
+	if(current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(p, 0);
 	UASM_i_LA(p, ptr, (unsigned long)tlb_do_page_fault_0);
 	uasm_i_jr(p, ptr);
@@ -1556,6 +1558,7 @@ static void build_loongson3_tlb_refill_handler(void)
 
 	if (check_for_high_segbits) {
 		uasm_l_large_segbits_fault(&l, p);
+		uasm_i_sync(&p, 0);
 		UASM_i_LA(&p, K1, (unsigned long)tlb_do_page_fault_0);
 		uasm_i_jr(&p, K1);
 		uasm_i_nop(&p);
@@ -2259,6 +2262,8 @@ static void build_r4000_tlb_load_handler(void)
 #endif
 
 	uasm_l_nopage_tlbl(&l, p);
+	if(current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_0 & 1) {
@@ -2313,6 +2318,8 @@ static void build_r4000_tlb_store_handler(void)
 #endif
 
 	uasm_l_nopage_tlbs(&l, p);
+	if(current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {
@@ -2368,6 +2375,8 @@ static void build_r4000_tlb_modify_handler(void)
 #endif
 
 	uasm_l_nopage_tlbm(&l, p);
+	if(current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc
  2019-01-05 15:00 YunQiang Su
@ 2019-01-09 22:08 ` Paul Burton
  2019-01-10  1:59   ` Yunqiang Su
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Burton @ 2019-01-09 22:08 UTC (permalink / raw)
  To: YunQiang Su
  Cc: Paul Burton, linux-mips, chehc, zhangfx, wuzhangjin, linux-mips,
	Yunqiang Su

Hi YunQiang,

On Sat, Jan 05, 2019 at 11:00:36PM +0800, YunQiang Su wrote:
> Loongson 2G/2H/3A/3B is quite weakly sync'ed. If there is a branch,
> and the target is not in the scope of an ll/sc or lld/scd pair, a sync
> is needed at the position of the target.

OK, so is this the same issue that the second patch in the series is
working around, or a different one?

I'm pretty confused at this point about what the actual bugs are in
these various Loongson CPUs. Could someone provide an actual errata
writeup describing the bugs in detail?

What does "in the scope of ll/sc" mean?

What happens if a branch target is not "in the scope of ll/sc"?

How does the sync help?

Are jumps affected, or just branches?

Does this affect userland as well as the kernel?

...and probably more questions depending upon the answers to these ones.

> Loongson doesn't plan to fix this problem in the future, so we add the
> sync here unconditionally.

So are you saying that future Loongson CPUs will all be buggy too, and
someone there has said that they consider this to be OK..? I really,
really hope that is not true.

If hardware people say they're not going to fix their bugs then working
around them is definitely not going to be a priority. It's one thing if
a CPU designer says "oops, my bad, work around this & I'll fix it next
time". It's quite another for them to say they're not interested in
fixing their bugs at all.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc
  2019-01-09 22:08 ` Paul Burton
@ 2019-01-10  1:59   ` Yunqiang Su
  2019-01-10 17:35     ` Paul Burton
  0 siblings, 1 reply; 13+ messages in thread
From: Yunqiang Su @ 2019-01-10  1:59 UTC (permalink / raw)
  To: Paul Burton
  Cc: YunQiang Su, linux-mips, chehc, zhangfx, wuzhangjin, linux-mips,
	paul.hua.gm

> On Jan 10, 2019, at 6:08 AM, Paul Burton <pburton@wavecomp.com> wrote:
>
> Hi YunQiang,
>
> On Sat, Jan 05, 2019 at 11:00:36PM +0800, YunQiang Su wrote:
>> Loongson 2G/2H/3A/3B is quite weakly sync'ed. If there is a branch,
>> and the target is not in the scope of an ll/sc or lld/scd pair, a sync
>> is needed at the position of the target.
>
> OK, so is this the same issue that the second patch in the series is
> working around or a different one?
>
> I'm pretty confused at this point about what the actual bugs are in
> these various Loongson CPUs. Could someone provide an actual errata
> writeup describing the bugs in detail?
>
> What does "in the scope of ll/sc" mean?

The Loongson 3 series has several versions, called 1000, 2000, and 3000.

There are two bugs, both related to LL/SC. Let's call them bug-1 and
bug-2.

BUG-1: a `sync' is needed before an LL or LLD instruction.
This bug appears on the 1000 only, and I am sure that it has been fixed
in the 3000.

BUG-2: if there is a branch instruction inside an LL/SC pair, and the
branch target is outside the scope of the LL/SC, a `sync' is needed at
the branch target. That is, the first instruction at the branch target
should be a `sync'.
Loongson said that they don't plan to fix this problem any time soon,
not before they design a totally new core.

> What happens if a branch target is not "in the scope of ll/sc"?

At least they said that there won't be a problem.

> How does the sync help?
>
> Are jumps affected, or just branches?

I am not sure, so CC a Loongson person.
@Paul Hua

> Does this affect userland as well as the kernel?

There are few places that can trigger these two bugs in the kernel.
In userland we have to work around them in binutils:
https://www.sourceware.org/ml/binutils/2019-01/msg00025.html

In fact the kernel is the easiest case, since we can have a flavored
build for Loongson.

> ...and probably more questions depending upon the answers to these ones.
>
>> Loongson doesn't plan to fix this problem in the future, so we add the
>> sync here unconditionally.
>
> So are you saying that future Loongson CPUs will all be buggy too, and
> someone there has said that they consider this to be OK..? I really
> really hope that is not true.

A bug is a bug. It is not OK.
I blame the Loongson guys here.
Some Loongson guys are not quite normal people.
Anyway, they are a little more normal now; and, anyway again, still
abnormal.

> If hardware people say they're not going to fix their bugs then working
> around them is definitely not going to be a priority. It's one thing if
> a CPU designer says "oops, my bad, work around this & I'll fix it next
> time". It's quite another for them to say they're not interested in
> fixing their bugs at all.

They do have an interest, but I guess the true reason is that they don't
have enough people and money to design a new core, and this bug is quite
hard to fix.

> Thanks,
>     Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread
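To put bug-1 and bug-2 in one picture: a cmpxchg-style retry loop with both workarounds applied by hand would look roughly like this. This is an illustrative sketch pieced together from the descriptions in this thread, not from any vendor documentation; register names are arbitrary.

```
	sync			# bug-1 (3A1000): fence so that earlier
				# memory accesses cannot be dynamically
				# reordered into the ll/sc region
1:	ll	t0, 0(a0)	# load-linked the current value
	bne	t0, a1, 2f	# branch whose target is OUTSIDE the
				# ll/sc region, so the target needs
				# a sync (bug-2)
	move	t1, a2
	sc	t1, 0(a0)	# store-conditional the new value
	beqz	t1, 1b		# retry loop back to the ll; the target
				# stays inside the region, so no sync
	nop
	b	3f
	nop
2:	sync			# bug-2: first instruction at the
				# out-of-scope branch target
3:	# ... continue ...
```

Whether the retry target at 1: counts as "in scope" is exactly the ambiguity being debated below; the sketch follows the inclusive reading.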
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc
  2019-01-10  1:59 ` Yunqiang Su
@ 2019-01-10 17:35   ` Paul Burton
  2019-01-10 18:42     ` YunQiang Su
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Burton @ 2019-01-10 17:35 UTC (permalink / raw)
  To: Yunqiang Su, YunQiang Su, paul.hua.gm
  Cc: Paul Burton, linux-mips, chehc, zhangfx, wuzhangjin, linux-mips

Hi Yunqiang,

On Wed, Jan 09, 2019 at 05:59:07PM -0800, Yunqiang Su wrote:
> > On Jan 10, 2019, at 6:08 AM, Paul Burton <pburton@wavecomp.com> wrote:
> > On Sat, Jan 05, 2019 at 11:00:36PM +0800, YunQiang Su wrote:
> >> Loongson 2G/2H/3A/3B is quite weakly sync'ed. If there is a branch,
> >> and the target is not in the scope of an ll/sc or lld/scd pair, a
> >> sync is needed at the position of the target.
> >
> > OK, so is this the same issue that the second patch in the series is
> > working around or a different one?
> >
> > I'm pretty confused at this point about what the actual bugs are in
> > these various Loongson CPUs. Could someone provide an actual errata
> > writeup describing the bugs in detail?
> >
> > What does "in the scope of ll/sc" mean?
>
> The Loongson 3 series has several versions, called 1000, 2000, and 3000.
>
> There are two bugs, both related to LL/SC. Let's call them bug-1 and
> bug-2.
>
> BUG-1: a `sync' is needed before an LL or LLD instruction.
> This bug appears on the 1000 only, and I am sure that it has been fixed
> in the 3000.
>
> BUG-2: if there is a branch instruction inside an LL/SC pair, and the
> branch target is outside the scope of the LL/SC, a `sync' is needed at
> the branch target. That is, the first instruction at the branch target
> should be a `sync'.
> Loongson said that they don't plan to fix this problem any time soon,
> not before they design a totally new core.
>
> > What happens if a branch target is not "in the scope of ll/sc"?
>
> At least they said that there won't be a problem.

You still didn't define what "in the scope of ll/sc" means - I'm
guessing that you're referring to a branch target as "in scope" if it is
in between the ll & sc instructions (inclusive?). But this is just a
guess & clarity from people who actually know would be helpful.

And there must be a problem. The whole point of this is that there's a
bug, right? If there's no problem then we don't need to do anything :)

From a look at the GCC patch it talks about placing a sync at a branch
target if it *is* in between an ll & sc [1], which I just can't
reconcile with the phrase "outside of the scope of LL/SC". Is the
problem when a branch target *is* in between an ll & sc, or when it *is
not* between an ll & sc?

Reading this kernel patch doesn't make it any clearer - for example the
sync it emits in build_loongson3_tlb_refill_handler() is nowhere near an
ll or sc instruction. Something doesn't add up here.

> > How does the sync help?
> >
> > Are jumps affected, or just branches?
>
> I am not sure, so CC a Loongson person.
> @Paul Hua

Hi Paul - any help obtaining a detailed description of these bugs would
be much appreciated. Even if you only have something in Chinese I can
probably get someone to help translate.

> > Does this affect userland as well as the kernel?
>
> There are few places that can trigger these two bugs in the kernel.
> In userland we have to work around them in binutils:
> https://www.sourceware.org/ml/binutils/2019-01/msg00025.html
>
> In fact the kernel is the easiest case, since we can have a flavored
> build for Loongson.

My concern with regards to userland is that there's talk of a "deadlock"
- if userland can hit this & the CPU actually stalls then the system is
hopelessly vulnerable to denial of service from a malicious or buggy
userland program, or simply an innocent program unaware of the errata.

> ...and probably more questions depending upon the answers to these ones.
>
>> Loongson doesn't plan to fix this problem in the future, so we add the
>> sync here unconditionally.
>
> So are you saying that future Loongson CPUs will all be buggy too, and
> someone there has said that they consider this to be OK..? I really
> really hope that is not true.

> A bug is a bug. It is not OK.
> I blame the Loongson guys here.
> Some Loongson guys are not quite normal people.
> Anyway, they are a little more normal now; and, anyway again, still
> abnormal.

> If hardware people say they're not going to fix their bugs then working
> around them is definitely not going to be a priority. It's one thing if
> a CPU designer says "oops, my bad, work around this & I'll fix it next
> time". It's quite another for them to say they're not interested in
> fixing their bugs at all.

> They do have an interest, but I guess the true reason is that they
> don't have enough people and money to design a new core, and this bug
> is quite hard to fix.

I'm not sure I fully understand what you're saying above, but
essentially I want to know that Loongson care about fixing their CPU
bugs. If they don't, and the bugs are as bad as they sound, then in my
view working around them will only reinforce that producing CPUs with
such serious bugs is a good idea.

So if anyone from Loongson is reading, I'd really like to hear that the
above is a miscommunication & that you're not intending to knowingly
design any further CPUs with these bugs.

Thanks,
    Paul

[1] https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01064.html
    ("Loongson3 need a sync before branch target that between ll and sc.")

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc
  2019-01-10 17:35 ` Paul Burton
@ 2019-01-10 18:42   ` YunQiang Su
  0 siblings, 0 replies; 13+ messages in thread
From: YunQiang Su @ 2019-01-10 18:42 UTC (permalink / raw)
  To: Paul Burton
  Cc: Yunqiang Su, paul.hua.gm, Paul Burton, linux-mips, chehc, zhangfx,
	wuzhangjin, linux-mips

Paul Burton <paul.burton@mips.com> wrote on Fri, Jan 11, 2019 at 1:35 AM:
>
> Hi Yunqiang,
>
> On Wed, Jan 09, 2019 at 05:59:07PM -0800, Yunqiang Su wrote:
> > > On Jan 10, 2019, at 6:08 AM, Paul Burton <pburton@wavecomp.com> wrote:
> > > On Sat, Jan 05, 2019 at 11:00:36PM +0800, YunQiang Su wrote:
> > > > Loongson 2G/2H/3A/3B is quite weakly sync'ed. If there is a
> > > > branch, and the target is not in the scope of an ll/sc or lld/scd
> > > > pair, a sync is needed at the position of the target.
> > >
> > > OK, so is this the same issue that the second patch in the series is
> > > working around or a different one?
> > >
> > > I'm pretty confused at this point about what the actual bugs are in
> > > these various Loongson CPUs. Could someone provide an actual errata
> > > writeup describing the bugs in detail?
> > >
> > > What does "in the scope of ll/sc" mean?
> >
> > The Loongson 3 series has several versions, called 1000, 2000, and
> > 3000.
> >
> > There are two bugs, both related to LL/SC. Let's call them bug-1 and
> > bug-2.
> >
> > BUG-1: a `sync' is needed before an LL or LLD instruction.
> > This bug appears on the 1000 only, and I am sure that it has been
> > fixed in the 3000.
> >
> > BUG-2: if there is a branch instruction inside an LL/SC pair, and the
> > branch target is outside the scope of the LL/SC, a `sync' is needed
> > at the branch target. That is, the first instruction at the branch
> > target should be a `sync'.
> > Loongson said that they don't plan to fix this problem any time soon,
> > not before they design a totally new core.
> >
> > > What happens if a branch target is not "in the scope of ll/sc"?
> >
> > At least they said that there won't be a problem.
>
> You still didn't define what "in the scope of ll/sc" means - I'm
> guessing that you're referring to a branch target as "in scope" if it
> is in between the ll & sc instructions (inclusive?). But this is just
> a guess & clarity from people who actually know would be helpful.

Yes, your guess is correct. It is between them.

> And there must be a problem. The whole point of this is that there's a
> bug, right? If there's no problem then we don't need to do anything :)

Sure, it is a problem. Some Loongson guys seem not to dare say outright
that their CPU is buggy.

> From a look at the GCC patch it talks about placing a sync at a branch
> target if it *is* in between an ll & sc [1], which I just can't
> reconcile with the phrase "outside of the scope of LL/SC". Is the
> problem when a branch target *is* in between an ll & sc, or when it
> *is not* between an ll & sc?

This problem happens when:
  - a branch instruction, such as `beq', is between the ll and sc, AND
  - the target of that branch instruction is not between the ll and sc.

> Reading this kernel patch doesn't make it any clearer - for example the
> sync it emits in build_loongson3_tlb_refill_handler() is nowhere near
> an ll or sc instruction. Something doesn't add up here.

The Loongson guys told me that a branch instruction between an ll and sc
may jump here. In fact I don't know where that instruction is.

> > > How does the sync help?
> > >
> > > Are jumps affected, or just branches?
> >
> > I am not sure, so CC a Loongson person.
> > @Paul Hua
>
> Hi Paul - any help obtaining a detailed description of these bugs would
> be much appreciated. Even if you only have something in Chinese I can
> probably get someone to help translate.
>
> > > Does this affect userland as well as the kernel?
> >
> > There are few places that can trigger these two bugs in the kernel.
> > In userland we have to work around them in binutils:
> > https://www.sourceware.org/ml/binutils/2019-01/msg00025.html
> >
> > In fact the kernel is the easiest case, since we can have a flavored
> > build for Loongson.
>
> My concern with regards to userland is that there's talk of a
> "deadlock" - if userland can hit this & the CPU actually stalls then
> the system is hopelessly vulnerable to denial of service from a
> malicious or buggy userland program, or simply an innocent program
> unaware of the errata.

I have a Loongson 3A3000 laptop. Without any workaround, the whole
system hangs very frequently. With this patch, the whole system hangs
rarely.

Since the bug also affects userland, applications still hang frequently,
for example `tmux'.

In Debian, we have a list of packages that can build on Cavium but
cannot on Loongson 1000:

  bcftools botch casacore ceres-solver chemps2 clippoly
  cpl-plugin-giraf cpl-plugin-xshoo dolfin freeipa git golang-1.11
  graphicsmagick igraph libminc knot-resolver nodejs octave-ltfat
  prodigal pypy redis ruby2.3 ghc yade

Most of them fail due to hangs. I tested them on Loongson 3K: some of
them can build successfully now, and some of them still cannot. I guess
the reason is that we also need some workarounds in userland, in libc
etc.

> > > ...and probably more questions depending upon the answers to these
> > > ones.
> > >
> > > > Loongson doesn't plan to fix this problem in the future, so we
> > > > add the sync here unconditionally.
> > >
> > > So are you saying that future Loongson CPUs will all be buggy too,
> > > and someone there has said that they consider this to be OK..? I
> > > really really hope that is not true.
> >
> > A bug is a bug. It is not OK.
> > I blame the Loongson guys here.
> > Some Loongson guys are not quite normal people.
> > Anyway, they are a little more normal now; and, anyway again, still
> > abnormal.
> >
> > > If hardware people say they're not going to fix their bugs then
> > > working around them is definitely not going to be a priority. It's
> > > one thing if a CPU designer says "oops, my bad, work around this &
> > > I'll fix it next time". It's quite another for them to say they're
> > > not interested in fixing their bugs at all.
> >
> > They do have an interest, but I guess the true reason is that they
> > don't have enough people and money to design a new core, and this
> > bug is quite hard to fix.
>
> I'm not sure I fully understand what you're saying above, but
> essentially I want to know that Loongson care about fixing their CPU
> bugs. If they don't, and the bugs are as bad as they sound, then in my
> view working around them will only reinforce that producing CPUs with
> such serious bugs is a good idea.

Yes, you are correct. (Some bad words here.)

> So if anyone from Loongson is reading, I'd really like to hear that the
> above is a miscommunication & that you're not intending to knowingly
> design any further CPUs with these bugs.

In fact I have talked with them many times face to face. The only
improvement is that they can finally admit that this is a bug, not a
feature.

> Thanks,
>     Paul
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01064.html
>     ("Loongson3 need a sync before branch target that between ll and sc.")

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-01-12  8:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11 12:40 [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc 徐成华
2019-01-11 12:45 ` huangpei
2019-01-11 19:00   ` Paul Burton
2019-01-12  8:02     ` 徐成华
2019-01-12  8:19       ` huangpei
2019-01-12  3:25   ` huangpei
2019-01-12  3:41     ` Yunqiang Su
2019-01-12  6:21       ` huangpei
  -- strict thread matches above, loose matches on Subject: below --
2019-01-05 15:00 YunQiang Su
2019-01-09 22:08 ` Paul Burton
2019-01-10  1:59   ` Yunqiang Su
2019-01-10 17:35     ` Paul Burton
2019-01-10 18:42       ` YunQiang Su