From: guoren@kernel.org
To: anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    conor.dooley@microchip.com, heiko@sntech.de, philipp.tomsich@vrull.eu
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
    Guo Ren <guoren@linux.alibaba.com>, Guo Ren <guoren@kernel.org>,
    Anup Patel <apatel@ventanamicro.com>, Palmer Dabbelt <palmer@rivosinc.com>
Subject: [PATCH RESEND] riscv: asid: Fixup stale TLB entry cause application crash
Date: Tue, 8 Nov 2022 05:20:44 -0500
Message-ID: <20221108102044.3317793-1-guoren@kernel.org>

From: Guo Ren <guoren@linux.alibaba.com>

After use_asid_allocator is enabled, userspace applications can crash
because of stale TLB entries. Calling cpumask_clear_cpu() without
local_flush_tlb_all() cannot guarantee that the CPU's TLB entries are
fresh. As a result, set_mm_asid() can let a userspace application read
a stale value through a stale TLB entry; set_mm_noasid() is not
affected.

Here is the symptom of the bug:

unhandled signal 11 code 0x1 (coredump)
   0x0000003fd6d22524 <+4>:     auipc   s0,0x70
   0x0000003fd6d22528 <+8>:     ld      s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>:    ld      a5,0(s0)
(gdb) i r s0
s0          0x8082ed1cc3198b21      0x8082ed1cc3198b21
(gdb) x/16 0x3fd6d92490
0x3fd6d92490:   0xd80ac8a8      0x0000003f

The core dump shows that the value in register s0 is wrong, while the
value in memory is right. This is because 'ld s0, -148(s0)' used a
stale mapping entry in the TLB and read a wrong value from a stale
physical address.

When the task ran on CPU0, it loaded (or speculatively loaded) the
value at address 0x3fd6d92490, and the first version of the mapping
was brought into CPU0's TLB by a page table walk. When the task
switched from CPU0 to CPU1 without local_flush_tlb_all() (because of
ASID), it happened to write a value to address 0x3fd6d92490. That
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.

flush_tlb_page() uses mm_cpumask(mm) to determine which CPUs need a
TLB flush, but CPU0's bit in mm_cpumask had already been cleared by
the previous switch_mm(). So only CPU1's TLB was flushed, and then the
second version of the PTE mapping was set. When the task switched from
CPU1 back to CPU0, CPU0 still used the stale TLB entry, which
contained a wrong target physical address. When the task then read
that value, the bug was triggered.

Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/riscv/mm/context.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 7acbfbd14557..8ad6c2493e93 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -317,7 +317,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 */
 	cpu = smp_processor_id();
 
-	cpumask_clear_cpu(cpu, mm_cpumask(prev));
+	if (!static_branch_unlikely(&use_asid_allocator))
+		cpumask_clear_cpu(cpu, mm_cpumask(prev));
+
 	cpumask_set_cpu(cpu, mm_cpumask(next));
 
 	set_mm(next, cpu);
-- 
2.36.1
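The CPU0 -> CPU1 -> CPU0 sequence described in the commit message can be replayed in a small userspace model. This is only an illustrative sketch, not kernel code: every toy_* name, the single-page address space, and the one-entry-per-CPU TLB are assumptions made up for the example. It models just the interaction between the mm cpumask and a flush that is targeted by that mask.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Hypothetical toy model: one mm with a single virtual page. */
struct toy_mm {
    unsigned int cpumask; /* bit n set: CPU n may cache entries of this mm */
    int pte;              /* physical page currently mapped */
};

/* Hypothetical toy model: one cached translation per CPU. */
struct toy_cpu {
    bool tlb_valid;
    int tlb_page;
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Load through the TLB; walk the "page table" only on a miss. */
static int toy_load(const struct toy_mm *mm, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if (!toy_cpus[cpu].tlb_valid) {
        toy_cpus[cpu].tlb_page = mm->pte;
        toy_cpus[cpu].tlb_valid = true;
    }
    return toy_cpus[cpu].tlb_page;
}

/* Stand-in for flush_tlb_page(): only CPUs set in the mask are flushed. */
static void toy_flush_tlb_page(const struct toy_mm *mm)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (mm->cpumask & (1u << cpu))
            toy_cpus[cpu].tlb_valid = false;
}

/* The task leaves 'cpu'. Only the patched ASID path keeps the mask bit. */
static void toy_switch_out(struct toy_mm *mm, int cpu, bool use_asid, bool patched)
{
    if (!(use_asid && patched))
        mm->cpumask &= ~(1u << cpu);
}

/* The task arrives on 'cpu'. Without ASIDs the local TLB is flushed. */
static void toy_switch_in(struct toy_mm *mm, int cpu, bool use_asid)
{
    mm->cpumask |= 1u << cpu;
    if (!use_asid)
        toy_cpus[cpu].tlb_valid = false;
}

/* Replays the scenario; returns the page CPU0 finally reads (2 is correct). */
static int toy_scenario(bool use_asid, bool patched)
{
    struct toy_mm mm = { .cpumask = 0, .pte = 1 };

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        toy_cpus[cpu].tlb_valid = false;

    toy_switch_in(&mm, 0, use_asid);
    toy_load(&mm, 0);                 /* CPU0 caches the first mapping */
    toy_switch_out(&mm, 0, use_asid, patched);

    toy_switch_in(&mm, 1, use_asid);
    toy_flush_tlb_page(&mm);          /* write fault: flush by cpumask ... */
    mm.pte = 2;                       /* ... then install the second mapping */
    toy_switch_out(&mm, 1, use_asid, patched);

    toy_switch_in(&mm, 0, use_asid);
    return toy_load(&mm, 0);          /* stale if CPU0 was skipped above */
}
```

In this model, toy_scenario(true, false) returns the stale page 1, while toy_scenario(true, true) and toy_scenario(false, false) both return page 2. The patched case never clears a bit from the mask, which mirrors the trade-off in the diff: more CPUs may receive remote flushes, but none that could still hold an entry is skipped.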