* Random memory corruption with v5.2
From: Andreas Schwab @ 2019-07-29 10:51 UTC
To: linux-riscv

Since switching to 5.2 kernels I'm seeing random crashes and
misbehaviors on the HiFive, for example while building gcc or glibc.
Perhaps missing TLB flushes?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: Random memory corruption with v5.2
From: David Abdurachmanov @ 2019-07-29 22:58 UTC
To: Andreas Schwab; +Cc: linux-riscv

On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>
> Since switching to 5.2 kernels I'm seeing random crashes and
> misbehaviors on the HiFive, for example while building gcc or glibc.
> Perhaps missing TLB flushes?

Do you have some examples of crashes?

I am running 5.2-rc7 on a large number of QEMU instances for builders,
and I see some strange behavior, but I haven't noticed any issues
on the board using an OpenEmbedded build with the final 5.2 yet.

[17983.074847] Unable to handle kernel paging request at virtual address 0fffffdff5e14700
[17983.085132] Oops [#1]

[133953.710130] kernel BUG at include/linux/mm.h:1023!
[133953.718204] Kernel BUG [#1]

[165770.567652] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010

[148578.912479] kernel BUG at lib/list_debug.c:51!
[148578.917701] Kernel BUG [#1]

[163756.869949] EXT4-fs (vda2): pa 00000000e9971722: logic 512, phys. 2558464, len 512
[163756.889549] EXT4-fs error (device vda2): ext4_mb_release_inode_pa:3837: group 78, free 0, pa_free 149
[163757.757600] EXT4-fs (vda2): pa 0000000066b479c3: logic 32, phys. 2558368, len 96

sbi_trap_error: hart1: misaligned store handler failed (error -10)
sbi_trap_error: hart1: mcause=0x0000000000000006 mtval=0x00000000000002c3
sbi_trap_error: hart1: mepc=0xffffffe0009dc1f4 mstatus=0x0000000000000802
sbi_trap_error: hart1: ra=0xffffffe0009dc1ee sp=0xffffffe1f3c17be0

[178876.406122] Unable to handle kernel paging request at virtual address 0000000000012a28
[178876.423941] Oops [#1]
* Re: Random memory corruption with v5.2
From: Atish Patra @ 2019-07-30 4:27 UTC
To: David Abdurachmanov, Andreas Schwab; +Cc: linux-riscv

On 7/29/19 3:58 PM, David Abdurachmanov wrote:
> On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>>
>> Since switching to 5.2 kernels I'm seeing random crashes and
>> misbehaviors on the HiFive, for example while building gcc or glibc.
>> Perhaps missing TLB flushes?
>
> Do you have some examples of crashes?
>
> I am running 5.2-rc7 on a large number of QEMU instances for builders,
> and I see some strange behavior, but I haven't noticed any issues
> on the board using OpenEmbedded build with the final 5.2 yet.
>

Looking at the timestamps, these seem to be different crashes in
different instances. Is there any particular workload you were running,
or does it just happen randomly if you run long enough?

If you have the complete dmesg and/or vmlinux, that would help as well.

> [...]

-- 
Regards,
Atish
* Re: Random memory corruption with v5.2
From: Andreas Schwab @ 2019-07-30 6:56 UTC
To: David Abdurachmanov; +Cc: linux-riscv

On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:

> On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>>
>> Since switching to 5.2 kernels I'm seeing random crashes and
>> misbehaviors on the HiFive, for example while building gcc or glibc.
>> Perhaps missing TLB flushes?
>
> Do you have some examples of crashes?

While building glibc:

an_ES.UTF-8...realloc(): invalid pointer
/bin/sh: line 1: 7841 Aborted (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134

While building gcc:

../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
+===========================GNAT BUG DETECTED==============================+
| 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
| Storage_Error stack overflow or erroneous memory access |
| Error detected at output.ads:39:8 |
realloc(): invalid pointer
raised PROGRAM_ERROR : unhandled signal
make[3]: *** [../../gcc/ada/gcc-interface/Make-lang.in:140: ada/exp_ch3.o] Error 1

Andreas.
* Re: Random memory corruption with v5.2
From: Paul Walmsley @ 2019-07-31 0:22 UTC
To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov

On Tue, 30 Jul 2019, Andreas Schwab wrote:

> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
>
> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
> >>
> >> Since switching to 5.2 kernels I'm seeing random crashes and
> >> misbehaviors on the HiFive, for example while building gcc or glibc.
> >> Perhaps missing TLB flushes?
> >
> > Do you have some examples of crashes?
>
> While building glibc:
>
> an_ES.UTF-8...realloc(): invalid pointer
> /bin/sh: line 1: 7841 Aborted (core dumped) I18NPATH=. [...] $locale
> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>
> While building gcc:
>
> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
> [...]
> realloc(): invalid pointer

I personally haven't seen these issues; but then again, I haven't done any
glibc or gcc builds on v5.2. Will take a closer look.

Reflecting on the recent commits, there weren't many RISC-V-specific
changes that could have an impact here. So if these problems are
relatively repeatable, and they didn't happen with v5.1, there are a few
patches that might be worth reverting to see if the situation improves.
Here is my short list:

- Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
  MMU after pte change")

- Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig:
  Enable NO_HZ_IDLE and HIGH_RES_TIMERS")

- Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial
  page tables in two stages")

Of course, it's also possible that someone made a change outside
arch/riscv that is causing these problems. If that's the case, we're
probably stuck bisecting it.

- Paul
* Re: Random memory corruption with v5.2
From: Andreas Schwab @ 2019-07-31 7:39 UTC
To: Paul Walmsley; +Cc: linux-riscv, David Abdurachmanov

On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
> MMU after pte change")

That would be my favorite.

> - Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig:
> Enable NO_HZ_IDLE and HIGH_RES_TIMERS")

I had these enabled forever already.

> - Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial
> page tables in two stages")

I don't think a one-time initial setup can have such a subtle effect.

Andreas.
* Re: Random memory corruption with v5.2
From: Anup Patel @ 2019-07-31 8:14 UTC
To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov, Paul Walmsley

On Wed, Jul 31, 2019 at 1:09 PM Andreas Schwab <schwab@suse.de> wrote:
>
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> > - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
> > MMU after pte change")
>
> That would be my favorite.
>
> > - Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig:
> > Enable NO_HZ_IDLE and HIGH_RES_TIMERS")
>
> I had these enabled forever already.
>
> > - Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial
> > page tables in two stages")
>
> I don't think a one-time initial setup can have such a subtle effect.

The initial page table setup patch is not present in 5.2. It was merged
in 5.3.

Regards,
Anup
* Re: Random memory corruption with v5.2
From: Palmer Dabbelt @ 2019-08-01 19:57 UTC
To: schwab; +Cc: linux-riscv, david.abdurachmanov, Paul Walmsley

On Wed, 31 Jul 2019 00:39:10 PDT (-0700), schwab@suse.de wrote:
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
>> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
>> MMU after pte change")
>
> That would be my favorite.

If that patch causes memory corruption then something scary is going on.
I haven't been following the thread closely enough to know how easy this
is to reproduce, but would you mind trying a kernel with that commit
reverted? This is also available on the "for-andreas" branch of
git.kernel.org/palmer/linux.git

commit 07d45256aa8641057c141f1a661bb29dd99eb32e
gpg: Signature made Thu 01 Aug 2019 12:46:22 PM PDT
gpg: using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
gpg: issuer "palmer@dabbelt.com"
gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [ultimate]
gpg: aka "Palmer Dabbelt <palmer@sifive.com>" [ultimate]
Author: Palmer Dabbelt <palmer@sifive.com>
Date: Thu Aug 1 12:45:12 2019 -0700

    Revert "riscv: mm: synchronize MMU after pte change"

    Andreas Schwab is seeing some random memory corruption with 5.2, and he
    thinks the reverted commit is the most likely candidate. The commit
    itself doesn't revert cleanly, but that's just because getting the
    comment right took two commits.

    If this does fix the issue then we're in a bit of trouble, as this TLB
    flush should be pretty safe.

    This reverts commit bf587caae305ae3b4393077fb22c98478ee55755.

    Signed-off-by: Palmer Dabbelt <palmer@sifive.com>

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index f960c3f4ce47..28dccb072255 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -16,7 +16,6 @@
 
 #include <asm/pgalloc.h>
 #include <asm/ptrace.h>
-#include <asm/tlbflush.h>
 
 /*
  * This routine handles page faults. It determines the address and the
@@ -267,14 +266,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 		if (!pte_present(*pte_k))
 			goto no_context;
 
-		/*
-		 * The kernel assumes that TLBs don't cache invalid
-		 * entries, but in RISC-V, SFENCE.VMA specifies an
-		 * ordering constraint, not a cache flush; it is
-		 * necessary even after writing invalid entries.
-		 */
-		local_flush_tlb_page(addr);
-
 		return;
 	}
 }

>> - Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig:
>> Enable NO_HZ_IDLE and HIGH_RES_TIMERS")
>
> I had these enabled forever already.

IIRC that was the argument for enabling them in defconfig :)

>> - Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial
>> page tables in two stages")
>
> I don't think a one-time initial setup can have such a subtle effect.

As per Anup, it's not in 5.2.

>
> Andreas.
* Re: Random memory corruption with v5.2
From: Andreas Schwab @ 2019-07-31 10:19 UTC
To: Paul Walmsley; +Cc: linux-riscv, David Abdurachmanov

On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
> MMU after pte change")

When I revert that commit, I'm getting soft lockups. Doesn't that point
to some deeper issue with TLB flushes?

Andreas.
* Re: Random memory corruption with v5.2
From: Troy Benjegerdes @ 2019-07-31 12:57 UTC
To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov, Paul Walmsley

> On Jul 31, 2019, at 5:19 AM, Andreas Schwab <schwab@suse.de> wrote:
>
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
>> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
>> MMU after pte change")
>
> When I revert that commit, I'm getting soft lockups. Doesn't that point
> to some deeper issue with TLB flushes?
>
> Andreas.

What are you using for filesystem/storage? Is it the SD card, network,
or something else?
* Re: Random memory corruption with v5.2
From: Andreas Schwab @ 2019-07-31 13:10 UTC
To: Troy Benjegerdes; +Cc: linux-riscv, David Abdurachmanov, Paul Walmsley

On Jul 31 2019, Troy Benjegerdes <troy.benjegerdes@sifive.com> wrote:

> What are you using for filesystem/storage?

NFS.

Andreas.
* Re: Random memory corruption with v5.2
From: Andreas Schwab @ 2019-08-01 18:32 UTC
To: Paul Walmsley; +Cc: linux-riscv, opensbi, David Abdurachmanov

On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> On Tue, 30 Jul 2019, Andreas Schwab wrote:
>
>> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
>>
>> > Do you have some examples of crashes?
>>
>> While building glibc:
>>
>> an_ES.UTF-8...realloc(): invalid pointer
>> /bin/sh: line 1: 7841 Aborted (core dumped) I18NPATH=. [...] $locale
>> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>>
>> While building gcc:
>>
>> [...]
>> realloc(): invalid pointer
>
> I personally haven't seen these issues; but then again, I haven't done any
> glibc or gcc builds on v5.2. Will take a closer look.

I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
the kernel interface to it.

For example, flush_tlb_page is defined as:

#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)

But the third argument of flush_tlb_range is supposed to be the end
address, so this should actually be:

#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)

Alas, that doesn't fix the crashes.

Andreas.
* Re: Random memory corruption with v5.2
From: Palmer Dabbelt @ 2019-08-02 2:00 UTC
To: schwab; +Cc: linux-riscv, David Abdurachmanov, opensbi, Paul Walmsley

On Thu, 01 Aug 2019 11:32:33 PDT (-0700), schwab@suse.de wrote:
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
>> On Tue, 30 Jul 2019, Andreas Schwab wrote:
>>
>>> [...]
>>
>> I personally haven't seen these issues; but then again, I haven't done any
>> glibc or gcc builds on v5.2. Will take a closer look.
>
> I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
> the kernel interface to it.
>
> For example, flush_tlb_page is defined as:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>
> But the third argument of flush_tlb_range is supposed to be the end
> address, so this should actually be:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)
>
> Alas, that doesn't fix the crashes.

This line of reasoning smells like it'd find the issue: BBL just flushes
the entire TLB every time, but IIRC OpenSBI respects the ranges. It looks
like

Fixes: 90cb4917b584 ("lib: Implement sfence.vma correctly.")

is what introduced the new behavior in OpenSBI, which may have triggered
a lot of latent bugs in Linux. If you have an easy way to compile
OpenSBI, does something like

$ git diff | cat
diff --git a/lib/sbi/sbi_tlb.c b/lib/sbi/sbi_tlb.c
index cffda52d66ab..007266b1f970 100644
--- a/lib/sbi/sbi_tlb.c
+++ b/lib/sbi/sbi_tlb.c
@@ -133,50 +133,12 @@ static void sbi_tlb_flush_all(void)
 
 static void sbi_tlb_fifo_sfence_vma(struct sbi_tlb_info *tinfo)
 {
-	unsigned long start = tinfo->start;
-	unsigned long size = tinfo->size;
-	unsigned long i;
-
-	if ((start == 0 && size == 0) || (size == SBI_TLB_FLUSH_ALL)) {
-		sbi_tlb_flush_all();
-		return;
-	}
-
-	for (i = 0; i < size; i += PAGE_SIZE) {
-		__asm__ __volatile__("sfence.vma %0"
-				     :
-				     : "r"(start + i)
-				     : "memory");
-	}
+	sbi_tlb_flush_all();
 }
 
 static void sbi_tlb_fifo_sfence_vma_asid(struct sbi_tlb_info *tinfo)
 {
-	unsigned long start = tinfo->start;
-	unsigned long size = tinfo->size;
-	unsigned long asid = tinfo->asid;
-	unsigned long i;
-
-	if (start == 0 && size == 0) {
-		sbi_tlb_flush_all();
-		return;
-	}
-
-	/* Flush entire MM context for a given ASID */
-	if (size == SBI_TLB_FLUSH_ALL) {
-		__asm__ __volatile__("sfence.vma x0, %0"
-				     :
-				     : "r"(asid)
-				     : "memory");
-		return;
-	}
-
-	for (i = 0; i < size; i += PAGE_SIZE) {
-		__asm__ __volatile__("sfence.vma %0, %1"
-				     :
-				     : "r"(start + i), "r"(asid)
-				     : "memory");
-	}
+	sbi_tlb_flush_all();
 }
 
 void sbi_tlb_fifo_process(struct sbi_scratch *scratch, u32 event)

cause the issue to go away? If so, then I'd bet we need to scour Linux
for broken TLB flushing: given how obvious the one you found is, I'd bet
there are a lot more...

>
> Andreas.
* Re: Random memory corruption with v5.2
  2019-08-01 18:32 ` Andreas Schwab
  2019-08-02 2:00 ` Palmer Dabbelt
@ 2019-08-02 2:15 ` Anup Patel
  2019-08-05 14:08 ` Andreas Schwab
  1 sibling, 1 reply; 30+ messages in thread
From: Anup Patel @ 2019-08-02 2:15 UTC
To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov, OpenSBI, Paul Walmsley

On Fri, Aug 2, 2019 at 12:02 AM Andreas Schwab <schwab@suse.de> wrote:
>
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> > On Tue, 30 Jul 2019, Andreas Schwab wrote:
> >
> >> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
> >>
> >> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
> >> >>
> >> >> Since switching to 5.2 kernels I'm seeing random crashes and
> >> >> misbehaviors on the HiFive, for example while building gcc or glibc.
> >> >> Perhaps missing TLB flushes?
> >> >
> >> > Do you have some examples of crashes?
> >>
> >> While building glibc:
> >>
> >> an_ES.UTF-8...realloc(): invalid pointer
> >> /bin/sh: line 1: 7841 Aborted (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
> >> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
> >>
> >> While building gcc:
> >>
> >> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
> >> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
> >> +===========================GNAT BUG DETECTED==============================+
> >> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
> >> | Storage_Error stack overflow or erroneous memory access |
> >> | Error detected at output.ads:39:8 |
> >> realloc(): invalid pointer
> >
> > I personally haven't seen these issues; but then again, I haven't done any
> > glibc or gcc builds on v5.2. Will take a closer look.
>
> I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
> the kernel interface to it.
>
> For example, flush_tlb_page is defined as:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>
> But the third argument of flush_tlb_range is supposed to be the end
> address, so this should actually be:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)

Instead of this, can you try -1UL as the size:
#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, -1UL)

If the above works for you, then there is some issue with the range of virtual memory we flush.

Regards,
Anup
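The quoted macro chain turns the third argument of flush_tlb_range into a size with `(end) - (start)`. A minimal, host-runnable sketch of that arithmetic (assuming a 64-bit `unsigned long` and 4 KiB pages; `range_size` and the three `size_*` helpers are hypothetical names, not kernel functions) shows what size each flush_tlb_page variant discussed above would hand to remote_sfence_vma.

```c
#include <stdio.h>

/* Assumptions for this sketch: 64-bit unsigned long and 4 KiB pages. */
#define PAGE_SIZE 4096UL

/* Mirrors the size computed by the quoted macro
 *   flush_tlb_range(vma, start, end) -> remote_sfence_vma(..., start, (end) - (start))
 * range_size() is a hypothetical helper, not a kernel function. */
unsigned long range_size(unsigned long start, unsigned long end)
{
	return end - start; /* unsigned subtraction wraps around when end < start */
}

/* Size reaching remote_sfence_vma() for each flush_tlb_page() variant. */
unsigned long size_end_zero(unsigned long addr)     { return range_size(addr, 0); }
unsigned long size_one_page(unsigned long addr)     { return range_size(addr, addr + PAGE_SIZE); }
unsigned long size_end_all_ones(unsigned long addr) { return range_size(addr, -1UL); }

void print_sizes(unsigned long addr)
{
	printf("addr %#lx\n", addr);
	printf("  end = 0                : size %#lx\n", size_end_zero(addr));
	printf("  end = addr + PAGE_SIZE : size %#lx\n", size_one_page(addr));
	printf("  end = -1UL             : size %#lx\n", size_end_all_ones(addr));
}
```

Calling print_sizes() on any user address from a main() would report exactly one page (0x1000) for the `(addr) + PAGE_SIZE` variant and sizes close to 2^64 for the other two. With an end of 0 the size is `0 - addr`, so `start + size` wraps back to 0. In the suggested `-1UL` variant, `-1UL` sits in the end-address position, so the resulting size is `-1UL - addr` rather than `-1UL`.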
* Re: Random memory corruption with v5.2
  2019-08-02 2:15 ` Anup Patel
@ 2019-08-05 14:08 ` Andreas Schwab
  2019-08-05 14:34 ` Andreas Schwab
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 14:08 UTC
To: Anup Patel; +Cc: linux-riscv, David Abdurachmanov, OpenSBI, Paul Walmsley

On Aug 02 2019, Anup Patel <anup@brainfault.org> wrote:

> Instead of this can you try -1UL as the size:
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, -1UL)

That doesn't help either.

Andreas.
* Re: Random memory corruption with v5.2
  2019-08-05 14:08 ` Andreas Schwab
@ 2019-08-05 14:34 ` Andreas Schwab
  2019-08-05 15:36 ` Andreas Schwab
  2019-08-05 22:34 ` Atish Patra
  0 siblings, 2 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 14:34 UTC
To: Anup Patel; +Cc: linux-riscv, Paul Walmsley, OpenSBI, David Abdurachmanov

But this does help:

--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
 
 	cpumask_clear(&hmask);
 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
-	sbi_remote_sfence_vma(hmask.bits, start, size);
+	sbi_remote_sfence_vma(hmask.bits, 0, -1);
 }
 
 #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)

Andreas.
* Re: Random memory corruption with v5.2
  2019-08-05 14:34 ` Andreas Schwab
@ 2019-08-05 15:36 ` Andreas Schwab
  2019-08-05 22:34 ` Atish Patra
  1 sibling, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 15:36 UTC
To: Anup Patel; +Cc: linux-riscv, David Abdurachmanov, OpenSBI, Paul Walmsley

This helps too:

--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -50,10 +50,11 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
 	cpumask_clear(&hmask);
 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
 	sbi_remote_sfence_vma(hmask.bits, start, size);
+	local_flush_tlb_all();
 }
 
 #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
-#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
+#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)
 #define flush_tlb_range(vma, start, end) \
 	remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start))
 #define flush_tlb_mm(mm) \

Andreas.
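The second experiment above adds local_flush_tlb_all() after the SBI call, so the calling hart's own TLB is flushed as well. A more targeted variant of that idea could handle the calling hart locally and ask the firmware only for the others. The sketch below is hypothetical. It assumes hart masks fit in a 64-bit bitmap, whereas the kernel itself uses struct cpumask and riscv_cpuid_to_hartid_mask. `hartmask_t`, `struct flush_plan` and `plan_tlb_flush` are illustrative names, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means hart N may cache the mm's translations.
 * Assumes fewer than 64 harts. */
typedef uint64_t hartmask_t;

struct flush_plan {
	bool flush_local;       /* flush this hart directly (local sfence.vma) */
	hartmask_t remote_mask; /* harts left for the SBI remote fence call */
};

/* Split the target harts into the calling hart, which is handled locally,
 * and all other harts, which are handled by the firmware. */
struct flush_plan plan_tlb_flush(hartmask_t target, unsigned int self_hartid)
{
	struct flush_plan plan;
	hartmask_t self_bit = (hartmask_t)1 << self_hartid;

	plan.flush_local = (target & self_bit) != 0;
	plan.remote_mask = target & ~self_bit;
	return plan;
}
```

A possible benefit of this split is that the calling hart's flush no longer depends on how the firmware treats the caller. When `remote_mask` ends up empty, the firmware round trip can be skipped entirely. This is a design assumption of the sketch, not a description of any patch posted in this thread.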
* Re: Random memory corruption with v5.2
  2019-08-05 14:34 ` Andreas Schwab
  2019-08-05 15:36 ` Andreas Schwab
@ 2019-08-05 22:34 ` Atish Patra
  2019-08-06 0:25 ` Troy Benjegerdes
  ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Atish Patra @ 2019-08-05 22:34 UTC
To: anup, schwab; +Cc: linux-riscv, david.abdurachmanov, opensbi, paul.walmsley

On Mon, 2019-08-05 at 16:34 +0200, Andreas Schwab wrote:
> But this does help:
>
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
>
>  	cpumask_clear(&hmask);
>  	riscv_cpuid_to_hartid_mask(cmask, &hmask);
> -	sbi_remote_sfence_vma(hmask.bits, start, size);
> +	sbi_remote_sfence_vma(hmask.bits, 0, -1);
>  }
>
>  #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
>

I am also able to reproduce the issue while doing an install-locales.
Here is the temporary fix that seems to solve the issue.

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 687dd19735a7..29b2bd7c9923 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -55,7 +55,7 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
 #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
 #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
 #define flush_tlb_range(vma, start, end) \
-	remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start))
+	remote_sfence_vma(mm_cpumask((vma)->vm_mm), 0, -1)
 #define flush_tlb_mm(mm) \
 	remote_sfence_vma(mm_cpumask(mm), 0, -1)

Can you please verify at your end?

While your fix flushes the entire TLB for every type of remote TLB flush, this fix proves that the issue is with the flush_tlb_range call only.

I am looking at the OpenSBI/kernel implementation to figure out whether it is an OpenSBI issue or something changed in the kernel recently to trigger this.

Additionally, do you know if a particular locale install, or group of locale installs, is causing this issue?

It takes more than an hour to finish the full install-locales on the Unleashed board, which makes it a bit difficult to try out possible fixes.

> Andreas.
>

Regards,
Atish
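The experiment above narrows the problem to the `(start, size)` pair that flush_tlb_range passes down. As one hypothetical illustration of why a wrapped size deserves scrutiny (this is not OpenSBI's actual code; `PAGE_SIZE`, `FULL_FLUSH_THRESHOLD`, `struct flush_result` and both function names are assumptions), consider a naive page-by-page loop over `[start, start + size)`. It performs zero iterations when `size` is `0 - start`, which is what flush_tlb_page's end argument of 0 produces. A guarded variant falls back to a full flush instead.

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE 4096UL
/* Hypothetical cut-over point above which one full flush replaces per-page flushes. */
#define FULL_FLUSH_THRESHOLD (64UL * PAGE_SIZE)

/* Simulated outcome: on real hardware each counted page would be one
 * "sfence.vma addr" and full_flush would be a bare "sfence.vma". */
struct flush_result {
	unsigned long pages;
	bool full_flush;
};

/* Naive version: walk [start, start + size) one page at a time.
 * If start + size wraps past zero, end < start and nothing is flushed. */
struct flush_result naive_range_flush(unsigned long start, unsigned long size)
{
	struct flush_result r = { 0, false };
	unsigned long end = start + size; /* may wrap around */

	for (unsigned long addr = start; addr < end; addr += PAGE_SIZE)
		r.pages++;
	return r;
}

/* Guarded version: large, all-ones or wrapping ranges become a full flush. */
struct flush_result guarded_range_flush(unsigned long start, unsigned long size)
{
	struct flush_result r = { 0, false };

	if (size > FULL_FLUSH_THRESHOLD || start + size < start) {
		r.full_flush = true;
		return r;
	}
	for (unsigned long addr = start; addr < start + size; addr += PAGE_SIZE)
		r.pages++;
	return r;
}
```

The sketch shows only that a silently skipped flush is one possible failure mode for a wrapped size. Whether the firmware or the kernel actually behaves this way is the open question in this thread.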
* Re: Random memory corruption with v5.2
  2019-08-05 22:34 ` Atish Patra
@ 2019-08-06 0:25 ` Troy Benjegerdes
  2019-08-06 0:30 ` Atish Patra
  2019-08-06 6:41 ` Andreas Schwab
  2019-08-06 7:43 ` Andreas Schwab
  2 siblings, 1 reply; 30+ messages in thread
From: Troy Benjegerdes @ 2019-08-06 0:25 UTC
To: Atish Patra
Cc: david.abdurachmanov, anup, opensbi, paul.walmsley, schwab, linux-riscv

> On Aug 5, 2019, at 5:34 PM, Atish Patra <Atish.Patra@wdc.com> wrote:
>
> [...]
>
> Additionally, do you know if a particular locale install, or group of
> locale installs, is causing this issue?
>
> It takes more than an hour to finish the full install-locales on the
> Unleashed board, which makes it a bit difficult to try out possible fixes.
>

Did you reproduce with SDcard, or NFS?
* Re: Random memory corruption with v5.2
  2019-08-06 0:25 ` Troy Benjegerdes
@ 2019-08-06 0:30 ` Atish Patra
  0 siblings, 0 replies; 30+ messages in thread
From: Atish Patra @ 2019-08-06 0:30 UTC
To: troy.benjegerdes
Cc: david.abdurachmanov, anup, opensbi, paul.walmsley, schwab, linux-riscv

On Mon, 2019-08-05 at 19:25 -0500, Troy Benjegerdes wrote:
> > On Aug 5, 2019, at 5:34 PM, Atish Patra <Atish.Patra@wdc.com> wrote:
> >
> > [...]
> >
> > It takes more than an hour to finish the full install-locales on the
> > Unleashed board, which makes it a bit difficult to try out possible
> > fixes.
> >
>
> Did you reproduce with SDcard, or NFS?
>

I am running it on an NVMe SSD attached to a Microsemi expansion board.

Kernel version: 5.3-rc2
OpenSBI/U-Boot: Latest master

Regards,
Atish
* Re: Random memory corruption with v5.2
  2019-08-05 22:34 ` Atish Patra
  2019-08-06 0:25 ` Troy Benjegerdes
@ 2019-08-06 6:41 ` Andreas Schwab
  2019-08-06 7:43 ` Andreas Schwab
  2 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-06 6:41 UTC
To: Atish Patra
Cc: anup, linux-riscv, david.abdurachmanov, opensbi, paul.walmsley

On Aug 05 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> It takes more than an hour to finish the full install-locales on the Unleashed
> board, which makes it a bit difficult to try out possible fixes.

When it fails, it usually fails pretty fast. Did you run it in parallel?

Andreas.
* Re: Random memory corruption with v5.2
  2019-08-05 22:34 ` Atish Patra
  2019-08-06 0:25 ` Troy Benjegerdes
  2019-08-06 6:41 ` Andreas Schwab
@ 2019-08-06 7:43 ` Andreas Schwab
  2 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-06 7:43 UTC
To: Atish Patra
Cc: anup, linux-riscv, david.abdurachmanov, opensbi, paul.walmsley

On Aug 05 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> On Mon, 2019-08-05 at 16:34 +0200, Andreas Schwab wrote:
>> But this does help:
>>
>> --- a/arch/riscv/include/asm/tlbflush.h
>> +++ b/arch/riscv/include/asm/tlbflush.h
>> @@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
>>
>>  	cpumask_clear(&hmask);
>>  	riscv_cpuid_to_hartid_mask(cmask, &hmask);
>> -	sbi_remote_sfence_vma(hmask.bits, start, size);
>> +	sbi_remote_sfence_vma(hmask.bits, 0, -1);
>>  }
>>
>>  #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
>>
>
> I am also able to reproduce the issue while doing an install-locales.
> Here is the temporary fix that seems to solve the issue.
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index 687dd19735a7..29b2bd7c9923 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -55,7 +55,7 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
>  #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
>  #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>  #define flush_tlb_range(vma, start, end) \
> -	remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start))
> +	remote_sfence_vma(mm_cpumask((vma)->vm_mm), 0, -1)
>  #define flush_tlb_mm(mm) \
>  	remote_sfence_vma(mm_cpumask(mm), 0, -1)
>
> Can you please verify at your end?

This is equivalent to my patch, since all other uses already pass 0, -1.

Andreas.
* Re: Random memory corruption with v5.2
  2019-07-31 0:22 ` Paul Walmsley
  ` (2 preceding siblings ...)
  2019-08-01 18:32 ` Andreas Schwab
@ 2019-08-02 7:25 ` Paul Walmsley
  2019-08-02 12:08 ` Andreas Schwab
  3 siblings, 1 reply; 30+ messages in thread
From: Paul Walmsley @ 2019-08-02 7:25 UTC
To: Andreas Schwab; +Cc: linux-riscv, palmer, David Abdurachmanov

I was able to build glibc, and run most of the test suite, on v5.3-rc2 with BBL, with no problems so far. (The test suite is still running.) CPU at 1GHz.

Will try gcc in a few hours.

- Paul
* Re: Random memory corruption with v5.2
  2019-08-02 7:25 ` Paul Walmsley
@ 2019-08-02 12:08 ` Andreas Schwab
  2019-08-02 17:32 ` Paul Walmsley
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2019-08-02 12:08 UTC
To: Paul Walmsley; +Cc: linux-riscv, palmer, David Abdurachmanov

On Aug 02 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> I was able to build glibc, and run most of the test suite, on v5.3-rc2

Did you run install-locales?

Andreas.
* Re: Random memory corruption with v5.2
  2019-08-02 12:08 ` Andreas Schwab
@ 2019-08-02 17:32 ` Paul Walmsley
  2019-08-05 7:13 ` Andreas Schwab
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Walmsley @ 2019-08-02 17:32 UTC
To: Andreas Schwab; +Cc: linux-riscv, palmer, David Abdurachmanov

On Fri, 2 Aug 2019, Andreas Schwab wrote:

> On Aug 02 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> > I was able to build glibc, and run most of the test suite, on v5.3-rc2
>
> Did you run install-locales?

I just ran "make -j4", "make -j4 check", "make -j4 xcheck". This is with rootfs on microSD, rather than on NFS.

Do you still see the failures if you only run the above commands, or does the failure only appear with install-locales?

- Paul
* Re: Random memory corruption with v5.2
  2019-08-02 17:32 ` Paul Walmsley
@ 2019-08-05 7:13 ` Andreas Schwab
  0 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 7:13 UTC
To: Paul Walmsley; +Cc: linux-riscv, palmer, David Abdurachmanov

On Aug 02 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> Do you still see the failures if you only run the above commands, or does
> the failure only appear with install-locales?

Only during install-locales.

Andreas.
* Re: Random memory corruption with v5.2
  2019-07-30 6:56 ` Andreas Schwab
  2019-07-31 0:22 ` Paul Walmsley
@ 2019-08-15 20:52 ` Atish Patra
  2019-08-16 5:22 ` Atish Patra
  2019-08-19 10:53 ` Andreas Schwab
  1 sibling, 2 replies; 30+ messages in thread
From: Atish Patra @ 2019-08-15 20:52 UTC
To: david.abdurachmanov, schwab; +Cc: linux-riscv

On Tue, 2019-07-30 at 08:56 +0200, Andreas Schwab wrote:
> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com>
> wrote:
>
> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de>
> > wrote:
> > > Since switching to 5.2 kernels I'm seeing random crashes and
> > > misbehaviors on the HiFive, for example while building gcc or
> > > glibc.
> > > Perhaps missing TLB flushes?
> >
> > Do you have some examples of crashes?
>
> While building glibc:
>
> an_ES.UTF-8...realloc(): invalid pointer
> /bin/sh: line 1: 7841 Aborted (core dumped) I18NPATH=. [...]
> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>
> While building gcc:
>
> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
> +===========================GNAT BUG DETECTED==============================+
> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
> | Storage_Error stack overflow or erroneous memory access |
> | Error detected at output.ads:39:8 |
> realloc(): invalid pointer
>
> raised PROGRAM_ERROR : unhandled signal
> make[3]: *** [../../gcc/ada/gcc-interface/Make-lang.in:140: ada/exp_ch3.o] Error 1
>
> Andreas.
>

Can you give it a try with the following patches in OpenSBI and the kernel?

Linux kernel:
http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html

OpenSBI:
http://lists.infradead.org/pipermail/opensbi/2019-August/000386.html

In my testing, I no longer see the stress-ng error or the glibc locale install issue if I use the following command.

sudo make -j8 localedata/install-locale-files DESTDIR=/home/atish/glibc/build/install

I still see a segmentation fault if I use the archive locale install command.

sudo make -j8 localedata/install-locales DESTDIR=/home/atish/glibc/build/install

But the error dump doesn't contain the remap() error, just a segmentation fault, which may be due to userspace or just a different version of the old tlbflush problem.

Regards,
Atish
* Re: Random memory corruption with v5.2
  2019-08-15 20:52 ` Atish Patra
@ 2019-08-16 5:22 ` Atish Patra
  2019-08-16 15:38 ` Troy Benjegerdes
  2019-08-19 10:53 ` Andreas Schwab
  1 sibling, 1 reply; 30+ messages in thread
From: Atish Patra @ 2019-08-16 5:22 UTC
To: david.abdurachmanov, schwab; +Cc: linux-riscv

On Thu, 2019-08-15 at 13:52 -0700, Atish Patra wrote:
> On Tue, 2019-07-30 at 08:56 +0200, Andreas Schwab wrote:
> > [...]
>
> Can you give it a try with the following patches in OpenSBI and the kernel?
>
> Linux kernel:
> http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html
>
> OpenSBI:
> http://lists.infradead.org/pipermail/opensbi/2019-August/000386.html
>
> In my testing, I no longer see the stress-ng error or the glibc locale install
> issue if I use the following command.
>
> sudo make -j8 localedata/install-locale-files DESTDIR=/home/atish/glibc/build/install
>
> I still see a segmentation fault if I use the archive locale install
> command.
>
> sudo make -j8 localedata/install-locales DESTDIR=/home/atish/glibc/build/install
>

I am also able to run the above archive locale install command successfully multiple times after removing the corrupted locale-archive files present in the install path.

Let me know if it works for you as well.

I am now running stress-ng and a parallel glibc locale install together to fully stress the system.

Regards,
Atish

> But the error dump doesn't contain the remap() error, just a segmentation
> fault, which may be due to userspace or just a different version of the old
> tlbflush problem.
>
> Regards,
> Atish
* Re: Random memory corruption with v5.2
  2019-08-16 5:22 ` Atish Patra
@ 2019-08-16 15:38 ` Troy Benjegerdes
  0 siblings, 0 replies; 30+ messages in thread
From: Troy Benjegerdes @ 2019-08-16 15:38 UTC
To: Atish Patra; +Cc: schwab, linux-riscv, david.abdurachmanov

> On Aug 15, 2019, at 10:22 PM, Atish Patra <Atish.Patra@wdc.com> wrote:
>
> On Thu, 2019-08-15 at 13:52 -0700, Atish Patra wrote:
>> [...]
>>
>> Can you give it a try with the following patches in OpenSBI and the kernel?
>>
>> Linux kernel:
>> http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html
>>
>> OpenSBI:
>> http://lists.infradead.org/pipermail/opensbi/2019-August/000386.html
>>
>> [...]
>
> I am also able to run the above archive locale install command successfully
> multiple times after removing the corrupted locale-archive files
> present in the install path.
>
> Let me know if it works for you as well.
>
> I am now running stress-ng and a parallel glibc locale install together to
> fully stress the system.
>
> Regards,
> Atish

Is this with the stock linux-5.2.8 release, with no additional patches, or is there something we need to look at backporting?
* Re: Random memory corruption with v5.2
2019-08-15 20:52 ` Atish Patra
2019-08-16 5:22 ` Atish Patra
@ 2019-08-19 10:53 ` Andreas Schwab
1 sibling, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-19 10:53 UTC
To: Atish Patra; +Cc: linux-riscv, david.abdurachmanov

On Aug 15 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> Linux kernel:
> http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html

I've been using that patch, without any changes to openSBI, to run
bootstrap/regtest on gcc and to build glibc without issues.

Andreas.
Thread overview: 30+ messages
2019-07-29 10:51 Random memory corruption with v5.2 Andreas Schwab
2019-07-29 22:58 ` David Abdurachmanov
2019-07-30 4:27 ` Atish Patra
2019-07-30 6:56 ` Andreas Schwab
2019-07-31 0:22 ` Paul Walmsley
2019-07-31 7:39 ` Andreas Schwab
2019-07-31 8:14 ` Anup Patel
2019-08-01 19:57 ` Palmer Dabbelt
2019-07-31 10:19 ` Andreas Schwab
2019-07-31 12:57 ` Troy Benjegerdes
2019-07-31 13:10 ` Andreas Schwab
2019-08-01 18:32 ` Andreas Schwab
2019-08-02 2:00 ` Palmer Dabbelt
2019-08-02 2:15 ` Anup Patel
2019-08-05 14:08 ` Andreas Schwab
2019-08-05 14:34 ` Andreas Schwab
2019-08-05 15:36 ` Andreas Schwab
2019-08-05 22:34 ` Atish Patra
2019-08-06 0:25 ` Troy Benjegerdes
2019-08-06 0:30 ` Atish Patra
2019-08-06 6:41 ` Andreas Schwab
2019-08-06 7:43 ` Andreas Schwab
2019-08-02 7:25 ` Paul Walmsley
2019-08-02 12:08 ` Andreas Schwab
2019-08-02 17:32 ` Paul Walmsley
2019-08-05 7:13 ` Andreas Schwab
2019-08-15 20:52 ` Atish Patra
2019-08-16 5:22 ` Atish Patra
2019-08-16 15:38 ` Troy Benjegerdes
2019-08-19 10:53 ` Andreas Schwab