* Random memory corruption with v5.2
@ 2019-07-29 10:51 Andreas Schwab
  2019-07-29 22:58 ` David Abdurachmanov
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2019-07-29 10:51 UTC (permalink / raw)
  To: linux-riscv

Since switching to 5.2 kernels I'm seeing random crashes and
misbehaviors on the HiFive, for example while building gcc or glibc.
Perhaps missing TLB flushes?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

* Re: Random memory corruption with v5.2
  2019-07-29 10:51 Random memory corruption with v5.2 Andreas Schwab
@ 2019-07-29 22:58 ` David Abdurachmanov
  2019-07-30  4:27   ` Atish Patra
  2019-07-30  6:56   ` Andreas Schwab
  0 siblings, 2 replies; 30+ messages in thread
From: David Abdurachmanov @ 2019-07-29 22:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv

On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>
> Since switching to 5.2 kernels I'm seeing random crashes and
> misbehaviors on the HiFive, for example while building gcc or glibc.
> Perhaps missing TLB flushes?

Do you have some examples of crashes?

I am running 5.2-rc7 on a large number of QEMU instances for builders,
and I see some strange behavior, but I haven't noticed any issues
on the board using OpenEmbedded build with the final 5.2 yet.

[17983.074847] Unable to handle kernel paging request at virtual
address 0fffffdff5e14700
[17983.085132] Oops [#1]

[133953.710130] kernel BUG at include/linux/mm.h:1023!
[133953.718204] Kernel BUG [#1]

[165770.567652] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000010

[148578.912479] kernel BUG at lib/list_debug.c:51!
[148578.917701] Kernel BUG [#1]

[163756.869949] EXT4-fs (vda2): pa 00000000e9971722: logic 512, phys.
2558464, len 512
[163756.889549] EXT4-fs error (device vda2):
ext4_mb_release_inode_pa:3837: group 78, free 0, pa_free 149
[163757.757600] EXT4-fs (vda2): pa 0000000066b479c3: logic 32, phys.
2558368, len 96

sbi_trap_error: hart1: misaligned store handler failed (error -10)
sbi_trap_error: hart1: mcause=0x0000000000000006 mtval=0x00000000000002c3
sbi_trap_error: hart1: mepc=0xffffffe0009dc1f4 mstatus=0x0000000000000802
sbi_trap_error: hart1: ra=0xffffffe0009dc1ee sp=0xffffffe1f3c17be0

[178876.406122] Unable to handle kernel paging request at virtual
address 0000000000012a28
[178876.423941] Oops [#1]

* Re: Random memory corruption with v5.2
  2019-07-29 22:58 ` David Abdurachmanov
@ 2019-07-30  4:27   ` Atish Patra
  2019-07-30  6:56   ` Andreas Schwab
  1 sibling, 0 replies; 30+ messages in thread
From: Atish Patra @ 2019-07-30  4:27 UTC (permalink / raw)
  To: David Abdurachmanov, Andreas Schwab; +Cc: linux-riscv

On 7/29/19 3:58 PM, David Abdurachmanov wrote:
> On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>>
>> Since switching to 5.2 kernels I'm seeing random crashes and
>> misbehaviors on the HiFive, for example while building gcc or glibc.
>> Perhaps missing TLB flushes?
> 
> Do you have some examples of crashes?
> 
> I am running 5.2-rc7 on a large number of QEMU instances for builders,
> and I see some strange behavior, but I haven't noticed any issues
> on the board using OpenEmbedded build with the final 5.2 yet.
> 

Looking at the timestamps, these seem to be different crashes in
different instances. Were you running any particular workload, or does
it just happen randomly if you run long enough?

If you have complete dmesg and/or vmlinux that will help as well.

> [17983.074847] Unable to handle kernel paging request at virtual
> address 0fffffdff5e14700
> [17983.085132] Oops [#1]
> 
> [133953.710130] kernel BUG at include/linux/mm.h:1023!
> [133953.718204] Kernel BUG [#1]
> 
> [165770.567652] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000010
> 
> [148578.912479] kernel BUG at lib/list_debug.c:51!
> [148578.917701] Kernel BUG [#1]
> 
> [163756.869949] EXT4-fs (vda2): pa 00000000e9971722: logic 512, phys.
> 2558464, len 512
> [163756.889549] EXT4-fs error (device vda2):
> ext4_mb_release_inode_pa:3837: group 78, free 0, pa_free 149
> [163757.757600] EXT4-fs (vda2): pa 0000000066b479c3: logic 32, phys.
> 2558368, len 96
> 
> sbi_trap_error: hart1: misaligned store handler failed (error -10)
> sbi_trap_error: hart1: mcause=0x0000000000000006 mtval=0x00000000000002c3
> sbi_trap_error: hart1: mepc=0xffffffe0009dc1f4 mstatus=0x0000000000000802
> sbi_trap_error: hart1: ra=0xffffffe0009dc1ee sp=0xffffffe1f3c17be0
> 
> [178876.406122] Unable to handle kernel paging request at virtual
> address 0000000000012a28
> [178876.423941] Oops [#1]
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


-- 
Regards,
Atish

* Re: Random memory corruption with v5.2
  2019-07-29 22:58 ` David Abdurachmanov
  2019-07-30  4:27   ` Atish Patra
@ 2019-07-30  6:56   ` Andreas Schwab
  2019-07-31  0:22     ` Paul Walmsley
  2019-08-15 20:52     ` Atish Patra
  1 sibling, 2 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-07-30  6:56 UTC (permalink / raw)
  To: David Abdurachmanov; +Cc: linux-riscv

On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:

> On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>>
>> Since switching to 5.2 kernels I'm seeing random crashes and
>> misbehaviors on the HiFive, for example while building gcc or glibc.
>> Perhaps missing TLB flushes?
>
> Do you have some examples of crashes?

While building glibc:

an_ES.UTF-8...realloc(): invalid pointer
/bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134

While building gcc:

../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
+===========================GNAT BUG DETECTED==============================+
| 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
| Storage_Error stack overflow or erroneous memory access                  |
| Error detected at output.ads:39:8                                        |
realloc(): invalid pointer

raised PROGRAM_ERROR : unhandled signal
make[3]: *** [../../gcc/ada/gcc-interface/Make-lang.in:140: ada/exp_ch3.o] Error 1

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: Random memory corruption with v5.2
  2019-07-30  6:56   ` Andreas Schwab
@ 2019-07-31  0:22     ` Paul Walmsley
  2019-07-31  7:39       ` Andreas Schwab
                         ` (3 more replies)
  2019-08-15 20:52     ` Atish Patra
  1 sibling, 4 replies; 30+ messages in thread
From: Paul Walmsley @ 2019-07-31  0:22 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov

On Tue, 30 Jul 2019, Andreas Schwab wrote:

> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
> 
> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
> >>
> >> Since switching to 5.2 kernels I'm seeing random crashes and
> >> misbehaviors on the HiFive, for example while building gcc or glibc.
> >> Perhaps missing TLB flushes?
> >
> > Do you have some examples of crashes?
> 
> While building glibc:
> 
> an_ES.UTF-8...realloc(): invalid pointer
> /bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
> 
> While building gcc:
> 
> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
> +===========================GNAT BUG DETECTED==============================+
> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
> | Storage_Error stack overflow or erroneous memory access                  |
> | Error detected at output.ads:39:8                                        |
> realloc(): invalid pointer

I personally haven't seen these issues; but then again, I haven't done any 
glibc or gcc builds on v5.2.  Will take a closer look.

Reflecting on the recent commits, there weren't too many recent 
RISC-V-specific changes that could have an impact here.  So if these 
problems are relatively repeatable, and they didn't happen with v5.1, 
there are a few patches that might be worth reverting to see if the 
situation improves.  Here would be my short list:

- Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize 
MMU after pte change") 

- Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig: 
Enable NO_HZ_IDLE and HIGH_RES_TIMERS")

- Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial 
page tables in two stages")

Of course, it's also possible that someone could have made a change 
outside arch/riscv that is causing these problems.  If that's the case, 
we're probably stuck bisecting it.


- Paul

* Re: Random memory corruption with v5.2
  2019-07-31  0:22     ` Paul Walmsley
@ 2019-07-31  7:39       ` Andreas Schwab
  2019-07-31  8:14         ` Anup Patel
  2019-08-01 19:57         ` Palmer Dabbelt
  2019-07-31 10:19       ` Andreas Schwab
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-07-31  7:39 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-riscv, David Abdurachmanov

On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize 
> MMU after pte change") 

That would be my favorite.

> - Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig: 
> Enable NO_HZ_IDLE and HIGH_RES_TIMERS")

I had these enabled forever already.

> - Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial 
> page tables in two stages")

I don't think a one-time initial setup can have such a subtle effect.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: Random memory corruption with v5.2
  2019-07-31  7:39       ` Andreas Schwab
@ 2019-07-31  8:14         ` Anup Patel
  2019-08-01 19:57         ` Palmer Dabbelt
  1 sibling, 0 replies; 30+ messages in thread
From: Anup Patel @ 2019-07-31  8:14 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov, Paul Walmsley

On Wed, Jul 31, 2019 at 1:09 PM Andreas Schwab <schwab@suse.de> wrote:
>
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> > - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
> > MMU after pte change")
>
> That would be my favorite.
>
> > - Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig:
> > Enable NO_HZ_IDLE and HIGH_RES_TIMERS")
>
> I had these enabled forever already.
>
> > - Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial
> > page tables in two stages")
>
> I don't think a one-time initial setup can have such a subtle effect.

The initial page table setup patch is not present in 5.2. It was merged in 5.3.

Regards,
Anup

* Re: Random memory corruption with v5.2
  2019-07-31  0:22     ` Paul Walmsley
  2019-07-31  7:39       ` Andreas Schwab
@ 2019-07-31 10:19       ` Andreas Schwab
  2019-07-31 12:57         ` Troy Benjegerdes
  2019-08-01 18:32       ` Andreas Schwab
  2019-08-02  7:25       ` Paul Walmsley
  3 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2019-07-31 10:19 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-riscv, David Abdurachmanov

On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize 
> MMU after pte change") 

When I revert that commit, I'm getting soft lockups.  Doesn't that point
to some deeper issue with TLB flushes?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: Random memory corruption with v5.2
  2019-07-31 10:19       ` Andreas Schwab
@ 2019-07-31 12:57         ` Troy Benjegerdes
  2019-07-31 13:10           ` Andreas Schwab
  0 siblings, 1 reply; 30+ messages in thread
From: Troy Benjegerdes @ 2019-07-31 12:57 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov, Paul Walmsley



> On Jul 31, 2019, at 5:19 AM, Andreas Schwab <schwab@suse.de> wrote:
> 
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
> 
>> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize 
>> MMU after pte change") 
> 
> When I revert that commit, I'm getting soft lockups.  Doesn't that point
> to some deeper issue with TLB flushes?
> 
> Andreas.

What are you using for filesystem/storage? Is it the SDcard, network, or something else?


* Re: Random memory corruption with v5.2
  2019-07-31 12:57         ` Troy Benjegerdes
@ 2019-07-31 13:10           ` Andreas Schwab
  0 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-07-31 13:10 UTC (permalink / raw)
  To: Troy Benjegerdes; +Cc: linux-riscv, David Abdurachmanov, Paul Walmsley

On Jul 31 2019, Troy Benjegerdes <troy.benjegerdes@sifive.com> wrote:

> What are you using for filesystem/storage?

NFS.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: Random memory corruption with v5.2
  2019-07-31  0:22     ` Paul Walmsley
  2019-07-31  7:39       ` Andreas Schwab
  2019-07-31 10:19       ` Andreas Schwab
@ 2019-08-01 18:32       ` Andreas Schwab
  2019-08-02  2:00         ` Palmer Dabbelt
  2019-08-02  2:15         ` Anup Patel
  2019-08-02  7:25       ` Paul Walmsley
  3 siblings, 2 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-01 18:32 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-riscv, opensbi, David Abdurachmanov

On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> On Tue, 30 Jul 2019, Andreas Schwab wrote:
>
>> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
>> 
>> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>> >>
>> >> Since switching to 5.2 kernels I'm seeing random crashes and
>> >> misbehaviors on the HiFive, for example while building gcc or glibc.
>> >> Perhaps missing TLB flushes?
>> >
>> > Do you have some examples of crashes?
>> 
>> While building glibc:
>> 
>> an_ES.UTF-8...realloc(): invalid pointer
>> /bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
>> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>> 
>> While building gcc:
>> 
>> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
>> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
>> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
>> +===========================GNAT BUG DETECTED==============================+
>> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
>> | Storage_Error stack overflow or erroneous memory access                  |
>> | Error detected at output.ads:39:8                                        |
>> realloc(): invalid pointer
>
> I personally haven't seen these issues; but then again, I haven't done any 
> glibc or gcc builds on v5.2.  Will take a closer look.

I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
the kernel interface to it.

For example, flush_tlb_page is defined as:

#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)

But the third argument of flush_tlb_range is supposed to be the end
address, so this should actually be:

#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)

Alas, that doesn't fix the crashes.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: Random memory corruption with v5.2
  2019-07-31  7:39       ` Andreas Schwab
  2019-07-31  8:14         ` Anup Patel
@ 2019-08-01 19:57         ` Palmer Dabbelt
  1 sibling, 0 replies; 30+ messages in thread
From: Palmer Dabbelt @ 2019-08-01 19:57 UTC (permalink / raw)
  To: schwab; +Cc: linux-riscv, david.abdurachmanov, Paul Walmsley

On Wed, 31 Jul 2019 00:39:10 PDT (-0700), schwab@suse.de wrote:
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
>> - Commit bf587caae305ae3b4393077fb22c98478ee55755 ("riscv: mm: synchronize
>> MMU after pte change")
>
> That would be my favorite.

If that patch causes memory corruption then something scary is going on.  I
haven't been following the thread closely enough to know how easy this is to
reproduce, but do you mind trying a kernel with a reverted version of that
commit?

This is also available on the "for-andreas" branch of git.kernel.org/palmer/linux.git

    commit 07d45256aa8641057c141f1a661bb29dd99eb32e
    gpg: Signature made Thu 01 Aug 2019 12:46:22 PM PDT
    gpg:                using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
    gpg:                issuer "palmer@dabbelt.com"
    gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [ultimate]
    gpg:                 aka "Palmer Dabbelt <palmer@sifive.com>" [ultimate]
    Author: Palmer Dabbelt <palmer@sifive.com>
    Date:   Thu Aug 1 12:45:12 2019 -0700
    
        Revert "riscv: mm: synchronize MMU after pte change"
    
        Andreas Schwab is seeing some random memory corruption with 5.2, and he
        thinks the reverted commit is the most likely candidate.  The commit
        itself doesn't revert cleanly, but that's just because getting the
        comment right took two commits.
    
        If this does fix the issue then we're in a bit of trouble, as this TLB
        flush should be pretty safe.
    
        This reverts commit bf587caae305ae3b4393077fb22c98478ee55755.
    
        Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
    
    diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
    index f960c3f4ce47..28dccb072255 100644
    --- a/arch/riscv/mm/fault.c
    +++ b/arch/riscv/mm/fault.c
    @@ -16,7 +16,6 @@
    
     #include <asm/pgalloc.h>
     #include <asm/ptrace.h>
    -#include <asm/tlbflush.h>
    
     /*
      * This routine handles page faults.  It determines the address and the
    @@ -267,14 +266,6 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
                    if (!pte_present(*pte_k))
                            goto no_context;
    
    -               /*
    -                * The kernel assumes that TLBs don't cache invalid
    -                * entries, but in RISC-V, SFENCE.VMA specifies an
    -                * ordering constraint, not a cache flush; it is
    -                * necessary even after writing invalid entries.
    -                */
    -               local_flush_tlb_page(addr);
    -
                    return;
            }
     }

>> - Commit 6dd91e0eacff0a5c822ca37565d6b5740c4d2a80 ("RISC-V: defconfig:
>> Enable NO_HZ_IDLE and HIGH_RES_TIMERS")
>
> I had these enabled forever already.

IIRC that was the argument for enabling them in defconfig :)

>> - Commit 671f9a3e2e24cdeb2d2856abee7422f093e23e29 ("RISC-V: Setup initial
>> page tables in two stages")
>
> I don't think a one-time initial setup can have such a subtle effect.

As per Anup, it's not in 5.2.

>
> Andreas.

* Re: Random memory corruption with v5.2
  2019-08-01 18:32       ` Andreas Schwab
@ 2019-08-02  2:00         ` Palmer Dabbelt
  2019-08-02  2:15         ` Anup Patel
  1 sibling, 0 replies; 30+ messages in thread
From: Palmer Dabbelt @ 2019-08-02  2:00 UTC (permalink / raw)
  To: schwab; +Cc: linux-riscv, David Abdurachmanov, opensbi, Paul Walmsley

On Thu, 01 Aug 2019 11:32:33 PDT (-0700), schwab@suse.de wrote:
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
>> On Tue, 30 Jul 2019, Andreas Schwab wrote:
>>
>>> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
>>>
>>> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>>> >>
>>> >> Since switching to 5.2 kernels I'm seeing random crashes and
>>> >> misbehaviors on the HiFive, for example while building gcc or glibc.
>>> >> Perhaps missing TLB flushes?
>>> >
>>> > Do you have some examples of crashes?
>>>
>>> While building glibc:
>>>
>>> an_ES.UTF-8...realloc(): invalid pointer
>>> /bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64
>
>>> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>>>
>>> While building gcc:
>>>
>>> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
>>> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
>>> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
>>> +===========================GNAT BUG DETECTED==============================+
>>> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
>>> | Storage_Error stack overflow or erroneous memory access                  |
>>> | Error detected at output.ads:39:8                                        |
>>> realloc(): invalid pointer
>>
>> I personally haven't seen these issues; but then again, I haven't done any
>> glibc or gcc builds on v5.2.  Will take a closer look.
>
> I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
> the kernel interface to it.
>
> For example, flush_tlb_page is defined as:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>
> But the third argument of flush_tlb_range is supposed to be the end
> address, so this should actually be:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)
>
> Alas, that doesn't fix the crashes.

This line of reasoning smells like it'd find the issue: BBL just flushes the
entire TLB every time, but IIRC OpenSBI respects the ranges.  It looks like

    Fixes: 90cb4917b584 ("lib: Implement sfence.vma correctly.")

is what introduced the new behavior in OpenSBI, which may have triggered a lot
of latent bugs in Linux.  If you have an easy way to compile OpenSBI, does
something like

    $ git diff | cat
    diff --git a/lib/sbi/sbi_tlb.c b/lib/sbi/sbi_tlb.c
    index cffda52d66ab..007266b1f970 100644
    --- a/lib/sbi/sbi_tlb.c
    +++ b/lib/sbi/sbi_tlb.c
    @@ -133,50 +133,12 @@ static void sbi_tlb_flush_all(void)
    
     static void sbi_tlb_fifo_sfence_vma(struct sbi_tlb_info *tinfo)
     {
    -       unsigned long start = tinfo->start;
    -       unsigned long size  = tinfo->size;
    -       unsigned long i;
    -
    -       if ((start == 0 && size == 0) || (size == SBI_TLB_FLUSH_ALL)) {
    -               sbi_tlb_flush_all();
    -               return;
    -       }
    -
    -       for (i = 0; i < size; i += PAGE_SIZE) {
    -               __asm__ __volatile__("sfence.vma %0"
    -                                    :
    -                                    : "r"(start + i)
    -                                    : "memory");
    -       }
    +       sbi_tlb_flush_all();
     }
    
     static void sbi_tlb_fifo_sfence_vma_asid(struct sbi_tlb_info *tinfo)
     {
    -       unsigned long start = tinfo->start;
    -       unsigned long size  = tinfo->size;
    -       unsigned long asid  = tinfo->asid;
    -       unsigned long i;
    -
    -       if (start == 0 && size == 0) {
    -               sbi_tlb_flush_all();
    -               return;
    -       }
    -
    -       /* Flush entire MM context for a given ASID */
    -       if (size == SBI_TLB_FLUSH_ALL) {
    -               __asm__ __volatile__("sfence.vma x0, %0"
    -                                    :
    -                                    : "r"(asid)
    -                                    : "memory");
    -               return;
    -       }
    -
    -       for (i = 0; i < size; i += PAGE_SIZE) {
    -               __asm__ __volatile__("sfence.vma %0, %1"
    -                                    :
    -                                    : "r"(start + i), "r"(asid)
    -                                    : "memory");
    -       }
    +       sbi_tlb_flush_all();
     }
    
     void sbi_tlb_fifo_process(struct sbi_scratch *scratch, u32 event)

cause the issue to go away?  If so, then I'd bet we need to scour Linux for
broken TLB flushing, as given the one you found is pretty obvious I'd bet
there's a lot more...

>
> Andreas.

* Re: Random memory corruption with v5.2
  2019-08-01 18:32       ` Andreas Schwab
  2019-08-02  2:00         ` Palmer Dabbelt
@ 2019-08-02  2:15         ` Anup Patel
  2019-08-05 14:08           ` Andreas Schwab
  1 sibling, 1 reply; 30+ messages in thread
From: Anup Patel @ 2019-08-02  2:15 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv, David Abdurachmanov, OpenSBI, Paul Walmsley

On Fri, Aug 2, 2019 at 12:02 AM Andreas Schwab <schwab@suse.de> wrote:
>
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> > On Tue, 30 Jul 2019, Andreas Schwab wrote:
> >
> >> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
> >>
> >> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
> >> >>
> >> >> Since switching to 5.2 kernels I'm seeing random crashes and
> >> >> misbehaviors on the HiFive, for example while building gcc or glibc.
> >> >> Perhaps missing TLB flushes?
> >> >
> >> > Do you have some examples of crashes?
> >>
> >> While building glibc:
> >>
> >> an_ES.UTF-8...realloc(): invalid pointer
> >> /bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
> >> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
> >>
> >> While building gcc:
> >>
> >> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
> >> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
> >> +===========================GNAT BUG DETECTED==============================+
> >> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
> >> | Storage_Error stack overflow or erroneous memory access                  |
> >> | Error detected at output.ads:39:8                                        |
> >> realloc(): invalid pointer
> >
> > I personally haven't seen these issues; but then again, I haven't done any
> > glibc or gcc builds on v5.2.  Will take a closer look.
>
> I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
> the kernel interface to it.
>
> For example, flush_tlb_page is defined as:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>
> But the third argument of flush_tlb_range is supposed to be the end
> address, so this should actually be:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)

Instead of this can you try -1UL as the size:
#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, -1UL)

If the above works for you, then there is some issue with the range of
virtual memory we flush.
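
To make the arithmetic behind the three flush_tlb_page() variants concrete, here is a minimal user-space sketch. It assumes 4 KiB pages and that SBI_TLB_FLUSH_ALL is the all-ones value; the helper names are hypothetical and only mirror the "(end) - (start)" expression in flush_tlb_range(), they are not kernel code.

```c
#define PAGE_SIZE 4096UL                      /* assumption: 4 KiB pages */
#define SBI_TLB_FLUSH_ALL ((unsigned long)-1) /* assumption: all-ones sentinel */

/* Mirrors the "(end) - (start)" expression used by flush_tlb_range(). */
static unsigned long range_size(unsigned long start, unsigned long end)
{
	return end - start; /* unsigned arithmetic wraps modulo 2^N */
}

/* Size passed down when flush_tlb_page() uses end == 0 (current macro). */
static unsigned long size_end_zero(unsigned long addr)
{
	return range_size(addr, 0);
}

/* Size passed down when flush_tlb_page() uses end == addr + PAGE_SIZE. */
static unsigned long size_end_next_page(unsigned long addr)
{
	return range_size(addr, addr + PAGE_SIZE);
}

/* Size passed down when flush_tlb_page() uses end == -1UL. */
static unsigned long size_end_minus_one(unsigned long addr)
{
	return range_size(addr, -1UL);
}
```

Under these assumptions, a nonzero addr with end == 0 wraps to a very large size, end == addr + PAGE_SIZE gives exactly one page, and end == -1UL gives -1UL - addr, which equals the flush-all sentinel only when addr is 0.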

Regards,
Anup


* Re: Random memory corruption with v5.2
  2019-07-31  0:22     ` Paul Walmsley
                         ` (2 preceding siblings ...)
  2019-08-01 18:32       ` Andreas Schwab
@ 2019-08-02  7:25       ` Paul Walmsley
  2019-08-02 12:08         ` Andreas Schwab
  3 siblings, 1 reply; 30+ messages in thread
From: Paul Walmsley @ 2019-08-02  7:25 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv, palmer, David Abdurachmanov


I was able to build glibc, and run most of the test suite, on v5.3-rc2 
with BBL, with no problems so far.  (The test suite is still running.)  
CPU at 1GHz.

Will try gcc in a few hours.  


- Paul


* Re: Random memory corruption with v5.2
  2019-08-02  7:25       ` Paul Walmsley
@ 2019-08-02 12:08         ` Andreas Schwab
  2019-08-02 17:32           ` Paul Walmsley
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2019-08-02 12:08 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-riscv, palmer, David Abdurachmanov

On Aug 02 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> I was able to build glibc, and run most of the test suite, on v5.3-rc2 

Did you run install-locales?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-08-02 12:08         ` Andreas Schwab
@ 2019-08-02 17:32           ` Paul Walmsley
  2019-08-05  7:13             ` Andreas Schwab
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Walmsley @ 2019-08-02 17:32 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-riscv, palmer, David Abdurachmanov

On Fri, 2 Aug 2019, Andreas Schwab wrote:

> On Aug 02 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
> 
> > I was able to build glibc, and run most of the test suite, on v5.3-rc2 
> 
> Did you run install-locales?

I just ran "make -j4", "make -j4 check", "make -j4 xcheck".  This is with 
rootfs on microSD, rather than on NFS.  

Do you still see the failures if you only run the above commands, or does 
the failure only appear with install-locales?


- Paul


* Re: Random memory corruption with v5.2
  2019-08-02 17:32           ` Paul Walmsley
@ 2019-08-05  7:13             ` Andreas Schwab
  0 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05  7:13 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-riscv, palmer, David Abdurachmanov

On Aug 02 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:

> Do you still see the failures if you only run the above commands, or does 
> the failure only appear with install-locales?

Only during install-locales.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-08-02  2:15         ` Anup Patel
@ 2019-08-05 14:08           ` Andreas Schwab
  2019-08-05 14:34             ` Andreas Schwab
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 14:08 UTC (permalink / raw)
  To: Anup Patel; +Cc: linux-riscv, David Abdurachmanov, OpenSBI, Paul Walmsley

On Aug 02 2019, Anup Patel <anup@brainfault.org> wrote:

> Instead of this can you try -1UL as the size:
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, -1UL)

That doesn't help either.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-08-05 14:08           ` Andreas Schwab
@ 2019-08-05 14:34             ` Andreas Schwab
  2019-08-05 15:36               ` Andreas Schwab
  2019-08-05 22:34               ` Atish Patra
  0 siblings, 2 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 14:34 UTC (permalink / raw)
  To: Anup Patel; +Cc: linux-riscv, Paul Walmsley, OpenSBI, David Abdurachmanov

But this does help:

--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
 
 	cpumask_clear(&hmask);
 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
-	sbi_remote_sfence_vma(hmask.bits, start, size);
+	sbi_remote_sfence_vma(hmask.bits, 0, -1);
 }
 
 #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-08-05 14:34             ` Andreas Schwab
@ 2019-08-05 15:36               ` Andreas Schwab
  2019-08-05 22:34               ` Atish Patra
  1 sibling, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-05 15:36 UTC (permalink / raw)
  To: Anup Patel; +Cc: linux-riscv, David Abdurachmanov, OpenSBI, Paul Walmsley

This helps too:

--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -50,10 +50,11 @@ static inline void remote_sfence_vma(struct cpumask *cmask, unsigned long start,
 	cpumask_clear(&hmask);
 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
 	sbi_remote_sfence_vma(hmask.bits, start, size);
+	local_flush_tlb_all();
 }
 
 #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
-#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
+#define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)
 #define flush_tlb_range(vma, start, end) \
 	remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) - (start))
 #define flush_tlb_mm(mm) \

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-08-05 14:34             ` Andreas Schwab
  2019-08-05 15:36               ` Andreas Schwab
@ 2019-08-05 22:34               ` Atish Patra
  2019-08-06  0:25                 ` Troy Benjegerdes
                                   ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Atish Patra @ 2019-08-05 22:34 UTC (permalink / raw)
  To: anup, schwab; +Cc: linux-riscv, david.abdurachmanov, opensbi, paul.walmsley

On Mon, 2019-08-05 at 16:34 +0200, Andreas Schwab wrote:
> But this does help:
> 
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask
> *cmask, unsigned long start,
>  
>  	cpumask_clear(&hmask);
>  	riscv_cpuid_to_hartid_mask(cmask, &hmask);
> -	sbi_remote_sfence_vma(hmask.bits, start, size);
> +	sbi_remote_sfence_vma(hmask.bits, 0, -1);
>  }
>  
>  #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
> 

I am also able to reproduce the issue while doing an install-locales run.
Here is the temporary fix that seems to solve the issue.

diff --git a/arch/riscv/include/asm/tlbflush.h
b/arch/riscv/include/asm/tlbflush.h
index 687dd19735a7..29b2bd7c9923 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -55,7 +55,7 @@ static inline void remote_sfence_vma(struct cpumask
*cmask, unsigned long start,
 #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
 #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
 #define flush_tlb_range(vma, start, end) \
-       remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) -
(start))
+       remote_sfence_vma(mm_cpumask((vma)->vm_mm), 0, -1)
 #define flush_tlb_mm(mm) \
        remote_sfence_vma(mm_cpumask(mm), 0, -1)

Can you please verify at your end?


While your fix flushes the entire TLB for every type of remote TLB
flush, this fix proves that the issue is with the flush_tlb_range call
only.
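
For reference, the OpenSBI handler quoted earlier in this thread chooses between a full flush and a per-page sfence.vma loop based on (start, size). The sketch below models only that decision in plain C so the path taken for a given pair can be checked on a host machine. The flush_plan type and plan_sfence_vma() are hypothetical names, and the page size and sentinel value are assumptions.

```c
#define PAGE_SIZE 4096UL                      /* assumption: 4 KiB pages */
#define SBI_TLB_FLUSH_ALL ((unsigned long)-1) /* assumption: all-ones sentinel */

enum flush_kind { FLUSH_ALL, FLUSH_PAGES };

/* Hypothetical summary of what the handler would do for one request. */
struct flush_plan {
	enum flush_kind kind;
	unsigned long pages; /* sfence.vma iterations when kind == FLUSH_PAGES */
};

static struct flush_plan plan_sfence_vma(unsigned long start, unsigned long size)
{
	struct flush_plan plan = { FLUSH_ALL, 0 };

	/* Same condition as the quoted handler: full flush for (0, 0) or size == -1. */
	if ((start == 0 && size == 0) || size == SBI_TLB_FLUSH_ALL)
		return plan;

	/* Iteration count of: for (i = 0; i < size; i += PAGE_SIZE) */
	plan.kind = FLUSH_PAGES;
	plan.pages = size / PAGE_SIZE + (size % PAGE_SIZE != 0);
	return plan;
}
```

With the patch above, every flush_tlb_range() request arrives as (0, -1) and takes the FLUSH_ALL branch, so the per-page loop is never exercised. That is consistent with the range path being where the problem lies, although the model itself says nothing about which side is at fault.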

I am looking at the OpenSBI/Kernel implementation to figure out if it
is an OpenSBI issue or something changed in kernel recently to trigger
this.

Additionally, do you know whether a particular locale, or group of
locales, is causing this issue?

It takes more than an hour to finish the full install-locales on the
Unleashed board, which makes it a bit difficult to try out possible fixes.


> Andreas.
> 

-- 
Regards,
Atish

* Re: Random memory corruption with v5.2
  2019-08-05 22:34               ` Atish Patra
@ 2019-08-06  0:25                 ` Troy Benjegerdes
  2019-08-06  0:30                   ` Atish Patra
  2019-08-06  6:41                 ` Andreas Schwab
  2019-08-06  7:43                 ` Andreas Schwab
  2 siblings, 1 reply; 30+ messages in thread
From: Troy Benjegerdes @ 2019-08-06  0:25 UTC (permalink / raw)
  To: Atish Patra
  Cc: david.abdurachmanov, anup, opensbi, paul.walmsley, schwab, linux-riscv



> On Aug 5, 2019, at 5:34 PM, Atish Patra <Atish.Patra@wdc.com> wrote:
> 
> On Mon, 2019-08-05 at 16:34 +0200, Andreas Schwab wrote:
>> But this does help:
>> 
>> --- a/arch/riscv/include/asm/tlbflush.h
>> +++ b/arch/riscv/include/asm/tlbflush.h
>> @@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask
>> *cmask, unsigned long start,
>> 
>> 	cpumask_clear(&hmask);
>> 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
>> -	sbi_remote_sfence_vma(hmask.bits, start, size);
>> +	sbi_remote_sfence_vma(hmask.bits, 0, -1);
>> }
>> 
>> #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
>> 
> 
> I am also able to reproduce the issue while doing an install-locales run.
> Here is the temporary fix that seems to solve the issue.
> 
> diff --git a/arch/riscv/include/asm/tlbflush.h
> b/arch/riscv/include/asm/tlbflush.h
> index 687dd19735a7..29b2bd7c9923 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -55,7 +55,7 @@ static inline void remote_sfence_vma(struct cpumask
> *cmask, unsigned long start,
> #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
> #define flush_tlb_range(vma, start, end) \
> -       remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) -
> (start))
> +       remote_sfence_vma(mm_cpumask((vma)->vm_mm), 0, -1)
> #define flush_tlb_mm(mm) \
>        remote_sfence_vma(mm_cpumask(mm), 0, -1)
> 
> Can you please verify at your end?
> 
> 
> While your fix flushes the entire tlb for every type of remote tlb
> flush, this fix proves that the issue is with flush_tlb_range call
> only.
> 
> I am looking at the OpenSBI/Kernel implementation to figure out if it
> is an OpenSBI issue or something changed in kernel recently to trigger
> this.
> 
> Additionally, do you know whether a particular locale, or group of
> locales, is causing this issue?
> 
> It takes more than an hour to finish the full install-locales on the
> Unleashed board, which makes it a bit difficult to try out possible fixes.
> 

Did you reproduce with SDcard, or NFS?

> 
>> Andreas.
>> 
> 
> -- 
> Regards,
> Atish



* Re: Random memory corruption with v5.2
  2019-08-06  0:25                 ` Troy Benjegerdes
@ 2019-08-06  0:30                   ` Atish Patra
  0 siblings, 0 replies; 30+ messages in thread
From: Atish Patra @ 2019-08-06  0:30 UTC (permalink / raw)
  To: troy.benjegerdes
  Cc: david.abdurachmanov, anup, opensbi, paul.walmsley, schwab, linux-riscv

On Mon, 2019-08-05 at 19:25 -0500, Troy Benjegerdes wrote:
> > On Aug 5, 2019, at 5:34 PM, Atish Patra <Atish.Patra@wdc.com>
> > wrote:
> > 
> > On Mon, 2019-08-05 at 16:34 +0200, Andreas Schwab wrote:
> > > But this does help:
> > > 
> > > --- a/arch/riscv/include/asm/tlbflush.h
> > > +++ b/arch/riscv/include/asm/tlbflush.h
> > > @@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct
> > > cpumask
> > > *cmask, unsigned long start,
> > > 
> > > 	cpumask_clear(&hmask);
> > > 	riscv_cpuid_to_hartid_mask(cmask, &hmask);
> > > -	sbi_remote_sfence_vma(hmask.bits, start, size);
> > > +	sbi_remote_sfence_vma(hmask.bits, 0, -1);
> > > }
> > > 
> > > #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
> > > 
> > 
> > I am also able to reproduce the issue while doing an install-locales run.
> > Here is the temporary fix that seems to solve the issue.
> > 
> > diff --git a/arch/riscv/include/asm/tlbflush.h
> > b/arch/riscv/include/asm/tlbflush.h
> > index 687dd19735a7..29b2bd7c9923 100644
> > --- a/arch/riscv/include/asm/tlbflush.h
> > +++ b/arch/riscv/include/asm/tlbflush.h
> > @@ -55,7 +55,7 @@ static inline void remote_sfence_vma(struct
> > cpumask
> > *cmask, unsigned long start,
> > #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
> > #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
> > #define flush_tlb_range(vma, start, end) \
> > -       remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) -
> > (start))
> > +       remote_sfence_vma(mm_cpumask((vma)->vm_mm), 0, -1)
> > #define flush_tlb_mm(mm) \
> >        remote_sfence_vma(mm_cpumask(mm), 0, -1)
> > 
> > Can you please verify at your end?
> > 
> > 
> > While your fix flushes the entire tlb for every type of remote tlb
> > flush, this fix proves that the issue is with flush_tlb_range call
> > only.
> > 
> > I am looking at the OpenSBI/Kernel implementation to figure out if
> > it
> > is an OpenSBI issue or something changed in kernel recently to
> > trigger
> > this.
> > 
> > Additionally, do you know whether a particular locale, or group of
> > locales, is causing this issue?
> > 
> > It takes more than an hour to finish the full install-locales on the
> > Unleashed board, which makes it a bit difficult to try out possible
> > fixes.
> > 
> 
> Did you reproduce with SDcard, or NFS?
> 

I am running it on an NVMe SSD attached to the Microsemi expansion board.

Kernel version: 5.3-rc2
OpenSBI/U-Boot: Latest master

Regards,
Atish

> > > Andreas.
> > > 
> > 
> > -- 
> > Regards,
> > Atish


* Re: Random memory corruption with v5.2
  2019-08-05 22:34               ` Atish Patra
  2019-08-06  0:25                 ` Troy Benjegerdes
@ 2019-08-06  6:41                 ` Andreas Schwab
  2019-08-06  7:43                 ` Andreas Schwab
  2 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-06  6:41 UTC (permalink / raw)
  To: Atish Patra
  Cc: anup, linux-riscv, david.abdurachmanov, opensbi, paul.walmsley

On Aug 05 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> It takes more than an hour to finish the full install-locales on the
> Unleashed board, which makes it a bit difficult to try out possible fixes.

When it fails it usually fails pretty fast.  Did you run it in parallel?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-08-05 22:34               ` Atish Patra
  2019-08-06  0:25                 ` Troy Benjegerdes
  2019-08-06  6:41                 ` Andreas Schwab
@ 2019-08-06  7:43                 ` Andreas Schwab
  2 siblings, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-06  7:43 UTC (permalink / raw)
  To: Atish Patra
  Cc: anup, linux-riscv, david.abdurachmanov, opensbi, paul.walmsley

On Aug 05 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> On Mon, 2019-08-05 at 16:34 +0200, Andreas Schwab wrote:
>> But this does help:
>> 
>> --- a/arch/riscv/include/asm/tlbflush.h
>> +++ b/arch/riscv/include/asm/tlbflush.h
>> @@ -49,7 +49,7 @@ static inline void remote_sfence_vma(struct cpumask
>> *cmask, unsigned long start,
>>  
>>  	cpumask_clear(&hmask);
>>  	riscv_cpuid_to_hartid_mask(cmask, &hmask);
>> -	sbi_remote_sfence_vma(hmask.bits, start, size);
>> +	sbi_remote_sfence_vma(hmask.bits, 0, -1);
>>  }
>>  
>>  #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
>> 
>
> I am also able to reproduce the issue while doing an install-locales run.
> Here is the temporary fix that seems to solve the issue.
>
> diff --git a/arch/riscv/include/asm/tlbflush.h
> b/arch/riscv/include/asm/tlbflush.h
> index 687dd19735a7..29b2bd7c9923 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -55,7 +55,7 @@ static inline void remote_sfence_vma(struct cpumask
> *cmask, unsigned long start,
>  #define flush_tlb_all() sbi_remote_sfence_vma(NULL, 0, -1)
>  #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>  #define flush_tlb_range(vma, start, end) \
> -       remote_sfence_vma(mm_cpumask((vma)->vm_mm), start, (end) -
> (start))
> +       remote_sfence_vma(mm_cpumask((vma)->vm_mm), 0, -1)
>  #define flush_tlb_mm(mm) \
>         remote_sfence_vma(mm_cpumask(mm), 0, -1)
>
> Can you please verify at your end?

This is equivalent to my patch since all other uses already pass 0,-1.
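
To spell out why the two patches are equivalent, here is a small host-side model of the macros in tlbflush.h. It is a sketch only: record_sfence() is a hypothetical stand-in for sbi_remote_sfence_vma(), the _a/_b suffixes are invented to keep both variants in one file, and the vma and cpumask arguments are dropped.

```c
/* Arguments that would reach the SBI call. */
struct sfence_args {
	unsigned long start;
	unsigned long size;
};

static struct sfence_args last_call;

/* Hypothetical recorder standing in for sbi_remote_sfence_vma(). */
static void record_sfence(unsigned long start, unsigned long size)
{
	last_call.start = start;
	last_call.size = size;
}

/* Variant A: remote_sfence_vma() always passes (0, -1) to the SBI call. */
static void remote_sfence_vma_a(unsigned long start, unsigned long size)
{
	(void)start;
	(void)size;
	record_sfence(0, -1UL);
}
#define flush_tlb_range_a(start, end) remote_sfence_vma_a(start, (end) - (start))
#define flush_tlb_page_a(addr)        flush_tlb_range_a(addr, 0)
#define flush_tlb_mm_a()              remote_sfence_vma_a(0, -1UL)

/* Variant B: only flush_tlb_range() is changed to pass (0, -1). */
static void remote_sfence_vma_b(unsigned long start, unsigned long size)
{
	record_sfence(start, size);
}
#define flush_tlb_range_b(start, end) remote_sfence_vma_b(0, -1UL)
#define flush_tlb_page_b(addr)        flush_tlb_range_b(addr, 0)
#define flush_tlb_mm_b()              remote_sfence_vma_b(0, -1UL)
```

In both variants every entry point ends up recording (0, -1): flush_tlb_page() is routed through flush_tlb_range(), and flush_tlb_mm() already passes (0, -1) itself.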

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Random memory corruption with v5.2
  2019-07-30  6:56   ` Andreas Schwab
  2019-07-31  0:22     ` Paul Walmsley
@ 2019-08-15 20:52     ` Atish Patra
  2019-08-16  5:22       ` Atish Patra
  2019-08-19 10:53       ` Andreas Schwab
  1 sibling, 2 replies; 30+ messages in thread
From: Atish Patra @ 2019-08-15 20:52 UTC (permalink / raw)
  To: david.abdurachmanov, schwab; +Cc: linux-riscv

On Tue, 2019-07-30 at 08:56 +0200, Andreas Schwab wrote:
> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com>
> wrote:
> 
> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de>
> > wrote:
> > > Since switching to 5.2 kernels I'm seeing random crashes and
> > > misbehaviors on the HiFive, for example while building gcc or
> > > glibc.
> > > Perhaps missing TLB flushes?
> > 
> > Do you have some examples of crashes?
> 
> While building glibc:
> 
> an_ES.UTF-8...realloc(): invalid pointer
> /bin/sh: line 1:  7841 Aborted                 (core dumped)
> I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/elf/ld-linux-riscv64-lp64d.so.1 --library-path
> /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl
> /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef
> $flags --alias-file=../intl/locale.alias -i locales/$input -f
> charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-
> 2.29-0.riscv64 $locale
> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error
> 134
> 
> While building gcc:
> 
> ../../gcc/ada/exp_aggr.adb: In function
> 'Exp_Aggr.Expand_N_Aggregate':
> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
> +===========================GNAT BUG DETECTED==============================+
> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
> | Storage_Error stack overflow or erroneous memory access                  |
> | Error detected at output.ads:39:8                                        |
> realloc(): invalid pointer
> 
> raised PROGRAM_ERROR : unhandled signal
> make[3]: *** [../../gcc/ada/gcc-interface/Make-lang.in:140:
> ada/exp_ch3.o] Error 1
> 
> Andreas.
> 

Can you give it a try with the following patches for OpenSBI and the kernel?

Linux kernel:
http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html

OpenSBI:
http://lists.infradead.org/pipermail/opensbi/2019-August/000386.html

In my testing, I no longer see the stress-ng error or the glibc locale
install issue if I use the following command.

sudo make -j8 localedata/install-locale-files
DESTDIR=/home/atish/glibc/build/install


I still see a segmentation fault if I use the archive locale install
command.

sudo make -j8 localedata/install-locales
DESTDIR=/home/atish/glibc/build/install

But the error dump doesn't contain the remap() error, just a segmentation
fault, which may be due to userspace or may be a different manifestation
of the old tlbflush problem.


Regards,
Atish


* Re: Random memory corruption with v5.2
  2019-08-15 20:52     ` Atish Patra
@ 2019-08-16  5:22       ` Atish Patra
  2019-08-16 15:38         ` Troy Benjegerdes
  2019-08-19 10:53       ` Andreas Schwab
  1 sibling, 1 reply; 30+ messages in thread
From: Atish Patra @ 2019-08-16  5:22 UTC (permalink / raw)
  To: david.abdurachmanov, schwab; +Cc: linux-riscv

On Thu, 2019-08-15 at 13:52 -0700, Atish Patra wrote:
> On Tue, 2019-07-30 at 08:56 +0200, Andreas Schwab wrote:
> > On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com>
> > wrote:
> > 
> > > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de>
> > > wrote:
> > > > Since switching to 5.2 kernels I'm seeing random crashes and
> > > > misbehaviors on the HiFive, for example while building gcc or
> > > > glibc.
> > > > Perhaps missing TLB flushes?
> > > 
> > > Do you have some examples of crashes?
> > 
> > While building glibc:
> > 
> > an_ES.UTF-8...realloc(): invalid pointer
> > /bin/sh: line 1:  7841 Aborted                 (core dumped)
> > I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/elf/ld-linux-riscv64-lp64d.so.1 --library-path
> > /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-
> > base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl
> > /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef
> > $flags --alias-file=../intl/locale.alias -i locales/$input -f
> > charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-
> > 2.29-0.riscv64 $locale
> > make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8]
> > Error
> > 134
> > 
> > While building gcc:
> > 
> > ../../gcc/ada/exp_aggr.adb: In function
> > 'Exp_Aggr.Expand_N_Aggregate':
> > ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> > ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
> > +===========================GNAT BUG DETECTED==============================+
> > | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
> > | Storage_Error stack overflow or erroneous memory access                  |
> > | Error detected at output.ads:39:8                                        |
> > realloc(): invalid pointer
> > 
> > raised PROGRAM_ERROR : unhandled signal
> > make[3]: *** [../../gcc/ada/gcc-interface/Make-lang.in:140:
> > ada/exp_ch3.o] Error 1
> > 
> > Andreas.
> > 
> 
> Can you give it a try with the following patches for OpenSBI and the kernel?
> 
> Linux kernel:
> http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html
> 
> OpenSBI:
> http://lists.infradead.org/pipermail/opensbi/2019-August/000386.html
> 
> In my testing, I no longer see the stress-ng error or the glibc locale
> install issue if I use the following command.
> 
> sudo make -j8 localedata/install-locale-files
> DESTDIR=/home/atish/glibc/build/install
> 
> 
> I still see a segmentation fault if I use the archive locale install
> command.
> 
> sudo make -j8 localedata/install-locales
> DESTDIR=/home/atish/glibc/build/install
> 

I am also able to run the above archive locale install command successfully
multiple times after removing the corrupted locale-archive files
present in the install path.

Let me know if it works for you as well. 

I am now running stress-ng & parallel glibc locale install together to
fully stress the system.

Regards,
Atish
> But the error dump doesn't contain the remap() error, just a segmentation
> fault, which may be due to userspace or may be a different manifestation
> of the old tlbflush problem.
> 
> 
> Regards,
> Atish
> 


* Re: Random memory corruption with v5.2
  2019-08-16  5:22       ` Atish Patra
@ 2019-08-16 15:38         ` Troy Benjegerdes
  0 siblings, 0 replies; 30+ messages in thread
From: Troy Benjegerdes @ 2019-08-16 15:38 UTC (permalink / raw)
  To: Atish Patra; +Cc: schwab, linux-riscv, david.abdurachmanov



> On Aug 15, 2019, at 10:22 PM, Atish Patra <Atish.Patra@wdc.com> wrote:
> 
> On Thu, 2019-08-15 at 13:52 -0700, Atish Patra wrote:
>> On Tue, 2019-07-30 at 08:56 +0200, Andreas Schwab wrote:
>>> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com>
>>> wrote:
>>> 
>>>> On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de>
>>>> wrote:
>>>>> Since switching to 5.2 kernels I'm seeing random crashes and
>>>>> misbehaviors on the HiFive, for example while building gcc or
>>>>> glibc.
>>>>> Perhaps missing TLB flushes?
>>>> 
>>>> Do you have some examples of crashes?
>>> 
>>> While building glibc:
>>> 
>>> an_ES.UTF-8...realloc(): invalid pointer
>>> /bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64 $locale
>>> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>>> 
>>> While building gcc:
>>> 
>>> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
>>> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
>>> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
>>> +===========================GNAT BUG DETECTED==============================+
>>> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
>>> | Storage_Error stack overflow or erroneous memory access                  |
>>> | Error detected at output.ads:39:8                                        |
>>> realloc(): invalid pointer
>>> 
>>> raised PROGRAM_ERROR : unhandled signal
>>> make[3]: *** [../../gcc/ada/gcc-interface/Make-lang.in:140: ada/exp_ch3.o] Error 1
>>> 
>>> Andreas.
>>> 
>> 
>> Can you give it a try with the following patches for OpenSBI and the kernel?
>> 
>> Linux kernel:
>> http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html
>> 
>> OpenSBI:
>> http://lists.infradead.org/pipermail/opensbi/2019-August/000386.html
>> 
>> In my testing, I no longer see the stress-ng error or the glibc locale
>> install issue if I use the following command.
>> 
>> sudo make -j8 localedata/install-locale-files
>> DESTDIR=/home/atish/glibc/build/install
>> 
>> 
>> I still see a segmentation fault if I use the archive locale install
>> command.
>> 
>> sudo make -j8 localedata/install-locales
>> DESTDIR=/home/atish/glibc/build/install
>> 
> 
> I am also able to run the above archive locale install command successfully
> multiple times after removing the corrupted locale-archive files
> present in the install path.
> 
> Let me know if it works for you as well.
> 
> I am now running stress-ng & parallel glibc locale install together to
> fully stress the system.
> 
> Regards,
> Atish
>> But the error dump doesn't contain the remap() error, just a segmentation
>> fault, which may be due to userspace or just a different version of the old
>> tlbflush problem.
>> 
>> 
>> Regards,
>> Atish
>> 
> 
> 

Is this with the stock linux-5.2.8 release, with no additional patches, or is there something we need to look at backporting?


* Re: Random memory corruption with v5.2
  2019-08-15 20:52     ` Atish Patra
  2019-08-16  5:22       ` Atish Patra
@ 2019-08-19 10:53       ` Andreas Schwab
  1 sibling, 0 replies; 30+ messages in thread
From: Andreas Schwab @ 2019-08-19 10:53 UTC (permalink / raw)
  To: Atish Patra; +Cc: linux-riscv, david.abdurachmanov

On Aug 15 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> Linux kernel:
> http://lists.infradead.org/pipermail/linux-riscv/2019-August/005889.html

I've been using that patch, without any changes to OpenSBI, to run
bootstrap/regtest on gcc and to build glibc without issues.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Thread overview: 30+ messages
2019-07-29 10:51 Random memory corruption with v5.2 Andreas Schwab
2019-07-29 22:58 ` David Abdurachmanov
2019-07-30  4:27   ` Atish Patra
2019-07-30  6:56   ` Andreas Schwab
2019-07-31  0:22     ` Paul Walmsley
2019-07-31  7:39       ` Andreas Schwab
2019-07-31  8:14         ` Anup Patel
2019-08-01 19:57         ` Palmer Dabbelt
2019-07-31 10:19       ` Andreas Schwab
2019-07-31 12:57         ` Troy Benjegerdes
2019-07-31 13:10           ` Andreas Schwab
2019-08-01 18:32       ` Andreas Schwab
2019-08-02  2:00         ` Palmer Dabbelt
2019-08-02  2:15         ` Anup Patel
2019-08-05 14:08           ` Andreas Schwab
2019-08-05 14:34             ` Andreas Schwab
2019-08-05 15:36               ` Andreas Schwab
2019-08-05 22:34               ` Atish Patra
2019-08-06  0:25                 ` Troy Benjegerdes
2019-08-06  0:30                   ` Atish Patra
2019-08-06  6:41                 ` Andreas Schwab
2019-08-06  7:43                 ` Andreas Schwab
2019-08-02  7:25       ` Paul Walmsley
2019-08-02 12:08         ` Andreas Schwab
2019-08-02 17:32           ` Paul Walmsley
2019-08-05  7:13             ` Andreas Schwab
2019-08-15 20:52     ` Atish Patra
2019-08-16  5:22       ` Atish Patra
2019-08-16 15:38         ` Troy Benjegerdes
2019-08-19 10:53       ` Andreas Schwab
